Playground

The Phoenix Playground is an interactive environment for iterating on prompts, testing different models, and tuning generation parameters, all without writing code. It’s designed for prompt engineering, quick experimentation, and debugging LLM behavior.

What is the Playground?

The Playground is a web-based interface for:

Prompt Engineering: Craft and refine prompts with live feedback
Model Comparison: Test the same prompt across multiple models side-by-side
Parameter Tuning: Adjust temperature, top-p, max tokens, and other settings
Trace Replay: Load production traces and rerun them with different configurations
Iteration Speed: Get immediate feedback without writing or deploying code

Accessing the Playground

The Playground is available in the Phoenix UI:

Launch Phoenix

Start Phoenix and navigate to the UI:

phoenix serve
# Open http://localhost:6006

Open Playground

Click “Playground” in the navigation menu or navigate to a specific project and click “Open in Playground”.

Configure your LLM

Select your model provider (OpenAI, Anthropic, etc.) and enter API credentials if needed.

Key Features

Prompt Editor

The Playground provides a rich editor for crafting prompts with:

System/User/Assistant Messages: Structure conversational prompts
Template Variables: Use {{variable}} syntax for dynamic content
Multi-turn Conversations: Build complex conversation flows
Syntax Highlighting: Clear visual formatting

Example Prompt

System: You are a helpful customer support agent for {{company_name}}.
You should be professional but friendly.

User: {{customer_query}}

Assistant:
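To make the {{variable}} syntax concrete, here is a minimal sketch of how such placeholders can be filled in before a prompt is sent. It is an illustration only, not Phoenix's own template engine; the helper names `render` and `render_messages` and the sample values are assumptions.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder; raise if a variable is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    return _VARIABLE.sub(substitute, template)


def render_messages(messages: list[dict], variables: dict[str, str]) -> list[dict]:
    """Render every message of a system/user/assistant conversation."""
    return [
        {"role": m["role"], "content": render(m["content"], variables)}
        for m in messages
    ]


messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful customer support agent for {{company_name}}.\n"
            "You should be professional but friendly."
        ),
    },
    {"role": "user", "content": "{{customer_query}}"},
]

rendered = render_messages(
    messages,
    {"company_name": "Acme", "customer_query": "Where is my order?"},
)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a silently empty placeholder is easy to overlook when comparing outputs.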

Model Selection

Choose from supported LLM providers:

OpenAI

GPT-4, GPT-4 Turbo, GPT-3.5 Turbo

Anthropic

Claude 3 Opus, Sonnet, Haiku

Azure OpenAI

Azure-hosted OpenAI models

Custom Providers

Configure custom API endpoints

Parameter Tuning

Adjust generation parameters interactively:

Temperature (0.0 - 2.0)

Controls randomness in outputs
Lower = more deterministic
Higher = more creative/varied

Top P (0.0 - 1.0)

Nucleus sampling threshold: sample only from the most likely tokens whose cumulative probability reaches P
Lower = more focused on likely tokens
Higher = broader token selection
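The effect of temperature and top-p can be sketched on a toy distribution. This is a simplified illustration of the general technique, not the sampling code of any particular provider; the function names and the three-token vocabulary are made up for the example.

```python
import math


def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over logits divided by the temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token.
        best = max(logits, key=logits.get)
        return {tok: 1.0 if tok == best else 0.0 for tok in logits}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_top_p(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


logits = {"the": 2.0, "a": 1.0, "zebra": -1.0}
cold = apply_temperature(logits, 0.2)   # sharply peaked on "the"
hot = apply_temperature(logits, 2.0)    # flatter, more varied
focused = apply_top_p(apply_temperature(logits, 1.0), 0.9)  # drops the unlikely tail
```

With these toy numbers, lowering the temperature concentrates probability on the top token, while top-p removes the low-probability tail entirely instead of merely shrinking it.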

Max Tokens

Maximum length of generated response
Prevents runaway generation

Frequency/Presence Penalty (OpenAI)

Reduce repetition in outputs
Frequency: penalizes a token in proportion to how many times it has already appeared
Presence: applies a flat penalty to any token that has appeared at least once
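The difference between the two penalties is easiest to see as arithmetic on token scores. The sketch below follows the adjustment described in OpenAI's documentation (score minus count times the frequency penalty, minus the presence penalty if the token has appeared at all); the function name and the sample scores are assumptions for illustration.

```python
from collections import Counter


def penalize(
    logits: dict[str, float],
    generated_tokens: list[str],
    frequency_penalty: float = 0.0,
    presence_penalty: float = 0.0,
) -> dict[str, float]:
    """Lower the score of tokens that already appear in the generated text."""
    counts = Counter(generated_tokens)
    adjusted = {}
    for tok, value in logits.items():
        count = counts[tok]
        seen = 1.0 if count > 0 else 0.0
        # Frequency term grows with every repeat; presence term is applied once.
        adjusted[tok] = value - count * frequency_penalty - seen * presence_penalty
    return adjusted


logits = {"great": 3.0, "good": 2.5, "fine": 2.0}
history = ["great", "great", "good"]
adjusted = penalize(logits, history, frequency_penalty=0.5, presence_penalty=0.4)
```

In this example "great" (used twice) loses more than "good" (used once), and the unused "fine" keeps its score, so it becomes the most likely next choice.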

Stop Sequences

Define custom stopping points
Useful for structured outputs
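As a rough illustration of what a stop sequence does, the following sketch cuts a piece of text at the earliest stop sequence. Providers apply stop sequences during generation rather than afterwards, so treat this as a model of the visible result only; `truncate_at_stop` is a hypothetical helper.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest stop sequence; the stop text itself is dropped."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # ignore empty strings, which would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


raw = '{"sentiment": "positive"}\n\nUser: thanks!'
clean = truncate_at_stop(raw, ["\n\nUser:", "###"])
```

Here the stop sequence keeps the JSON object and discards the model's attempt to continue the conversation, which is why stop sequences pair well with structured outputs.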

Side-by-Side Comparison

Compare multiple model/parameter combinations simultaneously:

Add comparison column

Click “Add Comparison” to create a new configuration column.

Configure each variant

Set different models or parameters for each column:

Column 1: GPT-4 with temp 0.7
Column 2: Claude 3 Sonnet with temp 0.7
Column 3: GPT-4 with temp 0.2

Run all variants

Click “Run All” to execute the same prompt across all configurations.

Compare outputs

Review outputs side-by-side to identify quality, cost, and latency differences.
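If you later want to reproduce a side-by-side run in code, the comparison can be sketched as a loop over configurations that records output and latency. Everything here is an assumption for illustration: `Variant`, `Result`, `run_all`, and the offline `fake_model` stand-in are not part of Phoenix, and the model strings are labels rather than exact provider model IDs.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Variant:
    label: str
    model: str
    temperature: float


@dataclass
class Result:
    variant: Variant
    output: str
    latency_s: float


def run_all(
    prompt: str,
    variants: list[Variant],
    call_model: Callable[[str, Variant], str],
) -> list[Result]:
    """Run the same prompt through every variant and time each call."""
    results = []
    for variant in variants:
        start = time.perf_counter()
        output = call_model(prompt, variant)
        results.append(Result(variant, output, time.perf_counter() - start))
    return results


def fake_model(prompt: str, variant: Variant) -> str:
    # Stand-in for a real provider call so the sketch runs offline.
    return f"[{variant.model} @ {variant.temperature}] {prompt}"


variants = [
    Variant("Column 1", "gpt-4", 0.7),
    Variant("Column 2", "claude-3-sonnet", 0.7),
    Variant("Column 3", "gpt-4", 0.2),
]
results = run_all("What is Phoenix?", variants, fake_model)
for r in results:
    print(f"{r.variant.label}: {r.latency_s * 1000:.2f} ms -> {r.output}")
```

Passing the model call in as a function keeps the harness provider-agnostic; swapping `fake_model` for a real client call is the only change needed to compare live models.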

Trace Replay

One of the most powerful features is replaying production traces in the Playground:

Find a trace

Navigate to a trace in your project that you want to replay or debug.

Open in Playground

Click “Replay in Playground” from the trace detail view.

Modify configuration

The Playground loads with the exact prompt and inputs from the trace. Now you can:

Edit the prompt
Change the model
Adjust parameters
Modify input variables

Rerun and compare

Execute the modified configuration and compare against the original trace output.

Use Cases for Trace Replay:

Debug problematic production outputs
Test prompt improvements on real user queries
Evaluate model upgrades (e.g., GPT-3.5 → GPT-4)
Investigate why certain inputs failed
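Conceptually, a replay rebuilds the original request from what the trace recorded and then applies your changes on top. The sketch below shows that idea with a plain dictionary; the attribute keys are loosely modeled on OpenInference naming, the `input_messages` list is a simplification, and `build_replay_request` is a hypothetical helper, not a Phoenix API.

```python
import json
from typing import Optional


def build_replay_request(span_attributes: dict, overrides: Optional[dict] = None) -> dict:
    """Rebuild a request from recorded span attributes, then apply overrides."""
    # Invocation parameters are assumed to be stored as a JSON string.
    recorded_params = json.loads(span_attributes.get("llm.invocation_parameters", "{}"))
    request = {
        "model": span_attributes["llm.model_name"],
        "messages": list(span_attributes["input_messages"]),
        **recorded_params,
    }
    request.update(overrides or {})
    return request


recorded = {
    "llm.model_name": "gpt-3.5-turbo",
    "llm.invocation_parameters": '{"temperature": 0.7, "max_tokens": 500}',
    "input_messages": [{"role": "user", "content": "Where is my order?"}],
}

original = build_replay_request(recorded)
upgraded = build_replay_request(recorded, {"model": "gpt-4", "temperature": 0.2})
```

Keeping the recorded inputs fixed and changing only the overrides is what makes the comparison against the original trace output meaningful.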

Playground Configuration

API Keys

Configure API keys for model providers. Set them as environment variables, or enter them directly in the Playground UI settings.

OpenAI

# Set via environment variable
export OPENAI_API_KEY="sk-..."

Anthropic

export ANTHROPIC_API_KEY="sk-ant-..."

Azure OpenAI

export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://..."
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
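Before launching Phoenix, it can help to check which of these variables are actually set in your shell. The following is a small sketch using only the variable names listed above; the `missing_variables` helper and the idea that all three Azure variables are needed together are assumptions.

```python
import os
from typing import Mapping

# Variable names taken from the export examples for each provider.
REQUIRED_ENV = {
    "OpenAI": ["OPENAI_API_KEY"],
    "Anthropic": ["ANTHROPIC_API_KEY"],
    "Azure OpenAI": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_VERSION",
    ],
}


def missing_variables(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Map each provider to the variables that are unset or empty."""
    return {
        provider: [name for name in names if not env.get(name)]
        for provider, names in REQUIRED_ENV.items()
    }


for provider, missing in missing_variables().items():
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"{provider}: {status}")
```

The check only reports variable names and never prints their values, so it is safe to run in a shared terminal or CI log.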

Custom Providers

Add custom LLM providers through the Phoenix configuration:

# In Phoenix configuration (helper code in src/phoenix/server/api/helpers/playground_clients.py)
# Custom providers can be registered for use in the Playground

Saving and Sharing

Save Prompt Configurations

Prompt configurations from the Playground can be saved for reuse:

Name your configuration

Give your prompt + parameters a descriptive name.

Save as template

Click “Save Template” to store in Phoenix.

Load later

Access saved templates from the Playground sidebar.

Export to Code

Convert Playground configurations to production code:

# Example: Export Playground config to OpenAI Python code
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Phoenix?"}
    ]
)

The Playground provides export options for common frameworks:

OpenAI Python SDK
Anthropic Python SDK
LangChain
LlamaIndex
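For comparison with the OpenAI example, here is a hand-written sketch of how the same configuration might look with the Anthropic Python SDK. It is not the Playground's actual export output; the helper names are hypothetical and the model ID is an assumption you should replace with one available to your account.

```python
def build_anthropic_request(
    system: str,
    user: str,
    model: str,
    temperature: float = 0.7,
    max_tokens: int = 500,
) -> dict:
    """Assemble keyword arguments for the Anthropic Messages API."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "temperature": temperature,
        "system": system,  # top-level field, not a message with role "system"
        "messages": [{"role": "user", "content": user}],
    }


def send(request: dict):
    """Send the request; needs the anthropic package and ANTHROPIC_API_KEY."""
    # Imported lazily so the request can be built without the SDK installed.
    from anthropic import Anthropic

    client = Anthropic()
    return client.messages.create(**request)


request = build_anthropic_request(
    system="You are a helpful assistant.",
    user="What is Phoenix?",
    model="claude-3-sonnet-20240229",  # assumed model ID
)
# response = send(request)  # uncomment to call the API
```

The main structural difference from the OpenAI call is that the system prompt moves out of the messages list into its own `system` argument.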

Integration with Prompt Management

The Playground integrates with Phoenix’s Prompt Management system:

Save to Prompt Registry

Prompts created in the Playground can be saved as versioned prompts:

Finalize prompt

Test and refine your prompt in the Playground.

Save as versioned prompt

Click “Save to Prompt Registry” and provide:

Prompt name
Version tag (e.g., “v1.0”, “production”)
Description

Use in production

Load the saved prompt in your application code:

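from phoenix.client import Client

# Assumes the separate Phoenix client package (arize-phoenix-client) is installed
client = Client()
prompt = client.prompts.get(
    prompt_identifier="customer_support_greeting",
    tag="production",
)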

Load from Prompt Registry

Bring existing versioned prompts into the Playground for testing:

Click “Load Prompt” in the Playground
Select from your saved prompts
Choose a specific version or tag
Test with different models or parameters

Playground for Experiments

Use the Playground to rapidly prototype before running formal experiments:

Prototype in Playground

Test your task logic interactively with various inputs.

Validate across examples

Manually try multiple dataset examples to verify behavior.

Export to code

Convert your Playground configuration to a task function.

Run full experiment

Execute the task systematically across your dataset:

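from phoenix.experiments import run_experiment

# `dataset` is a Phoenix dataset you have already uploaded or loaded.
# `process` is a placeholder for the logic you refined in the Playground.

def task(input):
    # `input` receives the input of each dataset example
    return process(input)

result = run_experiment(
    dataset=dataset,
    task=task,
    experiment_name="playground-prototype",
)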
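One way to fill in the task and score its results is sketched below. The dataset field names (`question`, `answer`), the `make_task` wrapper, and the `exact_match` evaluator are all hypothetical; a stub model call is used so the logic can be checked offline before pointing it at a real provider.

```python
from typing import Callable


def make_task(call_model: Callable[[str], str]) -> Callable[[dict], str]:
    """Wrap a model call in a single-argument task function."""
    def task(input: dict) -> str:
        # Prompt wording would come from what you settled on in the Playground.
        prompt = f"Answer briefly: {input['question']}"
        return call_model(prompt)

    return task


def exact_match(output: str, expected: dict) -> float:
    """Hypothetical evaluator: 1.0 when the normalized answers agree, else 0.0."""
    return float(output.strip().lower() == expected["answer"].strip().lower())


def stub_model(prompt: str) -> str:
    # Stand-in for a real provider call so the sketch runs offline.
    return " Paris "


task = make_task(stub_model)
output = task({"question": "What is the capital of France?"})
score = exact_match(output, {"answer": "paris"})
```

Assuming your Phoenix version accepts plain functions as evaluators, these could then be passed along as `run_experiment(dataset=dataset, task=task, evaluators=[exact_match])`.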

Best Practices

Iterate Quickly: Use the Playground for fast iteration before committing to code or experiments.

Test Edge Cases: Try unusual inputs, very long queries, and adversarial examples.

Compare Models: Don’t assume one model is always better; test on your specific use case.

Document Findings: Save configurations that work well and note parameter settings.

Use Trace Replay: When debugging production issues, always replay the trace in the Playground first.

Version Prompts: Once you find a good prompt, save it to the Prompt Registry with proper versioning.

Keyboard Shortcuts

Speed up your workflow with keyboard shortcuts:

Cmd/Ctrl + Enter: Run current configuration
Cmd/Ctrl + S: Save configuration
Cmd/Ctrl + K: Clear output
Tab: Navigate between fields

Next Steps

Prompt Management

Version and manage prompts systematically

Experiments

Run systematic experiments on datasets

Tracing

Understand trace replay capabilities

Evaluation

Evaluate Playground outputs systematically
