Effective experiment tracking is essential for RL training, where reward curves can be noisy, training can diverge unexpectedly, and runs often last many hours. verl integrates with multiple tracking backends out of the box, from cloud-hosted services like Weights & Biases and MLflow to local options like TensorBoard. This page explains how to configure them and which metrics to watch.
Supported Loggers
verl supports the following logging backends, which can be enabled simultaneously:

| Backend | Value in Config | Notes |
|---|---|---|
| Console | "console" | Always-available stdout logging |
| Weights & Biases | "wandb" | Requires WANDB_API_KEY environment variable |
| SwanLab | "swanlab" | Alternative experiment tracker |
| MLflow | "mlflow" | Requires a tracking server or local directory |
| TensorBoard | "tensorboard" | Logs to tensorboard_log/{project_name}/{experiment_name} (override with TENSORBOARD_DIR env var) |
| Trackio | "trackio" | Lightweight alternative tracker |
Configuration
All logging is configured under the trainer section:

- logger: List of active logging backends. Provide multiple values to log to several backends simultaneously. Example: ["wandb", "tensorboard", "console"].
- project_name: Project name used as the top-level grouping in wandb, SwanLab, and MLflow.
- experiment_name: Run name used to identify this specific experiment within the project. Also used as a component of the checkpoint directory path.
- log_val_generations: Number of validation generations (prompt + response pairs) to log at each validation step. Logging generations lets you qualitatively inspect model behavior alongside quantitative metrics. Set to 0 to disable for maximum throughput.

Weights & Biases
Set the WANDB_API_KEY environment variable to authenticate. All metrics and logged generations are then uploaded automatically.
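As a minimal sketch of how the trainer settings and the wandb requirement fit together, the helper below builds a launch command with key=value overrides. It assumes the trainer is started as the module verl.trainer.main_ppo and accepts Hydra-style overrides; both are assumptions here, and the project and experiment names are placeholders. The helper only constructs the command and fails early when WANDB_API_KEY is missing.

```python
import json
import os


def build_launch_command(loggers, project_name, experiment_name, env=None):
    """Build an argv list with trainer.* overrides (does not start the run)."""
    env = os.environ if env is None else env
    # wandb needs WANDB_API_KEY; fail before launching rather than hours into a run.
    if "wandb" in loggers and not env.get("WANDB_API_KEY"):
        raise RuntimeError("WANDB_API_KEY must be set when 'wandb' is enabled")
    # Compact separators produce a list literal such as ["console","wandb"].
    logger_value = json.dumps(list(loggers), separators=(",", ":"))
    return [
        "python3", "-m", "verl.trainer.main_ppo",  # assumed entry point
        f"trainer.logger={logger_value}",
        f"trainer.project_name={project_name}",
        f"trainer.experiment_name={experiment_name}",
    ]


if __name__ == "__main__":
    command = build_launch_command(
        ["console", "wandb"],
        "my_project",   # placeholder project name
        "my_run",       # placeholder experiment name
        env={"WANDB_API_KEY": "placeholder"},
    )
    print(" ".join(command))
```

Passing the resulting list to subprocess.run avoids shell quoting issues around the bracketed logger list.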
MLflow
Point verl at a tracking server or a local directory via the standard MLFLOW_TRACKING_URI environment variable, then set logger: ["mlflow"] in the trainer section (or include it alongside other loggers).
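The two kinds of tracking target can be sketched as follows. This is an illustration only: the directory name, host, and port are hypothetical, and the variable needs to be present in the environment of the process that starts training.

```python
import os
from pathlib import Path


def local_tracking_uri(directory):
    """Return a file:// URI for a local MLflow store directory."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(directory).resolve().as_uri()


def configure_mlflow(uri, env=None):
    """Set MLFLOW_TRACKING_URI in the given environment mapping."""
    env = os.environ if env is None else env
    env["MLFLOW_TRACKING_URI"] = uri
    return env


if __name__ == "__main__":
    # Option 1: a local directory store (hypothetical path).
    print(local_tracking_uri("mlruns"))
    # Option 2: a remote tracking server (hypothetical host and port).
    env = configure_mlflow("http://mlflow.example.com:5000", env={})
    print(env["MLFLOW_TRACKING_URI"])
```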
TensorBoard
TensorBoard events are written to tensorboard_log/{project_name}/{experiment_name} by default (override with the TENSORBOARD_DIR environment variable). To inspect them, launch the TensorBoard viewer pointed at that directory.
TensorBoard logging adds minimal overhead and works entirely offline, making it a good choice alongside wandb for redundancy or when running on air-gapped clusters.
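The directory rule above can be sketched as a small resolver. It assumes that TENSORBOARD_DIR, when set, replaces the whole default path; the function names are hypothetical and not part of verl. The second helper builds the standard tensorboard --logdir command for that directory.

```python
import os


def tensorboard_log_dir(project_name, experiment_name, env=None):
    """Resolve the event directory: env override first, then the default layout."""
    env = os.environ if env is None else env
    override = env.get("TENSORBOARD_DIR")
    if override:
        # Assumption: the override replaces the entire default path.
        return override
    return os.path.join("tensorboard_log", project_name, experiment_name)


def viewer_command(log_dir):
    """Build the argv list that launches the TensorBoard viewer."""
    return ["tensorboard", "--logdir", log_dir]


if __name__ == "__main__":
    log_dir = tensorboard_log_dir("my_project", "my_run", env={})
    print(" ".join(viewer_command(log_dir)))
```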
Key Metrics to Monitor
verl logs a rich set of metrics at each training step. Below are the most important ones to watch.

Reward Metrics

| Metric | Description |
|---|---|
| reward/mean | Average reward per step; the primary training metric |
| reward/std | Standard deviation of reward across the batch |
| reward/max | Maximum reward in the batch |
| reward/min | Minimum reward in the batch |

The reward/mean curve should trend upward over training. Sudden plateaus or drops often indicate reward function issues, KL coefficient problems, or optimizer instability.
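One way to spot sudden drops in a noisy reward/mean series is to compare each step against a trailing average. The sketch below is a heuristic, not something verl provides; the window size and drop fraction are arbitrary illustrative defaults that would need tuning per task.

```python
from statistics import mean


def find_reward_drops(rewards, window=5, drop_fraction=0.2):
    """Return step indices where reward falls well below its trailing average."""
    drops = []
    for step in range(window, len(rewards)):
        baseline = mean(rewards[step - window:step])
        # Scale the margin by |baseline| so negative rewards are handled too.
        threshold = baseline - drop_fraction * abs(baseline)
        if rewards[step] < threshold:
            drops.append(step)
    return drops


if __name__ == "__main__":
    history = [0.50, 0.52, 0.55, 0.57, 0.60, 0.62, 0.30, 0.64]
    print(find_reward_drops(history))
```

The values can come from any backend export, for example a CSV downloaded from the tracking UI.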
Response Length Metrics
| Metric | Description |
|---|---|
| response_length/mean | Average number of tokens in generated responses |
| response_length/std | Standard deviation of response length |
| response_length/max | Maximum response length observed |
Policy Metrics
| Metric | Description |
|---|---|
| actor/loss | Actor policy loss (PPO objective) |
| actor/pg_loss | Policy gradient component of actor loss |
| actor/kl | KL divergence from the reference policy |
| actor/grad_norm | Actor gradient norm; watch for spikes indicating instability |
| actor/entropy | Policy entropy; should not collapse to near zero |
| kl_coef | Current KL coefficient (changes when using adaptive KL controller) |
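The two warnings in the table (gradient-norm spikes and entropy collapse) can be turned into a simple check over logged values. This is a heuristic sketch: the spike factor and entropy floor are hypothetical thresholds chosen for illustration, not values recommended by verl.

```python
from statistics import median


def check_policy_health(grad_norms, entropies, spike_factor=3.0, entropy_floor=0.05):
    """Return warning labels based on the latest actor/grad_norm and actor/entropy."""
    warnings = []
    # Spike: the latest gradient norm far exceeds the median of earlier steps.
    if len(grad_norms) >= 2:
        typical = median(grad_norms[:-1])
        if typical > 0 and grad_norms[-1] > spike_factor * typical:
            warnings.append("grad_norm_spike")
    # Collapse: the latest entropy has fallen below the chosen floor.
    if entropies and entropies[-1] < entropy_floor:
        warnings.append("entropy_collapse")
    return warnings


if __name__ == "__main__":
    print(check_policy_health([1.0, 1.1, 0.9, 5.0], [0.8, 0.5, 0.01]))
```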
Critic Metrics (PPO Only)
| Metric | Description |
|---|---|
| critic/loss | Critic value function loss (MSE against returns) |
| critic/values/mean | Mean predicted value |
| critic/returns/mean | Mean actual returns used for critic training |
Throughput Metrics
| Metric | Description |
|---|---|
| rollout/throughput | Tokens per second during rollout generation |
| train/throughput | Tokens per second during actor/critic update |
| timing/rollout | Wall-clock time for the rollout stage (seconds) |
| timing/update_actor | Wall-clock time for actor parameter update (seconds) |
Precision Diagnostics
When actor_rollout_ref.rollout.calculate_log_probs=True, verl logs:

| Metric | Description |
|---|---|
| training/rollout_probs_diff_mean | Mean absolute difference between log probs from the rollout engine and the training engine. Values below 0.005 are normal; above 0.01 suggests a precision mismatch between inference and training. |

A high rollout_probs_diff_mean can cause actor/grad_norm to grow continuously. See the FAQ for remediation steps.
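The thresholds quoted above translate directly into a three-way classification. The function name and the "borderline" label for the range between 0.005 and 0.01 are illustrative choices; the source only describes the two outer ranges.

```python
def classify_probs_diff(value, normal_below=0.005, mismatch_above=0.01):
    """Classify a training/rollout_probs_diff_mean reading."""
    if value < normal_below:
        return "normal"
    if value > mismatch_above:
        return "mismatch"
    # The source does not characterize this middle range; treat it as a watch zone.
    return "borderline"


if __name__ == "__main__":
    for reading in (0.002, 0.007, 0.02):
        print(reading, classify_probs_diff(reading))
```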
Grafana and Prometheus Cluster Monitoring
For cluster-level hardware monitoring alongside training metrics, verl supports Prometheus metric exposition from the rollout engine, with Grafana dashboards for visualization.

Rollout Engine Prometheus Metrics

The vLLM/SGLang rollout server can expose Prometheus metrics over HTTP. This is enabled in the rollout config, which provides the following settings:

- An enable switch: exposes Prometheus metrics from the rollout engine over HTTP. Useful for monitoring KV cache utilization, request queue depth, and inference throughput at the server level.
- A port: the HTTP port for the Prometheus /metrics endpoint.
- A served model name: a short model name displayed in Grafana instead of the full model path. Useful when model paths are very long.
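To check the endpoint without a full Prometheus setup, the text exposition format can be read directly. The sketch below is a simplified parser that assumes samples carry no trailing timestamp; the metric names in the sample text are made up for illustration and are not the names the rollout engine actually exports.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition into {sample_with_labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumes no timestamp, so the value is the last space-separated token.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def scrape(url, timeout=5.0):
    """Fetch a /metrics URL and parse it (the URL is supplied by the caller)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))


# Hypothetical sample; real metric names depend on the rollout engine.
SAMPLE = """\
# HELP example_kv_cache_usage Hypothetical gauge.
# TYPE example_kv_cache_usage gauge
example_kv_cache_usage{model_name="my-model"} 0.42
example_requests_waiting{model_name="my-model"} 3
"""

if __name__ == "__main__":
    for name, value in parse_prometheus_text(SAMPLE).items():
        print(name, value)
```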
Ray Timeline Profiling
To generate a Ray timeline for performance analysis of a training job, set ray_kwargs.timeline_json_file to the output file path. Load the resulting file in chrome://tracing for a flame graph of all Ray tasks across the cluster.
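For a quick text summary before opening the viewer, the timeline file can also be aggregated in a few lines. This sketch assumes the file follows the Chrome trace event format (a JSON array of events, or an object with a traceEvents key, where complete events have ph set to "X" and dur in microseconds); the exact fields Ray writes may differ between versions.

```python
import json
from collections import defaultdict


def summarize_timeline(path, top_n=10):
    """Return the top_n (task name, total seconds) pairs from a trace file."""
    with open(path, "r", encoding="utf-8") as handle:
        events = json.load(handle)
    # Some trace files wrap the event list in an object.
    if isinstance(events, dict):
        events = events.get("traceEvents", [])
    totals = defaultdict(float)
    for event in events:
        # "X" marks complete events, which carry a duration in microseconds.
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"] / 1e6
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for name, seconds in summarize_timeline(sys.argv[1]):
            print(f"{seconds:10.3f}s  {name}")
```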
Setup Reference
Full Grafana and Prometheus setup instructions (including dashboard templates and scrape configurations) are documented in docs/advance/grafana_prometheus.md in the verl repository.
Logging Overhead and Performance Tips
Logging validation generations (log_val_generations) involves serializing prompt and response text and uploading it to the tracking backend. If you have large validation batches or are running on a slow network, reduce this value or set it to 0 during initial hyperparameter sweeps and re-enable it for final runs.

- Use "console" only during initial debugging; switch to "wandb" or "tensorboard" for full runs.
- The rollout/throughput and train/throughput metrics are only accurate when actor_rollout_ref.rollout.disable_log_stats=False, so keep that setting at False while tuning performance.
- For Ray timeline analysis, set timeline_json_file only for profiling runs, as it adds file I/O at job completion.