Overview

Slime builds its argument system on top of Megatron-LM's argument parser and extends it with a large set of RL-specific options.

Parsing Arguments

parse_args()

Main entry point for parsing all arguments.
from slime.utils.arguments import parse_args

args = parse_args()
Returns: Namespace object containing all parsed arguments
Source: slime/utils/arguments.py:1, defined via get_slime_extra_args_provider()

Argument Categories

Cluster Configuration

Arguments for distributed training setup.
--actor-num-nodes
int
default:"1"
Number of nodes for actor training
--actor-num-gpus-per-node
int
default:"8"
GPUs per node for actor training
--critic-num-nodes
int
Number of nodes for critic training (if using critic)
--critic-num-gpus-per-node
int
GPUs per node for critic training
--rollout-num-gpus
int
Total GPUs for rollout inference (ignored if --colocate)
--rollout-num-gpus-per-engine
int
default:"1"
GPUs per inference engine (tensor parallel size)
--num-gpus-per-node
int
default:"8"
GPUs per node for resource allocation
--colocate
bool
default:"False"
Colocate inference engines and actor on same GPUs (enables offloading)
--offload
bool
default:"False"
Enable both --offload-train and --offload-rollout
--offload-train
bool
Offload training model to CPU during rollout
--offload-rollout
bool
Offload rollout model to CPU during training
Source: slime/utils/arguments.py:37-108
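As an illustration, the two common topologies can be sketched as follows (flag values are placeholders; required arguments such as checkpoints and data are omitted):

```shell
# Disaggregated: training and rollout run on separate GPUs.
python train.py \
  --actor-num-nodes 2 \
  --actor-num-gpus-per-node 8 \
  --rollout-num-gpus 8 \
  --rollout-num-gpus-per-engine 4

# Colocated: training and rollout share the same GPUs; offload the idle side.
python train.py \
  --colocate \
  --offload \
  --actor-num-nodes 1 \
  --actor-num-gpus-per-node 8
```

With --colocate, --rollout-num-gpus is ignored and the inference engines share the actor's GPUs.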

Training Configuration

--qkv-format
str
default:"thd"
QKV layout for Megatron: "thd" or "bshd"
--true-on-policy-mode
bool
default:"False"
Enable strict on-policy training
--train-env-vars
dict
default:"{}"
Extra environment variables for training (JSON string)
--train-memory-margin-bytes
int
default:"1073741824"
Memory margin reserved for training allocations (the default is 1 GiB)
--enable-weights-backuper
bool
default:"True"
Enable weights backuper (disable to save host memory)
--megatron-to-hf-mode
str
default:"raw"
Megatron to HuggingFace conversion: "raw" or "bridge"
--custom-model-provider-path
str
Path to custom model provider function
--recompute-loss-function
bool
default:"False"
Recompute loss function to save memory
--log-probs-chunk-size
int
default:"-1"
Chunk size for log prob computation
--only-train-params-name-list
list[str]
Regex patterns of parameters to train (freeze all others)
--freeze-params-name-list
list[str]
Regex patterns of parameters to freeze
Source: slime/utils/arguments.py:111-214
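For example, to pass extra environment variables to training processes and restrict which parameters are trained (the regex pattern below is hypothetical; adapt it to your model's parameter names):

```shell
python train.py \
  --train-env-vars '{"NCCL_DEBUG": "WARN"}' \
  --only-train-params-name-list '.*\.mlp\..*' \
  --recompute-loss-function
```

Note that --train-env-vars takes a JSON string, so the value must be quoted for the shell.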

Rollout Configuration

--hf-checkpoint
str
HuggingFace checkpoint path for rollout model
--model-name
str
Model name for weight conversion (auto-detected if not set)
--rollout-function-path
str
default:"slime.rollout.sglang_rollout.generate_rollout"
Path to rollout generation function
--rollout-temperature
float
default:"1.0"
Sampling temperature
--rollout-top-p
float
default:"1.0"
Top-p (nucleus) sampling
--rollout-top-k
int
default:"-1"
Top-k sampling (-1 for disabled)
--rollout-max-context-len
int
Maximum context length (must not exceed model’s max position embeddings)
--rollout-max-prompt-len
int
Maximum prompt length (filters long prompts at init)
--rollout-max-response-len
int
Maximum response length (max_tokens in SGLang)
--rollout-skip-special-tokens
bool
default:"False"
Skip special tokens in decoded response
--rollout-stop
list[str]
Stop strings for generation
--rollout-stop-token-ids
list[int]
Stop token IDs for generation
--rollout-shuffle
bool
default:"False"
Shuffle prompts during rollout
--rollout-seed
int
default:"42"
Random seed for rollout
Source: slime/utils/arguments.py:218-339
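A sketch of common sampling settings (values are illustrative, not recommendations, and the stop string is hypothetical):

```shell
python train.py \
  --hf-checkpoint Qwen/Qwen2.5-32B-Instruct \
  --rollout-temperature 0.8 \
  --rollout-top-p 0.95 \
  --rollout-max-response-len 4096 \
  --rollout-stop "</answer>" \
  --rollout-seed 42
```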

Sampling Configuration

--over-sampling-batch-size
int
Sampling batch granularity (defaults to rollout_batch_size)
--dynamic-sampling-filter-path
str
Path to dynamic sampling filter function
--partial-rollout
bool
default:"False"
Enable partial rollout with sample recycling
--mask-offpolicy-in-partial-rollout
bool
default:"False"
Mask off-policy tokens in partial rollout
--custom-generate-function-path
str
Path to custom generate function
--custom-rollout-log-function-path
str
Path to custom rollout logging function
--buffer-filter-path
str
Path to buffer filter function
Source: slime/utils/arguments.py:341-426

Data Configuration

--num-rollout
int
Total number of rollout steps (calculated from num_epoch if not set)
--num-epoch
int
Number of training epochs
--rollout-global-dataset
bool
default:"True"
Use global dataset for rollout (disable for custom data management)
--data-source-path
str
Data source class path
--prompt-data
str
Path to prompt dataset (JSONL format)
--apply-chat-template
bool
default:"False"
Apply chat template to prompts
--apply-chat-template-kwargs
dict
default:"{}"
Kwargs for chat template (JSON string)
--input-key
str
default:"input"
JSON key for input/prompt
--label-key
str
JSON key for labels/ground truth
--multimodal-keys
dict
JSON mapping of media types to data keys, e.g. {"image": "image_file"}
--metadata-key
str
default:"metadata"
JSON key for metadata
--tool-key
str
default:"tools"
JSON key for tools in function calling
Source: slime/utils/arguments.py:500-573
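Putting the data keys together: assuming a JSONL file whose records use the default input/label keys (the record below is illustrative), the flags might look like:

```shell
# Each line of data/prompts.jsonl is one JSON record, e.g.:
# {"input": "What is 2 + 2?", "label": "4", "metadata": {"source": "toy"}}
python train.py \
  --prompt-data data/prompts.jsonl \
  --input-key input \
  --label-key label \
  --apply-chat-template
```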

Batch Size Configuration

--rollout-batch-size
int
required
Number of prompts per rollout step
--n-samples-per-prompt
int
default:"1"
Number of responses per prompt
--global-batch-size
int
Global batch size for training (across all data parallel ranks)
--num-steps-per-rollout
int
Number of training steps per rollout (alternative to setting --global-batch-size)
--micro-batch-size
int
default:"1"
Micro batch size per GPU (ignored if using dynamic batch size)
--balance-data
bool
default:"False"
Balance tokens across DP ranks using Karmarkar-Karp algorithm
--use-dynamic-batch-size
bool
default:"False"
Dynamically adjust micro batch size based on sequence lengths
--max-tokens-per-gpu
int
Maximum tokens per GPU for dynamic batching
Source: slime/utils/arguments.py:585-654
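The batch-size flags relate as follows: one rollout step produces rollout_batch_size × n_samples_per_prompt training samples, which are then consumed in chunks of --global-batch-size. A sketch (the --max-tokens-per-gpu value is illustrative):

```shell
# 256 prompts x 4 samples = 1024 training samples per rollout step;
# with --global-batch-size 1024 that is one optimizer step per rollout.
python train.py \
  --rollout-batch-size 256 \
  --n-samples-per-prompt 4 \
  --global-batch-size 1024 \
  --use-dynamic-batch-size \
  --max-tokens-per-gpu 16384
```

When --use-dynamic-batch-size is set, --micro-batch-size is ignored and micro batches are packed up to --max-tokens-per-gpu.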

Evaluation Configuration

--eval-function-path
str
Path to eval function (defaults to rollout_function_path)
--eval-interval
int
Evaluation interval in rollout steps (None to disable)
--eval-prompt-data
list[str]
Eval dataset paths, given as alternating name/path pairs: dataset_name /path/to/data.jsonl ...
--eval-config
str
Path to YAML/JSON eval config (overrides --eval-prompt-data)
--skip-eval-before-train
bool
default:"False"
Skip initial evaluation before training
--eval-input-key
str
Override input key for eval
--eval-label-key
str
Override label key for eval
--n-samples-per-eval-prompt
int
default:"1"
Responses per eval prompt
--eval-temperature
float
Override temperature for eval
--eval-max-response-len
int
Override max response length for eval
Source: slime/utils/arguments.py:657-714
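An illustrative eval setup (the dataset name and path are hypothetical):

```shell
python train.py \
  --eval-interval 10 \
  --eval-prompt-data my_eval data/eval.jsonl \
  --n-samples-per-eval-prompt 8 \
  --eval-temperature 0.6 \
  --eval-max-response-len 8192
```

The eval overrides (--eval-temperature, --eval-max-response-len, etc.) fall back to the corresponding rollout settings when unset.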

Algorithm Configuration

--ref-load
str
Reference model checkpoint path
--load
str
Actor checkpoint to load
--save
str
Checkpoint save directory
--save-interval
int
Checkpoint save interval
--save-hf
str
Save HuggingFace format checkpoints (Megatron backend)
--lr
float
default:"1e-6"
Learning rate
--clip-grad
float
default:"1.0"
Gradient clipping value
--seed
int
default:"1234"
Random seed
Critic Configuration:
--num-critic-only-steps
int
default:"0"
Number of critic-only training steps before actor training
--critic-load
str
Critic checkpoint path
--critic-save
str
Critic save directory
--critic-lr
float
Critic learning rate (defaults to --lr)
PPO Configuration:
--eps-clip
float
default:"0.2"
PPO clip range
--eps-clip-high
float
PPO upper clip range
--eps-clip-c
float
Dual-clip PPO lower bound
--value-clip
float
default:"0.2"
Value function clip range
--kl-coef
float
default:"0.0"
KL penalty coefficient for reward shaping
--loss-type
str
default:"policy_loss"
Loss type: "policy_loss", "sft_loss", or "custom_loss"
--advantage-estimator
str
default:"grpo"
Advantage estimator: "grpo", "gspo", "reinforce_plus_plus", "ppo"
--gamma
float
default:"1.0"
PPO GAE gamma (discount factor)
--lambd
float
default:"1.0"
PPO GAE lambda
--normalize-advantages
bool
default:"False"
Normalize advantages
Source: slime/utils/arguments.py:718-963
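A sketch combining the checkpoint and PPO-style flags (values and paths are illustrative; setting --eps-clip-high above --eps-clip gives an asymmetric clip range):

```shell
python train.py \
  --load checkpoints/actor \
  --ref-load checkpoints/ref \
  --save checkpoints/my_run \
  --save-interval 20 \
  --advantage-estimator grpo \
  --eps-clip 0.2 \
  --eps-clip-high 0.28 \
  --kl-coef 0.0
```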

Router Configuration

--use-slime-router
bool
default:"False"
Use SlimeRouter instead of SGLang router
--slime-router-middleware-paths
list[str]
default:"[]"
Middleware function paths
--slime-router-timeout
float
HTTP request timeout in seconds
--slime-router-max-connections
int
Maximum concurrent connections
--slime-router-health-check-failure-threshold
int
default:"3"
Consecutive failures before marking worker dead
Source: slime/utils/arguments.py:1009-1041
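For example, enabling SlimeRouter with custom connection limits (the middleware path is hypothetical):

```shell
python train.py \
  --use-slime-router \
  --slime-router-timeout 600 \
  --slime-router-max-connections 512 \
  --slime-router-middleware-paths my_pkg.middlewares.logging_middleware
```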

Logging Configuration

--use-wandb
bool
default:"False"
Enable Weights & Biases logging
--wandb-mode
str
W&B mode: "online", "offline", or "disabled"
--wandb-project
str
W&B project name
--wandb-team
str
W&B team/entity name
--wandb-group
str
W&B run group
--use-tensorboard
bool
default:"False"
Enable TensorBoard logging
--tb-project-name
str
TensorBoard log directory
Source: slime/utils/arguments.py:1043-1127

Example Configuration

python train.py \
  --actor-num-nodes 2 \
  --actor-num-gpus-per-node 8 \
  --rollout-num-gpus 8 \
  --rollout-batch-size 256 \
  --n-samples-per-prompt 4 \
  --global-batch-size 1024 \
  --hf-checkpoint Qwen/Qwen2.5-32B-Instruct \
  --prompt-data data/prompts.jsonl \
  --lr 1e-6 \
  --num-epoch 3 \
  --save checkpoints/my_run \
  --use-wandb \
  --wandb-project slime-training