Overview
Slime uses a comprehensive argument system built on top of Megatron-LM's arguments, with extensive RL-specific additions.

Parsing Arguments
parse_args()
Main entry point for parsing all arguments. Returns a Namespace object containing all parsed arguments.
Source: slime/utils/arguments.py:1, defined via get_slime_extra_args_provider()
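Slime plugs its RL-specific flags into Megatron's parser through an extra-args provider. The sketch below shows that general pattern with a plain `argparse` parser; only `--colocate`, `--offload-train`, and `--offload-rollout` are flags named in this reference, and the provider body is illustrative, not slime's actual registration code.

```python
import argparse

def get_slime_extra_args_provider():
    """Return a provider that adds RL flags to a Megatron-style parser.

    Illustrative sketch: the real provider in slime/utils/arguments.py
    registers many more argument groups than shown here.
    """
    def extra_args_provider(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        group = parser.add_argument_group("slime cluster")
        group.add_argument("--colocate", action="store_true",
                           help="Colocate inference engines and actor on the same GPUs")
        group.add_argument("--offload-train", action="store_true",
                           help="Offload the training model to CPU during rollout")
        group.add_argument("--offload-rollout", action="store_true",
                           help="Offload the rollout model to CPU during training")
        return parser
    return extra_args_provider

# Megatron-style usage: the provider decorates an existing parser.
parser = argparse.ArgumentParser()
get_slime_extra_args_provider()(parser)
args = parser.parse_args(["--colocate"])
```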
Argument Categories
Cluster Configuration
Arguments for distributed training setup.

- Number of nodes for actor training
- GPUs per node for actor training
- Number of nodes for critic training (if using a critic)
- GPUs per node for critic training
- Total GPUs for rollout inference (ignored if --colocate is set)
- GPUs per inference engine (tensor parallel size)
- GPUs per node for resource allocation
- Colocate inference engines and actor on the same GPUs (enables offloading)
- Enable both --offload-train and --offload-rollout
- Offload the training model to CPU during rollout
- Offload the rollout model to CPU during training
Source: slime/utils/arguments.py:37-108
Training Configuration
- QKV layout for Megatron: `thd` or `bshd`
- Enable strict on-policy training
- Extra environment variables for training (JSON string)
- Memory margin for training allocation (default: 1 GB)
- Enable the weights backuper (disable to save host memory)
- Megatron-to-HuggingFace conversion mode: `raw` or `bridge`
- Path to a custom model provider function
- Recompute the loss function to save memory
- Chunk size for log-prob computation
- Regex patterns of parameters to train (freezes all others)
- Regex patterns of parameters to freeze
Source: slime/utils/arguments.py:111-214
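To illustrate the train/freeze pattern arguments, here is a minimal sketch of how regex-based parameter selection might be applied; the function name and exact precedence rules are assumptions, not slime's actual matching logic.

```python
import re

def is_trainable(name, train_patterns=None, freeze_patterns=None):
    """Decide whether a named parameter is trained (illustrative sketch).

    - If train_patterns is given, only matching parameters train
      (all others are frozen).
    - Otherwise, parameters matching freeze_patterns are frozen.
    """
    if train_patterns:
        return any(re.search(p, name) for p in train_patterns)
    if freeze_patterns:
        return not any(re.search(p, name) for p in freeze_patterns)
    return True

params = ["embedding.weight", "decoder.layers.0.mlp.weight", "lm_head.weight"]
# Train only the decoder layers; embedding and lm_head stay frozen.
trainable = [p for p in params if is_trainable(p, train_patterns=[r"decoder\.layers"])]
```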
Rollout Configuration
- HuggingFace checkpoint path for the rollout model
- Model name for weight conversion (auto-detected if not set)
- Path to the rollout generation function
- Sampling temperature
- Top-p (nucleus) sampling
- Top-k sampling (-1 to disable)
- Maximum context length (must not exceed the model's max position embeddings)
- Maximum prompt length (longer prompts are filtered at init)
- Maximum response length (`max_tokens` in SGLang)
- Skip special tokens in the decoded response
- Stop strings for generation
- Stop token IDs for generation
- Shuffle prompts during rollout
- Random seed for rollout
Source: slime/utils/arguments.py:218-339
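The sampling arguments above map naturally onto an SGLang-style sampling-parameters dict. The sketch below shows one way to assemble it; the docs only confirm the `max_tokens` name, so the other key names and the helper itself are illustrative.

```python
def build_sampling_params(temperature, top_p, top_k, max_response_len,
                          stop=None, stop_token_ids=None,
                          skip_special_tokens=True):
    """Assemble an SGLang-style sampling_params dict (illustrative sketch;
    key names other than max_tokens are assumptions)."""
    params = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_response_len,  # "max_tokens in SGLang" per the docs
        "skip_special_tokens": skip_special_tokens,
    }
    if top_k != -1:           # -1 disables top-k sampling
        params["top_k"] = top_k
    if stop:
        params["stop"] = stop
    if stop_token_ids:
        params["stop_token_ids"] = stop_token_ids
    return params

params = build_sampling_params(temperature=0.8, top_p=0.95,
                               top_k=-1, max_response_len=1024)
```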
Sampling Configuration
- Sampling batch granularity (defaults to `rollout_batch_size`)
- Path to a dynamic sampling filter function
- Enable partial rollout with sample recycling
- Mask off-policy tokens in partial rollout
- Path to a custom generate function
- Path to a custom rollout logging function
- Path to a buffer filter function
Source: slime/utils/arguments.py:341-426
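As an example of what a dynamic sampling filter might do, the hypothetical filter below drops prompt groups whose responses all received the same reward, since a zero-variance group yields zero advantage under group-based estimators like GRPO. The `(prompt_id, reward)` interface is an assumption; slime's real filter signature may differ.

```python
from collections import defaultdict

def reward_variance_filter(samples):
    """Hypothetical dynamic-sampling filter: keep a prompt's group of
    responses only if rewards differ within the group.

    `samples` is assumed to be a list of (prompt_id, reward) pairs.
    """
    groups = defaultdict(list)
    for prompt_id, reward in samples:
        groups[prompt_id].append(reward)
    # keep groups with non-zero reward spread
    keep = {pid for pid, rewards in groups.items() if max(rewards) > min(rewards)}
    return [(pid, r) for pid, r in samples if pid in keep]

# "p1" got identical rewards for every response, so its group is dropped.
kept = reward_variance_filter([("p0", 1.0), ("p0", 0.0),
                               ("p1", 1.0), ("p1", 1.0)])
```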
Data Configuration
- Total number of rollout steps (calculated from `num_epoch` if not set)
- Number of training epochs
- Use the global dataset for rollout (disable for custom data management)
- Data source class path
- Path to the prompt dataset (JSONL format)
- Apply a chat template to prompts
- Kwargs for the chat template (JSON string)
- JSON key for the input/prompt
- JSON key for labels/ground truth
- JSON mapping of media types to data keys, e.g. `{"image": "image_file"}`
- JSON key for metadata
- JSON key for tools in function calling
Source: slime/utils/arguments.py:500-573
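Because the input, label, and metadata keys are all configurable, a prompt dataset is just a JSONL file whose field names match those arguments. The snippet below writes and reads a tiny example file; the key names (`prompt`, `label`, `metadata`) are only one possible choice, not defaults confirmed by this reference.

```python
import json
import os
import tempfile

# One JSON record per line; key names are configurable via the
# input/label/metadata key arguments, so these are just an example.
records = [
    {"prompt": "What is 2 + 2?", "label": "4", "metadata": {"source": "toy"}},
    {"prompt": "Name a prime number.", "label": "2", "metadata": {"source": "toy"}},
]

path = os.path.join(tempfile.mkdtemp(), "prompts.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Loading mirrors writing: parse each line independently.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
```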
Batch Size Configuration
- Number of prompts per rollout step
- Number of responses per prompt
- Global batch size for training (across all data-parallel ranks)
- Number of training steps per rollout (an alternative to setting the global batch size)
- Micro batch size per GPU (ignored when using dynamic batch sizes)
- Balance tokens across DP ranks using the Karmarkar-Karp algorithm
- Dynamically adjust the micro batch size based on sequence lengths
- Maximum tokens per GPU for dynamic batching
Source: slime/utils/arguments.py:585-654
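Token balancing partitions per-sequence token counts into one bucket per data-parallel rank. The sketch below implements the Karmarkar-Karp largest-differencing heuristic for k-way partitioning; it shows the idea only and is not slime's actual implementation.

```python
import heapq

def karmarkar_karp(token_counts, num_ranks):
    """Partition sequence token counts into num_ranks roughly balanced
    buckets via the Karmarkar-Karp largest-differencing method (sketch).
    """
    heap, uid = [], 0
    for n in token_counts:
        # each heap entry holds num_ranks (sum, items) buckets, sorted by sum
        buckets = [(0, [])] * (num_ranks - 1) + [(n, [n])]
        heapq.heappush(heap, (-n, uid, buckets))
        uid += 1
    while len(heap) > 1:
        # repeatedly merge the two entries with the largest spread,
        # pairing one entry's smallest bucket with the other's largest
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        merged = sorted(
            (sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))
        )
        spread = merged[-1][0] - merged[0][0]
        heapq.heappush(heap, (-spread, uid, merged))
        uid += 1
    return [items for _, items in heap[0][2]]

buckets = karmarkar_karp([8, 7, 6, 5, 4], num_ranks=2)
```

Karmarkar-Karp is a heuristic: it produces well-balanced (not always optimal) buckets in O(n log n), which is what matters when balancing thousands of variable-length sequences per rollout.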
Evaluation Configuration
- Path to the eval function (defaults to `rollout_function_path`)
- Evaluation interval in rollout steps (None to disable)
- Eval dataset paths: `dataset_name /path/to/data.jsonl ...`
- Path to a YAML/JSON eval config (overrides --eval-prompt-data)
- Skip the initial evaluation before training
- Override the input key for eval
- Override the label key for eval
- Responses per eval prompt
- Override the temperature for eval
- Override the max response length for eval
Source: slime/utils/arguments.py:657-714
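Several eval arguments are overrides that fall back to the rollout settings when unset. A minimal sketch of that fallback semantics, with placeholder key names rather than slime's actual attribute names:

```python
def resolve_eval_args(rollout_args, eval_overrides):
    """Illustrative fallback logic: an eval-specific setting replaces
    the rollout default only when it is explicitly set (non-None)."""
    resolved = dict(rollout_args)
    for key, value in eval_overrides.items():
        if value is not None:
            resolved[key] = value
    return resolved

rollout_args = {"temperature": 0.8, "max_response_len": 8192, "input_key": "prompt"}
# Greedy decoding for eval; unset overrides inherit the rollout values.
eval_args = resolve_eval_args(rollout_args,
                              {"temperature": 0.0, "max_response_len": None})
```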
Algorithm Configuration
- Reference model checkpoint path
- Actor checkpoint to load
- Checkpoint save directory
- Checkpoint save interval
- Save HuggingFace-format checkpoints (Megatron backend)
- Learning rate
- Gradient clipping value
- Random seed
- Number of critic-only training steps before actor training
- Critic checkpoint path
- Critic save directory
- Critic learning rate (defaults to --lr)
- PPO clip range
- PPO upper clip range
- Dual-clip PPO lower bound
- Value function clip range
- KL penalty coefficient for reward shaping
- Loss type: `policy_loss`, `sft_loss`, or `custom_loss`
- Advantage estimator: `grpo`, `gspo`, `reinforce_plus_plus`, or `ppo`
- PPO GAE gamma (discount factor)
- PPO GAE lambda
- Normalize advantages
Source: slime/utils/arguments.py:718-963
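The clip-range, upper-clip, and dual-clip arguments correspond to the standard PPO surrogate formulas. The per-token sketch below shows how they interact; it illustrates the textbook equations, not slime's exact loss code.

```python
def ppo_loss(ratio, advantage, eps_low=0.2, eps_high=0.2, dual_clip=3.0):
    """Per-token PPO policy loss with asymmetric clipping and dual-clip
    (illustrative sketch of the standard formulas).

    ratio = pi_new(a|s) / pi_old(a|s)
    """
    # asymmetric clip: lower bound 1 - eps_low, upper bound 1 + eps_high
    clipped = max(min(ratio, 1.0 + eps_high), 1.0 - eps_low)
    surrogate = min(ratio * advantage, clipped * advantage)
    if advantage < 0:
        # dual-clip lower bound keeps very large ratios from exploding
        # the loss on negative-advantage tokens
        surrogate = max(surrogate, dual_clip * advantage)
    return -surrogate  # minimize the negative surrogate

# positive advantage: ratio is clipped at 1.2, so the loss is -1.2 * 1.0
loss_pos = ppo_loss(2.0, 1.0)
# negative advantage with a huge ratio: dual-clip bounds the loss at 3.0
loss_neg = ppo_loss(5.0, -1.0)
```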
Router Configuration
- Use SlimeRouter instead of the SGLang router
- Middleware function paths
- HTTP request timeout in seconds
- Maximum concurrent connections
- Number of consecutive failures before marking a worker dead
Source: slime/utils/arguments.py:1009-1041
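The consecutive-failure threshold implies bookkeeping along the lines of the sketch below: one success resets a worker's streak, and only an unbroken run of failures marks it dead. This is an illustration, not SlimeRouter's actual implementation.

```python
class WorkerHealth:
    """Sketch of consecutive-failure tracking for router workers."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.failures = {}  # worker url -> consecutive failure count

    def record(self, worker, ok):
        # a single success resets the streak; only *consecutive*
        # failures count toward marking a worker dead
        self.failures[worker] = 0 if ok else self.failures.get(worker, 0) + 1

    def is_dead(self, worker):
        return self.failures.get(worker, 0) >= self.max_consecutive_failures

health = WorkerHealth(max_consecutive_failures=2)
health.record("w0", ok=False)
health.record("w0", ok=True)    # success resets the streak
health.record("w0", ok=False)
health.record("w0", ok=False)   # two consecutive failures -> dead
```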
Logging Configuration
- Enable Weights & Biases logging
- W&B mode: `online`, `offline`, or `disabled`
- W&B project name
- W&B team/entity name
- W&B run group
- Enable TensorBoard logging
- TensorBoard log directory
Source: slime/utils/arguments.py:1043-1127
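The W&B arguments map directly onto the keyword arguments of `wandb.init()`. The helper below collects them into a kwargs dict; the helper itself and its parameter names are illustrative, not slime's actual code.

```python
def build_wandb_init_kwargs(use_wandb, mode, project, entity=None, group=None):
    """Collect Weights & Biases settings into kwargs for wandb.init()
    (illustrative sketch)."""
    if not use_wandb:
        return None
    kwargs = {"mode": mode, "project": project}
    if entity:
        kwargs["entity"] = entity  # W&B team/entity name
    if group:
        kwargs["group"] = group    # groups related runs in the W&B UI
    return kwargs

kwargs = build_wandb_init_kwargs(True, mode="offline",
                                 project="slime-rl", group="grpo-runs")
# then: wandb.init(**kwargs)
```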
Example Configuration
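A launch sketch using only the flags named in this reference. The script name and the elided arguments are placeholders, not slime's real CLI; consult the repository's example scripts for complete commands.

```shell
# Illustrative only: --colocate, --offload-train, and --offload-rollout
# appear in the argument reference above; everything else is a placeholder.
python train.py \
    --colocate \
    --offload-train \
    --offload-rollout \
    ...   # model, data, and batch-size arguments go here
```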
- Training API - Using parsed arguments
- Rollout API - Rollout configuration
- Logging API - Tracking configuration