Every pipeline in this repo is controlled by a single config.yaml file located next to the pipeline's train.py. When train.py starts, it loads that file and builds a typed config object; there are no CLI flags or environment overrides beyond the YAML itself, apart from --config, which points train.py at a different file. To change behaviour, edit the file, or copy it and pass the new path with --config. Fields you omit fall back to the loader's class-level defaults.
supervised_finetuning uses the key `split` to select the dataset split. Every other module (math_reasoning, multi_hop_question_answering, medical_question_answering, preference_alignment) uses `dataset_split` instead. If you use the wrong key it is silently ignored and the loader falls back to its default split.
Common fields
These fields appear in every config.yaml regardless of module.
| Field | Type | Description |
| --- | --- | --- |
| `model_id` | str | HuggingFace model identifier used to load the model and tokenizer. |
| `output_dir` | str | Path where the trained model and tokenizer are saved after training. |
| `learning_rate` | float | Optimizer learning rate. |
| `num_train_epochs` | int | Number of full passes over the training dataset. |
| `per_device_train_batch_size` | int | Batch size per GPU device. |
| `gradient_accumulation_steps` | int | Forward passes before each optimizer step. Effective batch size = `per_device_train_batch_size` × `gradient_accumulation_steps`. |
| `logging_steps` | int | Log training metrics every N steps. |
| `dataset_id` | str \| null | Optional HuggingFace dataset ID override. Omit to use the loader's built-in default. |
| `dataset_subset` | str \| null | Optional HuggingFace dataset config/subset override. Omit to use the loader's built-in default. |
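As a point of reference, a config that sets only these shared fields might look like the sketch below. The values are illustrative, not the repo's defaults; with `per_device_train_batch_size: 1` and `gradient_accumulation_steps: 8` the effective batch size is 1 × 8 = 8.

```yaml
# Illustrative values only; any field you omit falls back to the loader's class-level default.
model_id: "meta-llama/Llama-3.2-3B"
output_dir: "./outputs/example"
learning_rate: 2.0e-4
num_train_epochs: 3
per_device_train_batch_size: 1   # effective batch size = 1 x 8 = 8
gradient_accumulation_steps: 8
logging_steps: 10
# dataset_id / dataset_subset omitted -> the loader's built-in dataset is used
```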
Fields by module type
SFT fields
Used by all five adapter methods in supervised_finetuning/: LoRA, QLoRA, DoRA, P-Tuning, and Prefix-Tuning.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `split` | str | `"train"` | Dataset split to load. Note: this module uses `split`, not `dataset_split`. |
| `save_strategy` | str | `"epoch"` | When to save checkpoints: `"epoch"` or `"steps"`. |
LoRA / QLoRA / DoRA

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `lora_r` | int | 8 | LoRA rank. Higher values add more trainable parameters and increase expressiveness. |
| `lora_alpha` | int | 32 | LoRA scaling factor. Effective scale = `lora_alpha` / `lora_r`. |
| `lora_dropout` | float | 0.05 | Dropout applied to the LoRA weight matrices during training. |
| `use_dora` | bool | false | Enable DoRA (weight-decomposed LoRA). Only used in DoRA configs. |
| `target_modules` | list[str] | `[q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]` | Transformer modules to apply LoRA adapters to. |
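For the DoRA pipeline, the adapter block is the same as LoRA with weight decomposition switched on. A minimal sketch using the documented defaults (any other values you add are up to you):

```yaml
# DoRA reuses the LoRA adapter fields and adds use_dora
lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
use_dora: true    # enables weight-decomposed LoRA
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```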
P-Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_virtual_tokens` | int | 20 | Number of trainable soft prompt tokens prepended to the input sequence. |
| `encoder_hidden_size` | int | 128 | Hidden size of the MLP encoder that generates the soft prompt embeddings. |
Prefix-Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_virtual_tokens` | int | 20 | Number of prefix tokens prepended at each transformer layer. |
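A P-Tuning config swaps the LoRA block for the soft-prompt fields, and a Prefix-Tuning config keeps only `num_virtual_tokens`. A sketch using the documented defaults:

```yaml
# P-Tuning: soft prompt fields replace the LoRA block
split: "train"
save_strategy: "epoch"
num_virtual_tokens: 20
encoder_hidden_size: 128   # drop this line for Prefix-Tuning, which only uses num_virtual_tokens
```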
GRPO fields

Used by math_reasoning/grpo/, multi_hop_question_answering/grpo/, and medical_question_answering/grpo/.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_seq_length` | int | 2048 | Maximum sequence length passed to `FastLanguageModel.from_pretrained`. |
| `dataset_split` | str | `"train"` | Dataset split passed to `loader.load(...)`. |
| `num_generations` | int | 4 | Completions generated per prompt. GRPO compares these to compute relative rewards. |
| `max_prompt_length` | int | 512 | Maximum number of tokens in the prompt. Longer prompts are truncated. |
| `max_completion_length` | int | 512 | Maximum number of tokens in each generated completion. |
| `max_grad_norm` | float | 0.1 | Gradient clipping norm. Used by math_reasoning only. |
| `lora_r` | int | 8 | QLoRA rank. |
| `lora_alpha` | int | 8 | QLoRA scaling factor. |
| `target_modules` | list[str] | `[q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]` | Modules to apply QLoRA adapters to. |
Preference alignment

Used by all pipelines under preference_alignment/. Each sub-trainer has its own field set.

DPO

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_seq_length` | int | 4096 | Maximum sequence length. |
| `dataset_split` | str | `"train_prefs"` | Dataset split. Default varies by dataset: `train_prefs` for UltraFeedback, `train` for WebGPT. |
| `dpo_beta` | float | 0.1 | KL divergence penalty coefficient. Higher values keep the policy closer to the reference model. |
| `lora_r` | int | 64 | QLoRA rank (higher than GRPO to support larger preference datasets). |
| `lora_alpha` | int | 64 | QLoRA scaling factor. |
| `target_modules` | list[str] | same as GRPO | Modules to apply QLoRA adapters to. |
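Put together, a DPO config might look like the sketch below. The model, output path, and optimizer values are illustrative; the DPO-specific fields follow the documented defaults.

```yaml
# Illustrative DPO config for preference_alignment/ (dataset-specific values may differ)
model_id: "meta-llama/Llama-3.2-3B"
output_dir: "./outputs/preference_alignment/dpo"
learning_rate: 2.0e-4
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
logging_steps: 10
max_seq_length: 4096
dataset_split: "train_prefs"   # UltraFeedback; use "train" for WebGPT
dpo_beta: 0.1
lora_r: 64
lora_alpha: 64
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
```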
ORPO

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_seq_length` | int | 4096 | Maximum sequence length. |
| `dataset_split` | str | `"train_prefs"` | Dataset split passed to `loader.load(...)`. |
| `orpo_beta` | float | 0.1 | ORPO odds-ratio penalty coefficient. |
| `lora_r` | int | 16 | QLoRA rank. |
| `lora_alpha` | int | 16 | QLoRA scaling factor. |
| `target_modules` | list[str] | same as GRPO | Modules to apply QLoRA adapters to. |
KTO

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_seq_length` | int | 4096 | Maximum sequence length. |
| `dataset_split` | str | `"train"` | Dataset split passed to `loader.load(...)`. |
| `lora_r` | int | 16 | QLoRA rank. |
| `lora_alpha` | int | 16 | QLoRA scaling factor. |
| `target_modules` | list[str] | same as GRPO | Modules to apply QLoRA adapters to. |
PPO

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `dataset_split` | str | `"train"` | Dataset split. Default varies: `train_prefs` for UltraFeedback, `train` for WebGPT. |
| `batch_size` | int | 64 | Total rollout batch size. |
| `mini_batch_size` | int | 1 | Mini-batch size for the PPO optimization step. |
| `ppo_epochs` | int | 4 | Number of optimization epochs per rollout batch. |
| `max_new_tokens` | int | 128 | Maximum tokens to generate per step in the rollout loop. |
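The PPO field set above does not list the LoRA or `max_seq_length` fields; its rollout settings sit alongside the common fields instead. A sketch, with the model and optimizer values purely illustrative:

```yaml
# Illustrative PPO config; the four rollout fields use the documented defaults
model_id: "meta-llama/Llama-3.2-3B"
output_dir: "./outputs/preference_alignment/ppo"
learning_rate: 1.0e-5
logging_steps: 10
dataset_split: "train"   # "train_prefs" for UltraFeedback
batch_size: 64           # total rollout batch size
mini_batch_size: 1       # PPO optimization mini-batch
ppo_epochs: 4            # optimization epochs per rollout batch
max_new_tokens: 128      # tokens generated per rollout step
```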
Example configs
SFT LoRA (arc)
model_id : "meta-llama/Llama-3.2-3B"
dataset_name : "allenai/ai2_arc"
dataset_config : "ARC-Challenge"
split : "train"
output_dir : "./outputs/supervised_finetuning/lora/arc"
num_train_epochs : 3
per_device_train_batch_size : 1
gradient_accumulation_steps : 8
learning_rate : 2.0e-4
save_strategy : "epoch"
logging_steps : 10
lora_r : 8
lora_alpha : 32
lora_dropout : 0.05
target_modules :
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
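GRPO (gsm8k)

A GRPO config for gsm8k follows the same pattern. The sketch below combines the common fields with the GRPO defaults documented above; the model, output path, and optimizer values are illustrative.

```yaml
# Illustrative GRPO (gsm8k) config; GRPO-specific values follow the documented defaults
model_id: "meta-llama/Llama-3.2-3B"
dataset_id: "openai/gsm8k"
dataset_subset: "main"
dataset_split: "train"
output_dir: "./outputs/math_reasoning/grpo/gsm8k"
learning_rate: 2.0e-4
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
logging_steps: 10
max_seq_length: 2048
num_generations: 4
max_prompt_length: 512
max_completion_length: 512
max_grad_norm: 0.1
lora_r: 8
lora_alpha: 8
target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
```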
Common overrides
Change the base model
Update model_id in config.yaml to any HuggingFace model identifier:
model_id : "mistralai/Mistral-7B-Instruct-v0.3"
The tokenizer is loaded from the same identifier, so no other change is needed.
Override the dataset
All three dataset keys can be set independently. Any key you omit preserves the loader’s built-in default.
dataset_id : "allenai/ai2_arc"
dataset_subset : "ARC-Easy"
split : "train"
dataset_id : "openai/gsm8k"
dataset_subset : "main"
dataset_split : "train"
Run with a different config file
Pass the --config flag to any pipeline’s train.py:
```bash
python train.py --config config_mistral7b.yaml
```