

Every pipeline in this repo is controlled by a single config.yaml file located next to the pipeline’s train.py. When train.py starts, it loads that file and builds a typed config object; there are no environment overrides, and the only CLI flag is --config, which simply points at a different YAML file. To change behaviour, edit the file (or copy it and pass the new path with --config). Fields you omit fall back to the loader’s class-level defaults.
supervised_finetuning uses the key split to select the dataset split. Every other module (math_reasoning, multi_hop_question_answering, medical_question_answering, preference_alignment) uses dataset_split instead. Using the wrong key silently falls back to the loader default.
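In code, the loading pattern looks roughly like the sketch below. This is a minimal illustration, not the repo’s actual loader: the class name, the default values, and the exact field set are assumptions.

```python
# Hypothetical sketch of the config-loading pattern described above,
# not the repo's actual loader code.
import argparse
from dataclasses import dataclass

import yaml


@dataclass
class TrainConfig:
    # Class-level defaults apply to any field omitted from config.yaml.
    model_id: str = "meta-llama/Llama-3.2-3B"
    output_dir: str = "./outputs"
    learning_rate: float = 2.0e-4
    num_train_epochs: int = 3
    per_device_train_batch_size: int = 1
    gradient_accumulation_steps: int = 8
    logging_steps: int = 10
    dataset_id: str | None = None
    dataset_subset: str | None = None


def load_config(path: str) -> TrainConfig:
    """Build the typed config from a YAML file; omitted keys keep the defaults."""
    with open(path) as f:
        overrides = yaml.safe_load(f) or {}
    return TrainConfig(**overrides)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="config.yaml")
    args = parser.parse_args()
    config = load_config(args.config)
```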

Common fields

These fields appear in every config.yaml regardless of module.
| Field | Type | Description |
| --- | --- | --- |
| `model_id` | str | HuggingFace model identifier used to load the model and tokenizer. |
| `output_dir` | str | Path where the trained model and tokenizer are saved after training. |
| `learning_rate` | float | Optimizer learning rate. |
| `num_train_epochs` | int | Number of full passes over the training dataset. |
| `per_device_train_batch_size` | int | Batch size per GPU device. |
| `gradient_accumulation_steps` | int | Forward passes before each optimizer step. Effective batch size = per_device_train_batch_size × gradient_accumulation_steps. |
| `logging_steps` | int | Log training metrics every N steps. |
| `dataset_id` | str \| null | Optional HuggingFace dataset ID override. Omit to use the loader’s built-in default. |
| `dataset_subset` | str \| null | Optional HuggingFace dataset config/subset override. Omit to use the loader’s built-in default. |
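If the pipelines wrap the Hugging Face transformers Trainer (an assumption this page does not confirm), the common fields map roughly onto TrainingArguments, as in the sketch below.

```python
# Rough mapping of the common config fields onto transformers' TrainingArguments.
# Illustrative only; the pipelines may wire these fields up differently.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./outputs/supervised_finetuning/lora/arc",
    learning_rate=2.0e-4,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size per GPU: 1 x 8 = 8
    logging_steps=10,
)
```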

Fields by module type

Supervised fine-tuning

Used by all five adapter methods in supervised_finetuning/: LoRA, QLoRA, DoRA, P-Tuning, and Prefix-Tuning.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `split` | str | `"train"` | Dataset split to load. Note: this module uses `split`, not `dataset_split`. |
| `save_strategy` | str | `"epoch"` | When to save checkpoints: `"epoch"` or `"steps"`. |

LoRA / QLoRA / DoRA

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `lora_r` | int | 8 | LoRA rank. Higher values add more trainable parameters and increase expressiveness. |
| `lora_alpha` | int | 32 | LoRA scaling factor. Effective scale = lora_alpha / lora_r. |
| `lora_dropout` | float | 0.05 | Dropout applied to the LoRA weight matrices during training. |
| `use_dora` | bool | false | Enable DoRA (Weight-Decomposed Low-Rank Adaptation). Only used in DoRA configs. |
| `target_modules` | list[str] | `[q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]` | Transformer modules to apply LoRA adapters to. |
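Assuming the adapters are built with Hugging Face peft (not stated on this page), these fields would feed a LoraConfig roughly as follows; the task_type value is an assumption as well.

```python
# Illustrative peft LoraConfig built from the fields above; assumes the
# pipelines use Hugging Face peft, which this page does not confirm.
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,  # effective scale = lora_alpha / lora_r = 4
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_dora=False,  # set True for the DoRA variant (needs a recent peft release)
    task_type="CAUSAL_LM",
)
```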

P-Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_virtual_tokens` | int | 20 | Number of trainable soft prompt tokens prepended to the input sequence. |
| `encoder_hidden_size` | int | 128 | Hidden size of the MLP encoder that generates the soft prompt embeddings. |
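Under the same peft assumption, the two P-Tuning fields would correspond to a PromptEncoderConfig along these lines.

```python
# Illustrative peft PromptEncoderConfig for P-Tuning; assumes peft is used.
from peft import PromptEncoderConfig

ptuning_config = PromptEncoderConfig(
    num_virtual_tokens=20,
    encoder_hidden_size=128,  # hidden size of the MLP prompt encoder
    task_type="CAUSAL_LM",
)
```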

Prefix-Tuning

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_virtual_tokens` | int | 20 | Number of prefix tokens prepended at each transformer layer. |
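Likewise, the single Prefix-Tuning field would map onto a peft PrefixTuningConfig, sketched below under the same assumption.

```python
# Illustrative peft PrefixTuningConfig; assumes peft is used.
from peft import PrefixTuningConfig

prefix_config = PrefixTuningConfig(
    num_virtual_tokens=20,  # prefix tokens injected at each transformer layer
    task_type="CAUSAL_LM",
)
```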

Example configs

model_id: "meta-llama/Llama-3.2-3B"
dataset_name: "allenai/ai2_arc"
dataset_config: "ARC-Challenge"
split: "train"
output_dir: "./outputs/supervised_finetuning/lora/arc"

num_train_epochs: 3
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-4
save_strategy: "epoch"
logging_steps: 10

lora_r: 8
lora_alpha: 32
lora_dropout: 0.05
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

Common overrides

Change the base model

Update model_id in config.yaml to any HuggingFace model identifier:
model_id: "mistralai/Mistral-7B-Instruct-v0.3"
The tokenizer is loaded from the same identifier, so no other change is needed.
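In practice this works because both objects are presumably created from the same identifier, as in this sketch (assuming the standard transformers loading calls):

```python
# Sketch: model and tokenizer are both loaded from the single model_id,
# so changing that one field switches both. Assumes transformers is used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```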

Override the dataset

All three dataset keys can be set independently. Any key you omit preserves the loader’s built-in default.
dataset_id: "allenai/ai2_arc"
dataset_subset: "ARC-Easy"
split: "train"

Run with a different config file

Pass the --config flag to any pipeline’s train.py:
```bash
python train.py --config config_mistral7b.yaml
```
