The finetuning/qwen3_asr_sft.py script drives the entire fine-tuning loop. It loads a Qwen3-ASR checkpoint, wraps it with the Hugging Face Trainer, and saves fully self-contained checkpoints that can be used for inference without any additional steps. This page walks through the complete training workflow, covers every command-line argument, and provides a ready-to-run shell script.
Training Workflow
Prepare your JSONL data
Create a training file (and optionally a validation file) in the JSONL format described on the Data Format page.
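As a minimal sketch of this step, the snippet below writes and re-reads a small JSONL file in which every line carries the audio and text fields that the script requires. The file name, the WAV paths, and the helper names write_jsonl and load_jsonl are hypothetical; the Data Format page remains the authority on the exact schema, including any conventions for the text field.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("audio", "text")  # the two fields this page says each line must contain


def write_jsonl(path, samples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the file back and fail fast on lines that lack a required field."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing {missing}")
            records.append(record)
    return records


# Hypothetical samples; replace the paths and transcripts with your own data.
samples = [
    {"audio": "/data/wavs/utt_0001.wav", "text": "hello world"},
    {"audio": "/data/wavs/utt_0002.wav", "text": "fine-tuning example"},
]
train_file = os.path.join(tempfile.mkdtemp(), "train.jsonl")
write_jsonl(train_file, samples)
records = load_jsonl(train_file)
print(len(records), "valid records")
```

Running a check like load_jsonl before training surfaces malformed lines early instead of partway through an epoch.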
Choose single-GPU or multi-GPU
For a single GPU, use python qwen3_asr_sft.py. For multiple GPUs, use torchrun --nproc_per_node=N. See the commands below.
Monitor training
Loss and other metrics are printed every --log_steps steps. Checkpoints are written to {output_dir}/checkpoint-{global_step} every --save_steps steps.
Resume if interrupted
Pass --resume 1 to automatically pick up from the latest checkpoint in output_dir, or --resume_from ./path/to/checkpoint to resume from a specific one.
Run inference on your checkpoint
Load any saved checkpoint directly with Qwen3ASRModel.from_pretrained. See "Quick Inference After Fine-Tuning" on the Overview page.
Launch Commands
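The two launch modes can be sketched as a small Python helper that builds the argument list for either python or torchrun. Only flags named on this page are used (--train_file, --output_dir, --log_steps, --save_steps); a real run may also need the model path and hyperparameter arguments from the reference below. The file paths, step values, and the build_launch_command helper are illustrative assumptions.

```python
import shlex

SCRIPT = "qwen3_asr_sft.py"


def build_launch_command(num_gpus, script_args):
    """Return the argv list for a single-GPU or multi-GPU launch."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        launcher = ["python", SCRIPT]
    else:
        launcher = ["torchrun", f"--nproc_per_node={num_gpus}", SCRIPT]
    return launcher + script_args


# Assumed example values; adjust paths and step counts to your setup.
script_args = [
    "--train_file", "./train.jsonl",
    "--output_dir", "./qwen3-asr-finetuning-out",
    "--log_steps", "10",
    "--save_steps", "200",
]

single_gpu_cmd = build_launch_command(1, script_args)
multi_gpu_cmd = build_launch_command(2, script_args)
print(shlex.join(single_gpu_cmd))
print(shlex.join(multi_gpu_cmd))
# To launch, pass either list to subprocess.run(cmd, check=True).
```

Keeping the script arguments in one list means the single-GPU and multi-GPU runs differ only in the launcher prefix.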
- Single GPU
- Multi-GPU (torchrun)
Run the script directly with python. This is the simplest setup and requires no distributed configuration.
Checkpoints are written to ./qwen3-asr-finetuning-out/checkpoint-<global_step>.
Resuming Training
If a training run is interrupted, you can resume from any saved checkpoint without losing progress.
- Explicit checkpoint path
- Auto-resume (latest checkpoint)
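The two resume modes amount to a simple precedence rule: an explicit --resume_from path wins, otherwise --resume 1 selects the latest checkpoint in the output directory. The sketch below illustrates that rule under the assumption that "latest" means the highest global step in a checkpoint-{global_step} directory name. Both helpers are hypothetical and are not functions from the script.

```python
import os
import re
import tempfile

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest global step, or None."""
    best_step, best_path = -1, None
    if not os.path.isdir(output_dir):
        return None
    for name in os.listdir(output_dir):
        match = CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


def resolve_resume_checkpoint(output_dir, resume_from="", resume=0):
    """Explicit path first, then auto-resume, else start from scratch (None)."""
    if resume_from:
        return resume_from
    if resume == 1:
        return find_latest_checkpoint(output_dir)
    return None


# Demo on a throwaway directory with three fake checkpoints.
output_dir = tempfile.mkdtemp()
for step in (200, 1000, 400):
    os.makedirs(os.path.join(output_dir, f"checkpoint-{step}"))

auto = resolve_resume_checkpoint(output_dir, resume=1)
explicit = resolve_resume_checkpoint(output_dir, resume_from="./path/to/checkpoint", resume=1)
print(auto)
print(explicit)
```

Comparing step numbers as integers matters here: a plain string sort would rank checkpoint-400 after checkpoint-1000.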
Point --resume_from at a specific checkpoint directory.
Training Arguments Reference
Paths
Path to a local model directory or a Hugging Face Hub repository ID. The script calls Qwen3ASRModel.from_pretrained with this value, so any valid Hub ID or local path works.
Path to the JSONL training file. Each line must contain audio and text fields. Required.
Optional path to a JSONL evaluation file in the same format as --train_file. When provided, validation loss is computed every --save_steps steps.
Directory where checkpoints are written. Each checkpoint is saved as {output_dir}/checkpoint-{global_step} and contains the model weights plus all files needed for inference.
Audio
Target sample rate in Hz for audio loading. All WAV files are resampled to this rate by librosa before being passed to the model’s processor. The Qwen3-ASR processor expects 16,000 Hz, so this value should not normally be changed.
Training Hyperparameters
Per-device training batch size. This is the number of samples processed on each GPU per forward-backward pass.
Gradient accumulation steps. Gradients are accumulated over this many mini-batches before an optimiser step, effectively multiplying the batch size without increasing memory usage.
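To make the interaction between these two settings concrete, the arithmetic below computes the effective batch size and the approximate number of optimiser steps per epoch. The numeric values (batch size 32, 4 accumulation steps, 2 GPUs, 10,000 samples) are assumed for illustration and are not defaults taken from the script. The step count is approximate because it ignores how a trailing partial batch is handled.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimiser step across all devices."""
    return per_device_batch * grad_accum_steps * num_gpus


def optimizer_steps_per_epoch(num_samples, per_device_batch, grad_accum_steps, num_gpus=1):
    """Approximate optimiser steps per epoch, rounding a partial step up."""
    per_step = effective_batch_size(per_device_batch, grad_accum_steps, num_gpus)
    return math.ceil(num_samples / per_step)


# Assumed example values.
eff = effective_batch_size(per_device_batch=32, grad_accum_steps=4, num_gpus=2)
steps = optimizer_steps_per_epoch(10_000, 32, 4, 2)
print(eff, steps)
```

Because logging and checkpointing intervals are counted in global steps, an estimate like this helps in picking sensible --log_steps and --save_steps values for a given dataset size.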
Peak learning rate for the AdamW optimiser. The scheduler type defaults to linear with a warm-up ratio of 0.02.
Number of training epochs. Fractional values are accepted (e.g., 0.5 for half an epoch).
Log training metrics every N global steps.
Learning rate scheduler type. Passed directly to TrainingArguments. Common values: "linear", "cosine", "constant".
Fraction of total training steps used for linear learning-rate warm-up.
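The default schedule described above (linear, warm-up ratio 0.02) can be illustrated with a few lines of arithmetic. This is a sketch of the usual linear warm-up followed by linear decay to zero, not code from the script. The peak learning rate of 2e-5 and the 1,000 total steps are assumed values, and the exact rounding of the warm-up step count may differ in the Trainer.

```python
import math


def linear_schedule_lr(step, total_steps, peak_lr, warmup_ratio=0.02):
    """Learning rate at `step` for linear warm-up then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


PEAK_LR = 2e-5      # assumed value, not a documented default
TOTAL_STEPS = 1000  # assumed value

for step in (0, 10, 20, 510, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS, PEAK_LR))
```

With these assumed numbers the warm-up lasts 20 steps, so the learning rate reaches its peak at step 20 and then falls linearly to zero at step 1,000.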
Checkpoint Settings
When to save checkpoints. "steps" saves every --save_steps global steps. Other values accepted by Hugging Face TrainingArguments are also valid.
Save a checkpoint (and run evaluation, if --eval_file is provided) every N global steps.
Maximum number of checkpoints to keep on disk. Older checkpoints are deleted when this limit is exceeded.
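The retention rule can be pictured as follows: after each save, only the newest checkpoints up to the limit remain. The helper below works on directory names alone and is a simplified illustration under the assumption that "older" means a lower global step. The Hugging Face Trainer's own rotation logic has additional cases, for example protecting the best checkpoint when that option is enabled. The limit of 2 and the step values are made up for the demo.

```python
import re

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def split_by_limit(checkpoint_names, save_total_limit):
    """Return (kept, deleted) names, keeping the newest `save_total_limit` by step."""
    if save_total_limit < 1:
        raise ValueError("save_total_limit must be at least 1")
    valid = [n for n in checkpoint_names if CHECKPOINT_RE.match(n)]
    ordered = sorted(valid, key=lambda n: int(CHECKPOINT_RE.match(n).group(1)))
    kept = ordered[-save_total_limit:]
    deleted = ordered[:-save_total_limit]
    return kept, deleted


names = ["checkpoint-200", "checkpoint-1000", "checkpoint-400", "checkpoint-800"]
kept, deleted = split_by_limit(names, save_total_limit=2)
print("kept:", kept)
print("deleted:", deleted)
```

A small limit saves disk space but also narrows the set of checkpoints available for resuming or for comparing against validation loss.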
Resuming
Explicit path to a checkpoint directory to resume from. Takes precedence over --resume.
Set to 1 to automatically resume from the latest checkpoint found inside --output_dir. Ignored if --resume_from is also set.
DataLoader Performance Options
These flags control the PyTorch DataLoader used during training. Tuning them can improve GPU utilisation, especially when audio loading is the bottleneck.
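The four options in this section (worker count, pinned memory, persistent workers, prefetch factor) correspond to keyword arguments of torch.utils.data.DataLoader. As a rough sketch, the helper below converts them into a kwargs dictionary and drops persistent_workers and prefetch_factor when there are no workers, because PyTorch rejects those arguments when num_workers is 0. The helper name and the example values are assumptions; how the script wires its flags internally may differ.

```python
def build_dataloader_kwargs(num_workers, pin_memory, persistent_workers, prefetch_factor):
    """Translate 0/1-style settings into DataLoader keyword arguments."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    kwargs = {
        "num_workers": num_workers,
        "pin_memory": bool(pin_memory),
    }
    if num_workers > 0:
        # Both options only apply when worker processes exist.
        kwargs["persistent_workers"] = bool(persistent_workers)
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


# Assumed example values.
multi_worker = build_dataloader_kwargs(
    num_workers=4, pin_memory=1, persistent_workers=1, prefetch_factor=2
)
main_process = build_dataloader_kwargs(
    num_workers=0, pin_memory=1, persistent_workers=1, prefetch_factor=2
)
print(multi_worker)
print(main_process)
# Usage (requires PyTorch): DataLoader(dataset, batch_size=..., **multi_worker)
```

Switching to the main-process configuration is a convenient way to get readable stack traces when a data-loading error occurs inside a worker.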
Number of worker processes for the DataLoader. Increase this if CPU-side audio loading is a bottleneck. Set to 0 to load data in the main process (useful for debugging).
Set to 1 to enable pinned (page-locked) memory for faster host-to-device transfers. Disable (0) if you experience memory pressure.
Set to 1 to keep worker processes alive between epochs, avoiding the overhead of relaunching them. Requires --num_workers > 0.
Number of batches each worker prefetches. Higher values reduce idle GPU time but increase memory usage. Has no effect when --num_workers is 0.
One-Click Shell Script
The following self-contained script mirrors the full multi-GPU example from the fine-tuning README and sets all recommended DataLoader flags. Save it as run_finetune.sh, make it executable, and run it directly.