The finetuning/qwen3_asr_sft.py script drives the entire fine-tuning loop. It loads a Qwen3-ASR checkpoint, wraps it with Hugging Face Trainer, and saves fully self-contained checkpoints that can be used for inference without any additional steps. This page walks through the complete training workflow, covers every command-line argument, and provides a ready-to-run shell script.

Training Workflow

1. Prepare your JSONL data

Create a training file (and optionally a validation file) in the JSONL format described on the Data Format page.

2. Choose single-GPU or multi-GPU

For a single GPU, run python qwen3_asr_sft.py. For multiple GPUs, launch the same script with torchrun --nproc_per_node=N, where N is the number of GPUs. See Launch Commands below.

3. Monitor training

Loss and other metrics are printed every --log_steps steps. Checkpoints are written to {output_dir}/checkpoint-{global_step} every --save_steps steps.

4. Resume if interrupted

Pass --resume 1 to automatically pick up from the latest checkpoint in --output_dir, or --resume_from ./path/to/checkpoint to resume from a specific one.

5. Run inference on your checkpoint

Load any saved checkpoint directly with Qwen3ASRModel.from_pretrained. See Quick Inference After Fine-Tuning on the Overview page.

Launch Commands

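Single GPU: run the script directly with python. This is the simplest setup and requires no distributed configuration. For multiple GPUs, replace python with torchrun --nproc_per_node=N; the One-Click Shell Script section below shows a complete two-GPU command.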
python qwen3_asr_sft.py \
  --model_path Qwen/Qwen3-ASR-1.7B \
  --train_file ./train.jsonl \
  --output_dir ./qwen3-asr-finetuning-out \
  --batch_size 32 \
  --grad_acc 4 \
  --lr 2e-5 \
  --epochs 1 \
  --save_steps 200 \
  --save_total_limit 5
Checkpoints are written to ./qwen3-asr-finetuning-out/checkpoint-<global_step>.
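Because the script wraps Hugging Face Trainer, it can help to see which TrainingArguments fields the flags above plausibly correspond to. The sketch below is illustrative only: the field names on the right are real TrainingArguments parameters, but the pairing is an assumption about how qwen3_asr_sft.py forwards its flags, and FLAG_TO_TRAINING_ARG and to_training_kwargs are hypothetical names.

```python
# Assumed mapping from the script's CLI flags to Hugging Face
# TrainingArguments field names. The field names exist in transformers;
# the pairing itself is a guess, not taken from qwen3_asr_sft.py.
FLAG_TO_TRAINING_ARG = {
    "output_dir": "output_dir",
    "batch_size": "per_device_train_batch_size",
    "grad_acc": "gradient_accumulation_steps",
    "lr": "learning_rate",
    "epochs": "num_train_epochs",
    "log_steps": "logging_steps",
    "lr_scheduler_type": "lr_scheduler_type",
    "warmup_ratio": "warmup_ratio",
    "save_strategy": "save_strategy",
    "save_steps": "save_steps",
    "save_total_limit": "save_total_limit",
}


def to_training_kwargs(cli_args):
    """Rename known CLI flags to TrainingArguments keyword names.

    Flags without an entry in the mapping (such as model_path) are dropped.
    """
    return {
        FLAG_TO_TRAINING_ARG[flag]: value
        for flag, value in cli_args.items()
        if flag in FLAG_TO_TRAINING_ARG
    }


# Values copied from the single-GPU command above.
single_gpu_args = {
    "model_path": "Qwen/Qwen3-ASR-1.7B",
    "train_file": "./train.jsonl",
    "output_dir": "./qwen3-asr-finetuning-out",
    "batch_size": 32,
    "grad_acc": 4,
    "lr": 2e-5,
    "epochs": 1,
    "save_steps": 200,
    "save_total_limit": 5,
}

training_kwargs = to_training_kwargs(single_gpu_args)
for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

Reading the flags this way is mainly useful when searching the Hugging Face documentation for the behaviour of a given option.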

Resuming Training

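If a training run is interrupted, you can resume from any saved checkpoint. Training continues from that checkpoint's global step, so only the steps run after it was saved need to be repeated.
To resume from a specific checkpoint, point --resume_from at its directory (to use the latest checkpoint in --output_dir instead, pass --resume 1):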
python qwen3_asr_sft.py \
  --train_file ./train.jsonl \
  --output_dir ./qwen3-asr-finetuning-out \
  --resume_from ./qwen3-asr-finetuning-out/checkpoint-200

Training Arguments Reference

Paths

--model_path
string
default:"Qwen/Qwen3-ASR-1.7B"
Path to a local model directory or a Hugging Face Hub repository ID. The script calls Qwen3ASRModel.from_pretrained with this value, so any valid Hub ID or local path works.
--train_file
string
default:"train.jsonl"
Path to the JSONL training file. Each line must contain audio and text fields. Required.
--eval_file
string
default:""
Optional path to a JSONL evaluation file in the same format as --train_file. When provided, validation loss is computed every --save_steps steps.
--output_dir
string
default:"./qwen3-asr-finetuning-out"
Directory where checkpoints are written. Each checkpoint is saved as {output_dir}/checkpoint-{global_step} and contains the model weights plus all files needed for inference.
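Since every line of --train_file and --eval_file must carry audio and text fields, a quick pre-flight check can catch malformed lines before a long run starts. The following is a minimal sketch, not part of the fine-tuning script: find_jsonl_problems is a hypothetical helper, and it checks only that both fields are present as strings, not any further rules from the Data Format page.

```python
import json
from pathlib import Path

# Fields that the argument reference says every line must contain.
REQUIRED_FIELDS = ("audio", "text")


def find_jsonl_problems(path):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are skipped here; this is a choice, not a rule
            try:
                record = json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((line_number, f"invalid JSON: {error.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "line is not a JSON object"))
                continue
            for field in REQUIRED_FIELDS:
                if not isinstance(record.get(field), str):
                    problems.append((line_number, f"missing or non-string '{field}'"))
    return problems
```

Calling find_jsonl_problems("./train.jsonl") returns an empty list when every line parses and carries both fields; otherwise each entry names the offending line.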

Audio

--sr
int
default:"16000"
Target sample rate in Hz for audio loading. All WAV files are resampled to this rate by librosa before being passed to the model’s processor. The Qwen3-ASR processor expects 16,000 Hz, so this value should not normally be changed.

Training Hyperparameters

--batch_size
int
default:"32"
Per-device training batch size. This is the number of samples processed on each GPU per forward-backward pass.
--grad_acc
int
default:"4"
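Gradient accumulation steps. Gradients are accumulated over this many forward-backward passes before each optimiser step, so the effective batch size per step is --batch_size × --grad_acc × the number of GPUs (32 × 4 = 128 on one GPU with the defaults), while per-pass memory usage stays the same.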
--lr
float
default:"2e-5"
Peak learning rate for the AdamW optimiser. The schedule around this peak is set by --lr_scheduler_type (default linear) and --warmup_ratio (default 0.02).
--epochs
float
default:"1"
Number of training epochs. Fractional values are accepted (e.g., 0.5 for half an epoch).
--log_steps
int
default:"10"
Log training metrics every N global steps.
--lr_scheduler_type
string
default:"linear"
Learning rate scheduler type. Passed directly to TrainingArguments. Common values: "linear", "cosine", "constant".
--warmup_ratio
float
default:"0.02"
Fraction of total training steps used for linear learning-rate warm-up.
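The hyperparameters above interact: batch size, gradient accumulation and GPU count fix the number of optimiser steps, which in turn fixes the warm-up length. The sketch below estimates these quantities and evaluates a linear warm-up and decay schedule. It is an approximation under stated assumptions: the 100,000-sample dataset and two GPUs are made-up inputs, training_schedule and linear_lr are hypothetical helpers, and Trainer's own step count can differ slightly because of rounding and partial batches.

```python
import math


def training_schedule(num_samples, batch_size=32, grad_acc=4, num_gpus=1,
                      epochs=1.0, warmup_ratio=0.02):
    """Estimate optimiser steps and warm-up steps for a run."""
    effective_batch_size = batch_size * grad_acc * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch_size)
    total_steps = math.ceil(steps_per_epoch * epochs)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


def linear_lr(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Learning rate at a step: linear ramp to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 100,000 samples on 2 GPUs with the default flags.
plan = training_schedule(num_samples=100_000, num_gpus=2)
print(plan)
for step in (0, plan["warmup_steps"], plan["total_steps"]):
    rate = linear_lr(step, plan["total_steps"], plan["warmup_steps"])
    print(f"step {step}: lr = {rate:.2e}")
```

With the defaults, a short run has very few warm-up steps, so --warmup_ratio may need raising for small datasets.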

Checkpoint Settings

--save_strategy
string
default:"steps"
When to save checkpoints. "steps" saves every --save_steps global steps. Other values accepted by Hugging Face TrainingArguments are also valid.
--save_steps
int
default:"200"
Save a checkpoint (and run evaluation, if --eval_file is provided) every N global steps.
--save_total_limit
int
default:"5"
Maximum number of checkpoints to keep on disk. Older checkpoints are deleted when this limit is exceeded.
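To see how --save_steps and --save_total_limit combine, the sketch below lists which checkpoint directories would remain on disk at the end of a run. It assumes the "steps" strategy and simple oldest-first deletion; retained_checkpoints is a hypothetical helper, and depending on the transformers version a final checkpoint may also be written at the last step.

```python
def retained_checkpoints(total_steps, save_steps=200, save_total_limit=5):
    """List checkpoint directory names expected to remain after training."""
    # Steps at which a periodic checkpoint is written.
    saved = list(range(save_steps, total_steps + 1, save_steps))
    # Keep only the newest ones; a missing or non-positive limit keeps all.
    if save_total_limit and save_total_limit > 0:
        saved = saved[-save_total_limit:]
    return [f"checkpoint-{step}" for step in saved]


# Hypothetical 1,500-step run with the default settings.
print(retained_checkpoints(total_steps=1500))
```

If an early checkpoint is needed later, for example to compare against the start of training, copy it elsewhere before the limit removes it.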

Resuming

--resume_from
string
default:""
Explicit path to a checkpoint directory to resume from. Takes precedence over --resume.
--resume
int
default:"0"
Set to 1 to automatically resume from the latest checkpoint found inside --output_dir. Ignored if --resume_from is also set.
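The precedence between the two flags, and the idea of "latest checkpoint", can be sketched as follows. This is an illustration rather than the script's actual code: find_latest_checkpoint and choose_resume_path are hypothetical helpers, and the script may instead rely on a utility such as transformers.trainer_utils.get_last_checkpoint. The step number is compared numerically so that checkpoint-1000 ranks above checkpoint-800.

```python
import re
from pathlib import Path

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for entry in root.iterdir():
        match = CHECKPOINT_PATTERN.match(entry.name)
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


def choose_resume_path(output_dir, resume=0, resume_from=""):
    """Apply the documented precedence: --resume_from wins over --resume."""
    if resume_from:
        return Path(resume_from)
    if resume == 1:
        return find_latest_checkpoint(output_dir)
    return None  # start a fresh run
```

A return value of None with resume=1 means no checkpoint directory was found, in which case starting from scratch is the only option.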

DataLoader Performance Options

These flags control the PyTorch DataLoader used during training. Tuning them can improve GPU utilisation, especially when audio loading is the bottleneck.
--num_workers
int
default:"4"
Number of worker processes for the DataLoader. Increase this if CPU-side audio loading is a bottleneck. Set to 0 to load data in the main process (useful for debugging).
--pin_memory
int
default:"1"
Set to 1 to enable pinned (page-locked) memory for faster host-to-device transfers. Disable (0) if you experience memory pressure.
--persistent_workers
int
default:"1"
Set to 1 to keep worker processes alive between epochs, avoiding the overhead of relaunching them. Requires --num_workers > 0.
--prefetch_factor
int
default:"2"
Number of batches each worker prefetches. Higher values reduce idle GPU time but increase memory usage. Has no effect when --num_workers is 0.
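The dependency of --persistent_workers and --prefetch_factor on --num_workers reflects how PyTorch's DataLoader behaves: it raises a ValueError if persistent workers or an explicit prefetch factor are requested with zero workers. The sketch below builds a consistent set of DataLoader keyword arguments from the four flags. The keyword names are real torch.utils.data.DataLoader parameters; build_dataloader_kwargs is a hypothetical helper, and how the script itself handles the zero-worker case is not stated in this page.

```python
def build_dataloader_kwargs(num_workers=4, pin_memory=1,
                            persistent_workers=1, prefetch_factor=2):
    """Translate the integer CLI flags into DataLoader keyword arguments."""
    kwargs = {
        "num_workers": num_workers,
        "pin_memory": bool(pin_memory),
    }
    # These two options only make sense with worker processes, so they are
    # left out entirely when data is loaded in the main process.
    if num_workers > 0:
        kwargs["persistent_workers"] = bool(persistent_workers)
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs


print(build_dataloader_kwargs())               # default flag values
print(build_dataloader_kwargs(num_workers=0))  # debugging setup
```

The resulting dictionary could be passed as DataLoader(dataset, **kwargs) in a standalone experiment to measure loading throughput before committing to a full run.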

One-Click Shell Script

The following self-contained script mirrors the full multi-GPU example from the fine-tuning README and sets all recommended DataLoader flags. Save it as run_finetune.sh, make it executable, and run it directly.
#!/usr/bin/env bash
set -e

export CUDA_VISIBLE_DEVICES=0,1

MODEL_PATH="Qwen/Qwen3-ASR-1.7B"
TRAIN_FILE="./train.jsonl"
EVAL_FILE="./eval.jsonl"
OUTPUT_DIR="./qwen3-asr-finetuning-out"

torchrun --nproc_per_node=2 qwen3_asr_sft.py \
  --model_path "${MODEL_PATH}" \
  --train_file "${TRAIN_FILE}" \
  --eval_file "${EVAL_FILE}" \
  --output_dir "${OUTPUT_DIR}" \
  --batch_size 32 \
  --grad_acc 4 \
  --lr 2e-5 \
  --epochs 1 \
  --log_steps 10 \
  --save_strategy steps \
  --save_steps 200 \
  --save_total_limit 5 \
  --num_workers 2 \
  --pin_memory 1 \
  --persistent_workers 1 \
  --prefetch_factor 2
Remove the --eval_file line if you do not have a validation set. The script will skip evaluation steps and only report training loss.
