This guide walks you through installing the qwen-asr package, loading a model, and running your first transcription, all in about 5 minutes. By the end you will have working code that accepts any audio URL or local file, automatically detects the spoken language, and returns the full transcript. The same API works for both the lightweight 0.6B and the flagship 1.7B checkpoints.
Qwen3-ASR requires a CUDA-capable GPU. The 1.7B model fits comfortably in 8 GB of VRAM with torch.bfloat16. The 0.6B model runs in around 3 GB of VRAM under the same dtype.
Steps
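Before working through the steps, it can help to check which checkpoint a given GPU can hold. The sketch below turns the VRAM guideline figures quoted above into a small helper. The function names and the Hugging Face repo IDs are assumptions made for illustration and are not part of the qwen-asr API.

```python
from typing import Optional

# Assumed Hugging Face repo IDs for the two checkpoints; verify before use.
CHECKPOINT_1_7B = "Qwen/Qwen3-ASR-1.7B"
CHECKPOINT_0_6B = "Qwen/Qwen3-ASR-0.6B"


def pick_checkpoint(vram_gb: float) -> Optional[str]:
    """Map available VRAM (in GB) to a checkpoint using the guideline figures."""
    if vram_gb >= 8.0:  # 1.7B fits comfortably in 8 GB with bfloat16
        return CHECKPOINT_1_7B
    if vram_gb >= 3.0:  # 0.6B runs in around 3 GB with bfloat16
        return CHECKPOINT_0_6B
    return None  # below both guideline figures


def detect_vram_gb(device_index: int = 0) -> float:
    """Return the total memory of a CUDA device in GB, or 0.0 without CUDA."""
    import torch  # imported lazily so pick_checkpoint works without PyTorch

    if not torch.cuda.is_available():
        return 0.0
    props = torch.cuda.get_device_properties(device_index)
    return props.total_memory / 1024**3
```

Calling pick_checkpoint(detect_vram_gb()) gives a starting point only. The thresholds are the guideline figures from the text rather than measured limits, and total device memory overstates what is free when other processes share the GPU.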
Install the package
Install qwen-asr from PyPI. The base installation pulls in the Hugging Face Transformers backend and all required runtime dependencies.
If you want the vLLM backend for faster batch inference and streaming support, install the optional extra instead.
Load a model
Choose either the Transformers backend (simple, single-GPU) or the vLLM backend (high-throughput, streaming). Both expose the same model.transcribe(...) interface.
Model weights are downloaded automatically from Hugging Face on first use. For offline environments, see Downloading model weights manually.
Transcribe audio
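Transcription starts from a loaded model object. As a rough sketch of the loading step described above, the helper below selects a backend by name. The Qwen3ASRModel class, its from_pretrained and LLM constructors, their keyword arguments, and the checkpoint ID are all assumptions here and should be checked against the installed package. load_model itself is a hypothetical name.

```python
def load_model(backend: str = "transformers",
               checkpoint: str = "Qwen/Qwen3-ASR-1.7B"):
    """Load a Qwen3-ASR model with the chosen backend (sketch, names assumed)."""
    if backend not in ("transformers", "vllm"):
        raise ValueError(f"unknown backend: {backend!r}")

    # Heavy imports are deferred so the argument check runs without a GPU stack.
    from qwen_asr import Qwen3ASRModel  # assumed import path

    if backend == "transformers":
        import torch

        # Assumed constructor, following the Hugging Face from_pretrained style.
        return Qwen3ASRModel.from_pretrained(
            checkpoint,
            dtype=torch.bfloat16,   # the dtype the VRAM figures refer to
            device_map="cuda:0",    # single-GPU placement
        )

    # Assumed vLLM-backed constructor; 0.7 is an illustrative memory fraction.
    return Qwen3ASRModel.LLM(model=checkpoint, gpu_memory_utilization=0.7)
```

Because both backends are described as exposing the same model.transcribe(...) interface, code written against the returned object should not need to know which branch produced it.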
Call model.transcribe() with a URL, local path, base64 string, or a (np.ndarray, sr) tuple. Pass a list to run batch inference.
Parsing raw vLLM server output
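The in-process model.transcribe() call described above can be wrapped in a small batch helper, sketched here. The audio keyword and the language and text attributes on each result are assumptions, and transcribe_all is a hypothetical name.

```python
from typing import Any, List, Sequence, Tuple


def transcribe_all(model: Any, audio_inputs: Sequence[Any]) -> List[Tuple[str, str]]:
    """Run one batched transcribe() call and return (language, text) pairs."""
    # Passing a list requests batch inference. Each item may be a URL, a local
    # path, a base64 string, or a (np.ndarray, sr) tuple.
    # Assumption: the keyword is `audio` and results expose .language and .text.
    results = model.transcribe(audio=list(audio_inputs))
    return [(r.language, r.text) for r in results]
```

With a loaded model, transcribe_all(model, [url, local_path]) would yield one (language, text) pair per result returned by the model.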
When you query a deployed vLLM server directly via HTTP (for example, through the OpenAI chat completions endpoint), the model returns a raw string. Use parse_asr_output to split it into (language, text).
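One way to apply this to an HTTP response, sketched under assumptions: the helper below pulls the raw string out of a chat completions response body, using the standard choices[0].message.content layout, and hands it to a parser. The import path qwen_asr.parse_asr_output is assumed from the function name above, and transcript_from_chat_response is a hypothetical name.

```python
from typing import Any, Callable, Mapping, Optional, Tuple


def transcript_from_chat_response(
    response: Mapping[str, Any],
    parser: Optional[Callable[[str], Tuple[str, str]]] = None,
) -> Tuple[str, str]:
    """Extract the raw model string from a chat completions body and parse it."""
    if parser is None:
        # Assumed import path for the helper named in the text.
        from qwen_asr import parse_asr_output
        parser = parse_asr_output

    # OpenAI-style chat completions layout: choices[0].message.content
    raw = response["choices"][0]["message"]["content"]
    language, text = parser(raw)
    return language, text
```

The parser argument exists so the extraction logic can be exercised with a stand-in function when the package or a server is not available.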
Next Steps
Transformers Backend
Deep-dive into batch inference, FlashAttention 2, and timestamp extraction with the Transformers backend.
vLLM Backend
Configure GPU memory utilisation, async serving, and the OpenAI-compatible REST API.
Forced Aligner
Add word- and character-level timestamps to any transcription.
Installation
Set up conda environments, install from source, and manage model weights.