The qwen-asr package ships two ready-to-run web demos: a full-featured Gradio interface (qwen-asr-demo) and a minimal Flask streaming demo (qwen-asr-demo-streaming). Both let you transcribe audio from a browser without writing any application code.
Gradio Demo
The qwen-asr-demo command launches a Gradio web UI backed by either the transformers or the vLLM inference engine. It supports file uploads, optional timestamp visualization, and HTTPS.
Basic Usage
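Once the demo is running, open http://<your-ip>:8000 in a browser. If the server is remote, you can also use port forwarding in VS Code to reach it from your local machine.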
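The following is a minimal sketch that assembles a basic qwen-asr-demo command line from the flags in the reference table below. The checkpoint ID Qwen/Qwen3-ASR-1.7B is borrowed from the streaming demo's documented default and is an assumption here. The helper names gradio_demo_argv and launch are hypothetical. Passing the list to subprocess.run avoids shell quoting entirely.

```python
import shlex
import subprocess
from typing import List

def gradio_demo_argv(
    asr_checkpoint: str,
    backend: str = "transformers",
    cuda_visible_devices: str = "0",
    ip: str = "0.0.0.0",
    port: int = 8000,
) -> List[str]:
    """Build a basic qwen-asr-demo command line (one argv element per token)."""
    if backend not in ("transformers", "vllm"):
        raise ValueError(f"unsupported backend: {backend!r}")
    return [
        "qwen-asr-demo",
        "--asr-checkpoint", asr_checkpoint,
        "--backend", backend,
        "--cuda-visible-devices", cuda_visible_devices,
        "--ip", ip,
        "--port", str(port),
    ]

def launch(argv: List[str]) -> None:
    """Run the demo in the foreground; blocks until the server stops."""
    subprocess.run(argv, check=True)

# Assumed checkpoint ID (the streaming demo's documented default model).
argv = gradio_demo_argv("Qwen/Qwen3-ASR-1.7B")
print(shlex.join(argv))  # the equivalent one-line shell command
```

Calling launch(argv) starts the server on port 8000.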
Flag Reference
| Flag | Default | Description |
|---|---|---|
| --asr-checkpoint | (required) | Qwen3-ASR model checkpoint path or Hugging Face repo ID. |
| --aligner-checkpoint | None | Qwen3-ForcedAligner checkpoint; enables timestamps when provided. |
| --backend | transformers | Inference backend: transformers or vllm. |
| --cuda-visible-devices | 0 | GPU index to expose to the demo process. |
| --backend-kwargs | None | JSON dict of backend-specific init arguments. |
| --aligner-kwargs | None | JSON dict of forced aligner init arguments. |
| --ip | 0.0.0.0 | Server bind address. |
| --port | 8000 | Server port. |
| --ssl-certfile | None | Path to the SSL certificate file (enables HTTPS). |
| --ssl-keyfile | None | Path to the SSL private key file (enables HTTPS). |
| --no-ssl-verify | off | Disable SSL certificate verification (useful for self-signed certificates). |
| --share | off | Create a public Gradio share link. |
| --concurrency | 16 | Gradio queue concurrency limit. |
Choosing a Backend
All backend-specific initialization parameters are passed through --backend-kwargs as a single JSON string. If the flag is omitted, the demo uses the backend's default settings.
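The value must be one JSON string, so generating it with json.dumps is less error-prone than writing the quotes by hand. The sketch below appends --backend-kwargs to an argv list. The helper name with_json_flag is hypothetical. gpu_memory_utilization is a vLLM engine argument, but it is an assumption that qwen-asr forwards it unchanged, so check the key against the backend you use.

```python
import json
from typing import Any, Dict, List, Optional

def with_json_flag(argv: List[str], flag: str, kwargs: Optional[Dict[str, Any]]) -> List[str]:
    """Return a copy of argv with `flag <compact JSON>` appended.

    When kwargs is None the flag is left out, so the demo keeps its defaults.
    """
    if kwargs is None:
        return list(argv)
    return [*argv, flag, json.dumps(kwargs, separators=(",", ":"))]

base = ["qwen-asr-demo", "--asr-checkpoint", "Qwen/Qwen3-ASR-1.7B", "--backend", "vllm"]
# Hypothetical init arguments; confirm the accepted keys for your backend.
argv = with_json_flag(base, "--backend-kwargs", {"gpu_memory_utilization": 0.7})
```

When typing the same value in a POSIX shell, wrap it in single quotes (--backend-kwargs '{"gpu_memory_utilization":0.7}') so the shell keeps the inner double quotes.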
The transformers backend is the simplest to set up and is recommended for development or single-GPU use. Override its init arguments with --backend-kwargs.
Enabling Timestamps
Word- and character-level timestamps are available when --aligner-checkpoint is provided. In that case, the Gradio UI shows a timestamp visualization panel automatically. Without the flag, the panel is hidden.
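When you build the command programmatically, this rule becomes a small conditional: add the aligner flags only when a checkpoint is available. In this sketch, path/to/forced-aligner is a hypothetical placeholder path. The dtype entry passed as aligner kwargs is an illustrative key, not a documented one. The helper name timestamp_flags is also hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

def timestamp_flags(
    aligner_checkpoint: Optional[str],
    aligner_kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Extra qwen-asr-demo flags that enable the timestamp panel.

    Returns an empty list when no aligner checkpoint is given, in which case
    the UI hides timestamps.
    """
    if aligner_checkpoint is None:
        return []
    flags = ["--aligner-checkpoint", aligner_checkpoint]
    if aligner_kwargs:
        flags += ["--aligner-kwargs", json.dumps(aligner_kwargs)]
    return flags

# Hypothetical local path to a Qwen3-ForcedAligner checkpoint.
argv = [
    "qwen-asr-demo",
    "--asr-checkpoint", "Qwen/Qwen3-ASR-1.7B",
    *timestamp_flags("path/to/forced-aligner", {"dtype": "bfloat16"}),  # illustrative kwargs
]
```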
HTTPS Setup
Modern browsers block microphone access on non-HTTPS pages unless the origin is localhost. To record audio from a remote machine, serve the demo over HTTPS.
Microphone access requires a secure context (HTTPS or localhost). If you open the demo from a remote machine over plain HTTP, the browser silently denies microphone permission and recording does not work.
Generate a self-signed certificate
Create a private key and a self-signed certificate valid for 365 days.
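One common way to create these files is OpenSSL's req command. The Python sketch below only builds two argument lists. The first is the openssl command. The second holds the HTTPS flags from the reference table. The file names key.pem and cert.pem and the /CN=localhost subject are assumptions, as are the helper names. -nodes writes the key without a passphrase so the demo can read it unattended. OpenSSL 3 deprecates -nodes in favor of -noenc but still accepts it.

```python
import shlex
from typing import List

def self_signed_cert_argv(keyfile: str = "key.pem", certfile: str = "cert.pem", days: int = 365) -> List[str]:
    """openssl command that writes a new RSA key and a matching self-signed X.509 certificate."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",
        "-nodes",                  # leave the private key unencrypted (no passphrase prompt)
        "-keyout", keyfile,
        "-out", certfile,
        "-days", str(days),
        "-subj", "/CN=localhost",  # supply the subject so openssl does not prompt for it
    ]

def https_flags(keyfile: str = "key.pem", certfile: str = "cert.pem") -> List[str]:
    """qwen-asr-demo flags that serve the UI over HTTPS with a self-signed certificate."""
    return ["--ssl-certfile", certfile, "--ssl-keyfile", keyfile, "--no-ssl-verify"]

print(shlex.join(self_signed_cert_argv()))
```

To use it:

1. Run the first list with subprocess.run(..., check=True).
2. Append https_flags() to the demo's argv.
3. Open https://<your-ip>:8000.

Because the certificate is self-signed, the browser displays a warning that you must accept before the page loads.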
Streaming Demo
The qwen-asr-demo-streaming command launches a minimal Flask-based demo. It captures microphone audio in the browser, resamples it to 16,000 Hz, and continuously pushes PCM chunks to the model for real-time transcription.
Once it is running, open http://<your-ip>:8000 in a browser.
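The demo does this capture and resampling in the browser. The Python sketch below mirrors only the data path, so the arithmetic is easy to check:

- linear interpolation down to 16,000 Hz;
- conversion to 16-bit little-endian PCM;
- slicing into fixed-size chunks.

The sample format, the 100 ms chunk length, the interpolation method and the helper names are all assumptions. None of them are the demo's documented behavior.

```python
import struct
from typing import Iterator, List, Sequence

TARGET_RATE = 16_000  # Hz, the rate the streaming demo feeds to the model

def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample float audio by linear interpolation between neighbouring input samples."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    step = src_rate / dst_rate                    # input samples advanced per output sample
    out_len = len(samples) * dst_rate // src_rate
    out: List[float] = []
    for i in range(out_len):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] + (nxt - samples[j]) * frac)
    return out

def to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip to [-1.0, 1.0] and encode as little-endian signed 16-bit PCM."""
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)

def pcm_chunks(pcm: bytes, chunk_ms: int = 100, rate: int = TARGET_RATE) -> Iterator[bytes]:
    """Yield consecutive chunks of `chunk_ms` milliseconds (2 bytes per sample)."""
    chunk_bytes = rate * chunk_ms // 1000 * 2
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

# One second of a constant signal captured at a typical 48 kHz microphone rate.
captured = [0.25] * 48_000
pcm = to_pcm16(resample_linear(captured, 48_000))
chunks = list(pcm_chunks(pcm))
```

One second of audio at 16,000 Hz is 16,000 samples, or 32,000 bytes, which yields ten 3,200-byte chunks. When downsampling, linear interpolation without a low-pass filter aliases any content above 8 kHz, the new Nyquist limit. A production resampler would filter first.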
Streaming Demo Flags
| Flag | Default | Description |
|---|---|---|
| --asr-model-path | Qwen/Qwen3-ASR-1.7B | Model name or local path. |
| --gpu-memory-utilization | 0.8 | vLLM GPU memory fraction (0.0 to 1.0). |
| --host | 0.0.0.0 | Bind host for the Flask server. |
| --port | 8000 | Bind port for the Flask server. |
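Here is a sketch that assembles the streaming command with the documented defaults. The range check on gpu_memory_utilization belongs to this example only; the CLI is not known to perform it. The check also rejects 0, because a zero fraction leaves no GPU memory for the model. The helper name streaming_demo_argv is hypothetical.

```python
import shlex
from typing import List

def streaming_demo_argv(
    asr_model_path: str = "Qwen/Qwen3-ASR-1.7B",
    gpu_memory_utilization: float = 0.8,
    host: str = "0.0.0.0",
    port: int = 8000,
) -> List[str]:
    """Build a qwen-asr-demo-streaming command line; defaults match the flag table."""
    # Example-side validation of the documented 0.0-1.0 range (zero is rejected too).
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0.0, 1.0]")
    return [
        "qwen-asr-demo-streaming",
        "--asr-model-path", asr_model_path,
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        "--host", host,
        "--port", str(port),
    ]

# Leave more headroom when other processes share the GPU.
print(shlex.join(streaming_demo_argv(gpu_memory_utilization=0.6)))
```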
The streaming demo uses the vLLM backend exclusively and requires the vLLM extra: pip install -U "qwen-asr[vllm]". Streaming inference does not support batch processing or timestamp output.
CUDA Device Selection
Because vLLM does not honor cuda:N-style device strings, both demos select GPUs through the CUDA_VISIBLE_DEVICES environment variable. Use --cuda-visible-devices to choose which physical GPU the process sees.
This applies to both the transformers and vllm backends in qwen-asr-demo.
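Because the mechanism is an ordinary environment variable, you can also set it when launching a demo from Python. This is useful for qwen-asr-demo-streaming, whose flag table lists no GPU option. The sketch below uses hypothetical helper names. It relies on standard CUDA behavior: inside the child process, the selected physical GPUs are renumbered starting from cuda:0.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

def env_with_visible_gpus(gpus: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and restrict CUDA to the given physical GPU indices, e.g. "2" or "0,1"."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = gpus
    return env

def launch_on_gpus(argv: List[str], gpus: str) -> subprocess.Popen:
    """Start the demo as a child process; the parent's own environment is left untouched."""
    return subprocess.Popen(argv, env=env_with_visible_gpus(gpus))

# Physical GPU 2 will appear as cuda:0 inside the child process.
streaming_env = env_with_visible_gpus("2")
```

For example, launch_on_gpus(["qwen-asr-demo-streaming"], "2") runs the streaming demo on physical GPU 2 without changing the parent's CUDA_VISIBLE_DEVICES.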