The qwen-asr package installs three command-line entry points that cover the main deployment scenarios: an interactive Gradio web UI, a minimal Flask-based streaming demo, and a vLLM-powered inference server. This page documents every flag and provides ready-to-run examples for each command.
All three commands are installed automatically when you run pip install qwen-asr. The vLLM backend (qwen-asr-demo --backend vllm and qwen-asr-serve) additionally requires pip install "qwen-asr[vllm]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).
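
When scripting a launch, you may want to fall back to the transformers backend if the vLLM extra is missing. The sketch below uses a hypothetical helper, pick_backend, that is not part of qwen-asr, and it assumes that installing the [vllm] extra is what makes the vllm package importable.

```python
import importlib.util


def pick_backend() -> str:
    """Return "vllm" if the vllm package is importable, else "transformers".

    Assumption: installing qwen-asr[vllm] is what makes `vllm` importable.
    """
    if importlib.util.find_spec("vllm") is not None:
        return "vllm"
    return "transformers"


backend = pick_backend()
print(f"qwen-asr-demo --asr-checkpoint Qwen/Qwen3-ASR-1.7B --backend {backend}")
```
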

qwen-asr-demo

Launches a Gradio web UI demo backed by either the transformers or vllm inference backend. The UI lets users upload audio, choose a language, and optionally enable timestamp visualization when a ForcedAligner checkpoint is provided.
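
Because the timestamps panel depends only on whether an aligner checkpoint is passed, a launch script can add that flag conditionally. The sketch below is a hypothetical helper (demo_command is not part of qwen-asr) that builds an argv list; the printed lines can be pasted into a shell, and the list itself can be passed to subprocess.run without any shell quoting.

```python
import shlex
from typing import List, Optional


def demo_command(asr_checkpoint: str, aligner_checkpoint: Optional[str] = None) -> List[str]:
    """Build an argv list for qwen-asr-demo (hypothetical helper)."""
    argv = ["qwen-asr-demo", "--asr-checkpoint", asr_checkpoint]
    # Only an aligner checkpoint enables the timestamps panel, so add it conditionally.
    if aligner_checkpoint is not None:
        argv += ["--aligner-checkpoint", aligner_checkpoint]
    return argv


# Transcription only: the timestamps panel stays hidden.
print(shlex.join(demo_command("Qwen/Qwen3-ASR-1.7B")))
# Transcription plus timestamp visualization.
print(shlex.join(demo_command("Qwen/Qwen3-ASR-1.7B", "Qwen/Qwen3-ForcedAligner-0.6B")))
```
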

Flag Reference

--asr-checkpoint
string
required
Path to a local model directory or a HuggingFace repository ID for the Qwen3-ASR model. Example: Qwen/Qwen3-ASR-1.7B or ./Qwen3-ASR-1.7B
--aligner-checkpoint
string
Path to a local directory or HuggingFace repository ID for the Qwen3-ForcedAligner-0.6B model. Optional. When provided, the UI displays a timestamps panel and a visualization button. Example: Qwen/Qwen3-ForcedAligner-0.6B
--backend
string
default:"transformers"
Inference backend for the ASR model. Accepted values: transformers, vllm.
--cuda-visible-devices
string
default:"0"
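Sets CUDA_VISIBLE_DEVICES for the demo process. Accepts a single GPU index or a comma-separated list of indices (e.g., 0 or 0,1). Because vLLM does not select devices through cuda:N strings, this flag is the recommended way to control which GPU is used. Inside the process, the listed GPUs are renumbered starting from 0, so a device_map of cuda:0 refers to the first GPU in this list.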
--backend-kwargs
string
JSON dict of backend-specific keyword arguments passed to the model loader, excluding the checkpoint path. The values you pass are merged over these defaults:
Transformers default: {"device_map": "cuda:0", "dtype": "bfloat16", "max_inference_batch_size": 4, "max_new_tokens": 512}
vLLM default: {"gpu_memory_utilization": 0.8, "max_inference_batch_size": 4, "max_new_tokens": 4096}
--aligner-kwargs
string
JSON dict of keyword arguments for the ForcedAligner model loader. Only used when --aligner-checkpoint is set. Default: {"dtype": "bfloat16", "device_map": "cuda:0"}
--ip
string
default:"0.0.0.0"
Server bind IP address for the Gradio server.
--port
integer
default:"8000"
Server port for the Gradio server.
--concurrency
integer
default:"16"
Gradio queue concurrency limit — the maximum number of requests processed simultaneously.
--share / --no-share
boolean
default:"false"
Whether to create a public Gradio sharing link. Disabled by default.
--ssl-certfile
string
Path to an SSL certificate file (PEM format) for serving over HTTPS. Needed when the UI is accessed remotely, because browsers only grant microphone access to pages served over HTTPS or from localhost.
--ssl-keyfile
string
Path to the SSL private key file (PEM format) matching --ssl-certfile.
--ssl-verify / --no-ssl-verify
boolean
default:"true"
Whether to verify the SSL certificate. Pass --no-ssl-verify when using a self-signed certificate.
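
The reference says that --backend-kwargs and --aligner-kwargs values are merged over defaults but not how. The sketch below assumes a shallow, key-by-key merge (your keys win, unspecified keys keep their defaults) and uses the vLLM defaults listed for --backend-kwargs; merge_kwargs is a hypothetical helper. It also shows json.dumps plus shlex.quote producing a value that passes through a POSIX shell as one argument, which avoids the common mistake of writing single-quoted, non-JSON dicts.

```python
import json
import shlex

# vLLM defaults for --backend-kwargs, copied from the flag reference.
VLLM_DEFAULTS = {
    "gpu_memory_utilization": 0.8,
    "max_inference_batch_size": 4,
    "max_new_tokens": 4096,
}


def merge_kwargs(defaults: dict, overrides: dict) -> dict:
    """Shallow merge: keys in `overrides` replace the same keys in `defaults`.

    Assumption: this models how the demo combines --backend-kwargs with its
    defaults; nested values would be replaced wholesale, not merged recursively.
    """
    return {**defaults, **overrides}


# Lower GPU memory use; every other key keeps its default.
overrides = {"gpu_memory_utilization": 0.6}

# json.dumps guarantees valid JSON (double-quoted keys); shlex.quote makes it
# safe to paste into a POSIX shell as a single argument.
flag = "--backend-kwargs " + shlex.quote(json.dumps(overrides))
print(flag)
print(merge_kwargs(VLLM_DEFAULTS, overrides))
```
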

Usage Examples

qwen-asr-demo \
  --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
  --backend transformers \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 --port 8000
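
A self-signed certificate is usually enough to test HTTPS access from another machine. The sketch below assumes the openssl command-line tool is installed; it creates cert.pem and key.pem in the current directory (the file names, the 365-day lifetime, and CN=localhost are illustrative choices) and prints a matching qwen-asr-demo command that uses the SSL flags from the reference.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Tuple


def make_self_signed_cert(directory: Path, common_name: str = "localhost") -> Tuple[Path, Path]:
    """Create cert.pem and key.pem with the openssl CLI (must be on PATH)."""
    cert, key = directory / "cert.pem", directory / "key.pem"
    subprocess.run(
        [
            "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
            "-keyout", str(key), "-out", str(cert),
            "-days", "365", "-subj", f"/CN={common_name}",
        ],
        check=True,
        capture_output=True,
    )
    return cert, key


cert, key = make_self_signed_cert(Path("."))
argv = [
    "qwen-asr-demo", "--asr-checkpoint", "Qwen/Qwen3-ASR-1.7B",
    "--ssl-certfile", str(cert), "--ssl-keyfile", str(key),
    # A self-signed certificate cannot be verified, so disable verification.
    "--no-ssl-verify",
]
print(shlex.join(argv))
```

Browsers warn about self-signed certificates; accept the warning once to reach the UI over https://.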
After launching, open http://<your-ip>:8000 (or https:// when using SSL) in your browser, or use port forwarding in VS Code to access it locally.
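
Model loading can take a while, so a script that starts the demo in the background may want to wait until the port answers before opening the page. The sketch below polls a URL with the standard library; wait_for_server is a hypothetical helper, and it treats any HTTP response, including an error status, as "up".

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll `url` until it returns any HTTP response, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not listening yet (or still starting): wait and retry.
            time.sleep(interval_s)
    return False
```

For example, call wait_for_server("http://127.0.0.1:8000/") after starting qwen-asr-demo. With a self-signed certificate, an https:// URL fails certificate verification and the function keeps retrying until the timeout, so either poll a plain-HTTP instance or extend the helper to pass an ssl.SSLContext to urlopen.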
Timestamps are only shown in the UI when --aligner-checkpoint is provided. Without it, the timestamps panel is hidden automatically.
