The qwen-asr package installs three command-line entry points that cover the main deployment scenarios: an interactive Gradio web UI, a minimal Flask-based streaming demo, and a vLLM-powered inference server. This page documents every flag and provides ready-to-run examples for each command.
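All three commands are installed automatically when you run pip install qwen-asr. The vLLM backend (qwen-asr-demo --backend vllm and qwen-asr-serve) additionally requires pip install qwen-asr[vllm]. In shells such as zsh that treat square brackets as glob patterns, quote the argument: pip install "qwen-asr[vllm]".
- qwen-asr-demo: Gradio web UI
- qwen-asr-demo-streaming: minimal Flask-based streaming demo
- qwen-asr-serve: vLLM-powered inference server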
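If you are unsure whether the optional vLLM extra is present before passing --backend vllm, the Python sketch below checks for it. It assumes only that the extra provides the importable top-level package vllm; vllm_installed and choose_backend are hypothetical helpers, not part of qwen-asr.

```python
import importlib.util


def vllm_installed() -> bool:
    """True if the `vllm` package can be found in this environment.

    This only checks that the package is present; it does not verify
    version compatibility with qwen-asr or that a usable GPU exists.
    """
    return importlib.util.find_spec("vllm") is not None


def choose_backend(preferred: str = "vllm") -> str:
    """Hypothetical helper: fall back to transformers when vLLM is missing."""
    if preferred not in ("transformers", "vllm"):
        raise ValueError(f"unknown backend: {preferred!r}")
    if preferred == "vllm" and not vllm_installed():
        return "transformers"
    return preferred


if __name__ == "__main__":
    print("vllm installed:", vllm_installed())
    print("suggested --backend value:", choose_backend())
```

The transformers fallback only makes sense for qwen-asr-demo; qwen-asr-serve always needs the vLLM extra.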
qwen-asr-demo
Launches a Gradio web UI demo backed by either the transformers or the vllm inference backend. The UI lets users upload audio, choose a language, and, when a ForcedAligner checkpoint is provided, enable timestamp visualization.
Flag Reference
ASR checkpoint: Path to a local model directory or a HuggingFace repository ID for the Qwen3-ASR model. Example: Qwen/Qwen3-ASR-1.7B or ./Qwen3-ASR-1.7B
Aligner checkpoint (--aligner-checkpoint): Path to a local directory or HuggingFace repository ID for the Qwen3-ForcedAligner-0.6B model. Optional. When provided, the UI displays a timestamps panel and a visualization button. Example: Qwen/Qwen3-ForcedAligner-0.6B
Backend (--backend): Inference backend for the ASR model. Accepted values: transformers, vllm.
GPU selection: Sets CUDA_VISIBLE_DEVICES for the demo process. Pass a GPU index or a comma-separated list of indices (e.g., 0 or 1). Because vLLM does not follow the cuda:0 device selection style, this flag is the recommended way to control which GPU is used.
Backend kwargs: JSON dict of backend-specific keyword arguments passed to the model loader, excluding the checkpoint path. The values you pass are merged over the following defaults.
Transformers default: {"device_map": "cuda:0", "dtype": "bfloat16", "max_inference_batch_size": 4, "max_new_tokens": 512}
vLLM default: {"gpu_memory_utilization": 0.8, "max_inference_batch_size": 4, "max_new_tokens": 4096}
Aligner kwargs: JSON dict of keyword arguments for the ForcedAligner model loader. Only used when --aligner-checkpoint is set. Default: {"dtype": "bfloat16", "device_map": "cuda:0"}
Server IP: Bind IP address for the Gradio server.
Server port: Port for the Gradio server.
Concurrency limit: Gradio queue concurrency limit, that is, the maximum number of requests processed simultaneously.
Public share link: Whether to create a public Gradio sharing link. Disabled by default.
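qwen-asr-demo serves its UI with Gradio, whose Blocks.queue() and Blocks.launch() methods take parameters that correspond closely to the server flags above. The sketch below shows that correspondence under two assumptions: that qwen-asr-demo forwards these flags to Gradio more or less directly, and that Gradio 4.x is installed (where the queue limit is named default_concurrency_limit). ServerOptions and gradio_call_kwargs are hypothetical names, and the values in the usage block are examples, not documented defaults.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ServerOptions:
    """Hypothetical container for the server flags described above."""
    ip: str
    port: int
    concurrency: int
    share: bool = False  # the public share link is disabled by default


def gradio_call_kwargs(opts: ServerOptions) -> Tuple[Dict, Dict]:
    """Map the options onto keyword arguments for Gradio's Blocks.queue()
    and Blocks.launch() (assumes Gradio 4.x parameter names)."""
    if not 1 <= opts.port <= 65535:
        raise ValueError(f"port out of range: {opts.port}")
    if opts.concurrency < 1:
        raise ValueError("concurrency limit must be at least 1")
    queue_kwargs = {"default_concurrency_limit": opts.concurrency}
    launch_kwargs = {
        "server_name": opts.ip,  # 0.0.0.0 listens on all network interfaces
        "server_port": opts.port,
        "share": opts.share,
    }
    return queue_kwargs, launch_kwargs


if __name__ == "__main__":
    queue_kwargs, launch_kwargs = gradio_call_kwargs(
        ServerOptions(ip="0.0.0.0", port=8000, concurrency=4)
    )
    print(queue_kwargs)
    print(launch_kwargs)
    # With a gradio.Blocks app named `demo`, these would be applied as:
    #   demo.queue(**queue_kwargs).launch(**launch_kwargs)
```

Binding to 0.0.0.0 is what makes the UI reachable from other machines; 127.0.0.1 keeps it local, in which case port forwarding is the way in.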
SSL certificate (--ssl-certfile): Path to an SSL certificate file (PEM format) for serving over HTTPS. Browsers only grant microphone access to secure origins (HTTPS or localhost), so this is required to avoid microphone permission issues when the UI is accessed remotely.
SSL key file: Path to the SSL private key file (PEM format) matching --ssl-certfile.
SSL verification: Whether to verify the SSL certificate. Pass --no-ssl-verify when using a self-signed certificate.
Usage Examples
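A malformed or mismatched certificate and key pair typically only surfaces once the HTTPS server starts. As a minimal pre-flight check, the sketch below loads both PEM files with Python's standard ssl module before you launch qwen-asr-demo with --ssl-certfile and the matching key file. check_cert_pair is a hypothetical helper; it validates the files only and says nothing about whether browsers will trust the certificate (a self-signed one still needs --no-ssl-verify).

```python
import ssl
from pathlib import Path
from typing import Tuple


def check_cert_pair(certfile: str, keyfile: str) -> Tuple[bool, str]:
    """Hypothetical pre-flight check for a PEM certificate/private-key pair.

    Returns (ok, message). Not part of qwen-asr.
    """
    for label, path in (("certificate", certfile), ("private key", keyfile)):
        if not Path(path).is_file():
            return False, f"{label} file not found: {path}"
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises ssl.SSLError (a subclass of OSError) if either file is not
        # valid PEM or if the private key does not match the certificate.
        # An encrypted key makes OpenSSL prompt for its passphrase.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError as exc:
        return False, f"could not load certificate/key pair: {exc}"
    return True, "certificate and private key load and match"


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        ok, message = check_cert_pair(sys.argv[1], sys.argv[2])
        print("OK" if ok else "FAILED", "-", message)
    else:
        print("usage: python check_cert.py CERTFILE KEYFILE")
```

Save it under any name (check_cert.py above is arbitrary) and run it with the same two paths you intend to pass to the demo.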
Open http://<your-ip>:8000 (or https:// when using SSL) in your browser, or use port forwarding in VS Code to access it locally.
Timestamps are only shown in the UI when --aligner-checkpoint is provided. Without it, the timestamps panel is hidden automatically.
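The GPU-selection flag, the two JSON kwargs flags, and the aligner-dependent behavior described on this page fit together roughly as sketched below. This is not qwen-asr's loader: resolve_launch and merge_json_kwargs are hypothetical, and the shallow merge (a top-level key you supply replaces the default of the same name) is an assumption about what "merged over sensible defaults" means. The default dicts are copied from the Flag Reference.

```python
import json
import os
from typing import Dict, Optional, Tuple

# Defaults as listed in the Flag Reference.
BACKEND_DEFAULTS = {
    "transformers": {"device_map": "cuda:0", "dtype": "bfloat16",
                     "max_inference_batch_size": 4, "max_new_tokens": 512},
    "vllm": {"gpu_memory_utilization": 0.8,
             "max_inference_batch_size": 4, "max_new_tokens": 4096},
}
ALIGNER_DEFAULTS = {"dtype": "bfloat16", "device_map": "cuda:0"}


def merge_json_kwargs(defaults: Dict, override_json: Optional[str]) -> Dict:
    """Shallow merge: keys in the JSON object replace the matching defaults."""
    overrides = json.loads(override_json) if override_json else {}
    if not isinstance(overrides, dict):
        raise ValueError("kwargs must be a JSON object")
    return {**defaults, **overrides}  # new dict; defaults are not mutated


def resolve_launch(
    backend: str,
    backend_kwargs: Optional[str] = None,
    aligner_checkpoint: Optional[str] = None,
    aligner_kwargs: Optional[str] = None,
    cuda_visible_devices: Optional[str] = None,
) -> Tuple[Dict[str, str], Dict, Optional[Dict]]:
    """Hypothetical resolution of the demo's GPU, backend and aligner flags."""
    if backend not in BACKEND_DEFAULTS:
        raise ValueError(f"backend must be one of {sorted(BACKEND_DEFAULTS)}")
    env = dict(os.environ)
    if cuda_visible_devices is not None:
        # Must be set before CUDA initializes in the launched process. The
        # first listed GPU is then enumerated as cuda:0, which is why the
        # "cuda:0" defaults still point at the selected device.
        env["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices
    asr_kwargs = merge_json_kwargs(BACKEND_DEFAULTS[backend], backend_kwargs)
    # Without an aligner checkpoint, aligner kwargs are ignored (no timestamps).
    aligner = (merge_json_kwargs(ALIGNER_DEFAULTS, aligner_kwargs)
               if aligner_checkpoint else None)
    return env, asr_kwargs, aligner


if __name__ == "__main__":
    env, asr, aligner = resolve_launch(
        "transformers", '{"max_new_tokens": 1024}', cuda_visible_devices="1"
    )
    print(env["CUDA_VISIBLE_DEVICES"], asr, aligner)
```

In this sketch, passing '{"max_new_tokens": 1024}' with the transformers backend keeps device_map, dtype and max_inference_batch_size at their defaults and only raises the token limit.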