The Qwen team publishes a pre-built Docker image — qwenllm/qwen3-asr — that bundles the qwen-asr package, vLLM, FlashAttention 2, and all system dependencies. You only need to install the NVIDIA GPU driver and download your model weights; the container handles everything else.

Prerequisites

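Before pulling the image, make sure Docker and the NVIDIA GPU driver are installed on the host machine. You also need the NVIDIA Container Toolkit, which Docker requires to support the --gpus flag. Verify that Docker can access the GPU before proceeding: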
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
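If you script the setup, a small guard can fail early when a host command is missing. The sketch below is a hypothetical bash helper (require_cmds is not part of the Qwen tooling). It only checks that commands such as docker and nvidia-smi (installed with the NVIDIA driver) are on PATH, so the docker run check above is still what actually confirms GPU access.

```bash
# Hypothetical helper: report every missing command, then fail if any were missing.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    # command -v succeeds for executables on PATH (and for shell functions/builtins).
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, put require_cmds docker nvidia-smi || exit 1 at the top of a setup script, before the GPU verification command.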

Setup Flow

1. Set environment variables

Define your local workspace directory, the host port you want to expose, and the container port it maps to. Replace /path/to/your/workspace with the actual path on your host machine.
LOCAL_WORKDIR=/path/to/your/workspace
HOST_PORT=8000
CONTAINER_PORT=80
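A typo in these variables usually surfaces later as a confusing docker error, so you may want to check them first. The following is a hypothetical bash sketch (validate_env is not provided by the image or the qwen-asr package). It assumes the three variables above are set in the current shell, and it checks that LOCAL_WORKDIR is an existing directory and that both ports are integers from 1 to 65535.

```bash
# Hypothetical helper: sanity-check the step 1 variables before running docker.
validate_env() {
  local port
  # The bind mount in step 2 needs an existing host directory.
  if [ -z "${LOCAL_WORKDIR:-}" ] || [ ! -d "$LOCAL_WORKDIR" ]; then
    echo "LOCAL_WORKDIR is unset or not a directory: '${LOCAL_WORKDIR:-}'" >&2
    return 1
  fi
  # Both ports must be plain integers in the TCP range 1-65535.
  for port in "${HOST_PORT:-}" "${CONTAINER_PORT:-}"; do
    case "$port" in
      '' | *[!0-9]*)
        echo "Port is not a number: '$port'" >&2
        return 1
        ;;
    esac
    # The length check rejects huge values before the numeric comparison.
    if [ "${#port}" -gt 5 ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
      echo "Port out of range: $port" >&2
      return 1
    fi
  done
}
```

Run validate_env just before the docker run command. It prints the first problem it finds and returns a non-zero status.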

2. Pull the image and start a container

Run the following command to pull qwenllm/qwen3-asr:latest and launch an interactive container:
docker run --gpus all --name qwen3-asr \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p "$HOST_PORT:$CONTAINER_PORT" \
    --mount "type=bind,source=$LOCAL_WORKDIR,target=/data/shared/Qwen3-ASR" \
    --shm-size=4gb \
    -it qwenllm/qwen3-asr:latest
After the image is pulled and the container starts, you are dropped into a bash shell inside the container.

3. Run your code

Your local workspace is mounted at /data/shared/Qwen3-ASR inside the container. You can run any qwen-asr command or Python script from there. See Running the Gradio Demo below for an example.

Port and Volume Mapping

| Mapping | Description |
|---|---|
| -p $HOST_PORT:$CONTAINER_PORT | Maps the host port (8000 above) to the container port (80). Services inside the container must listen on port 80 to be reachable at http://<host-ip>:8000. |
| --mount type=bind,source=$LOCAL_WORKDIR,target=/data/shared/Qwen3-ASR | Bind-mounts your local workspace into the container, so scripts, downloaded models, and outputs are shared between host and container. |
| --shm-size=4gb | Raises the container's shared memory (/dev/shm) from Docker's 64 MB default to 4 GB. PyTorch and vLLM use shared memory for inter-process communication, and the default is often too small. |
Services inside the container must bind to 0.0.0.0, not 127.0.0.1. Docker delivers mapped traffic to the container's network interface rather than its loopback interface, so the host cannot reach a service that is bound only to 127.0.0.1.
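To confirm from inside the container that a service is listening on a wildcard address, you can filter the listening-socket table. The sketch below is a hypothetical bash helper (check_bind is not part of the image). It reads the output of ss -ltn, or netstat -ltn, on standard input. It assumes one of those tools is installed in the container, which this page does not guarantee.

```bash
# Hypothetical helper: read ss -ltn (or netstat -ltn) output on stdin and
# succeed only if something listens on PORT on a wildcard address.
check_bind() {
  local port="$1"
  awk -v port="$port" '
    {
      # The local address is the first field ending in :<digits>;
      # peer addresses end in :* and header fields do not match.
      la = ""
      for (i = 1; i <= NF; i++) if ($i ~ /:[0-9]+$/) { la = $i; break }
      if (la == "") next
      n = split(la, parts, ":")
      if (parts[n] != port) next
      addr = substr(la, 1, length(la) - length(parts[n]) - 1)
      # Wildcard forms: 0.0.0.0, * (older ss), [::] (ss IPv6), :: (netstat IPv6).
      if (addr == "0.0.0.0" || addr == "*" || addr == "[::]" || addr == "::") {
        ok = 1
      } else {
        print "listener on " addr ":" port " is not on a wildcard address" > "/dev/stderr"
      }
    }
    END { exit (ok ? 0 : 1) }
  '
}
```

For example, with the Gradio demo running, ss -ltn | check_bind 80 should succeed. If it fails, restart the service with --ip 0.0.0.0 or its equivalent.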

Re-Entering the Container

If you exit the shell, the container is stopped but not removed. Restart it and open a new shell with:
docker start qwen3-asr
docker exec -it qwen3-asr bash
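If you re-enter often, you can wrap the two commands above in a host-side function that also handles a missing container. This is a hypothetical bash sketch (enter_qwen3_asr is not part of any Qwen tooling). It relies only on the standard docker container inspect, docker start, and docker exec commands, and it assumes the container name qwen3-asr from step 2 unless you pass another name.

```bash
# Hypothetical host-side helper: reopen a shell in the container from step 2,
# starting it first if it is stopped.
enter_qwen3_asr() {
  local name="${1:-qwen3-asr}"
  # docker container inspect exits non-zero when no such container exists.
  if ! docker container inspect "$name" >/dev/null 2>&1; then
    echo "Container '$name' not found; create it with the docker run command from step 2." >&2
    return 1
  fi
  # docker start is harmless for a container that is already running.
  docker start "$name" >/dev/null
  docker exec -it "$name" bash
}
```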

Removing the Container

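To delete the container completely (files in your bind-mounted workspace on the host are not affected), run the command below. The -f flag stops the container first if it is still running.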
docker rm -f qwen3-asr

Running the Gradio Demo Inside Docker

Once inside the container, launch the Gradio demo bound to port 80 (which is mapped to your chosen host port):
qwen-asr-demo \
  --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
  --backend vllm \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 --port 80
Then open http://<host-ip>:8000 in your browser (substituting the HOST_PORT you chose).
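The demo page may not respond until the model has finished loading. A hypothetical host-side bash helper (wait_for_demo is not part of qwen-asr) can poll the URL with curl until it responds. It assumes curl is installed on the host and that the page returns a successful HTTP status once it is ready.

```bash
# Hypothetical host-side helper: poll URL until it answers with a successful
# HTTP status, giving up after roughly TIMEOUT seconds (default 300).
wait_for_demo() {
  local url="$1" timeout="${2:-300}" waited=0
  # -f: treat HTTP errors as failures; -s: silent; --max-time bounds each attempt.
  until curl -fs --max-time 5 -o /dev/null "$url"; do
    # Only the sleep intervals are counted, so the timeout is approximate.
    if [ "$waited" -ge "$timeout" ]; then
      echo "No response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "Demo is up at $url"
}
```

For example, run wait_for_demo "http://localhost:$HOST_PORT" 600 on the host, then open the page in your browser.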
If you are in Mainland China and image pulls from Docker Hub are slow, configure a registry mirror for the Docker daemon (the registry-mirrors setting in daemon.json) to speed up the download. See the Docker daemon documentation for details.
