Run Qwen3-ASR Inside a Docker Container with GPU Support

The Qwen team publishes a pre-built Docker image, qwenllm/qwen3-asr, that bundles the qwen-asr package, vLLM, FlashAttention 2, and all system dependencies. Apart from the prerequisites below, the host needs only the NVIDIA GPU driver and your model weights; the container provides everything else.

Prerequisites

Before pulling the image, make sure the following are installed on the host machine:

Docker — Install Docker Engine
NVIDIA Container Toolkit — required so Docker can access your GPUs. Follow the official installation guide.

Verify GPU access is available to Docker before proceeding:

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
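
Before running the GPU test above, it can help to confirm that the required host tools are on the PATH. The following is a minimal sketch and not part of the Qwen3-ASR tooling: require_cmds is a hypothetical helper name, and the suggestion to check for nvidia-ctk assumes your NVIDIA Container Toolkit version ships that CLI.

```bash
# Hypothetical helper: report every command that is missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call on the host (nvidia-ctk is an assumption, see above):
# require_cmds docker nvidia-smi nvidia-ctk
```

If the helper reports nothing missing but the docker run test still fails, the likely cause is the Docker daemon configuration rather than a missing package.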

Setup Flow

Set environment variables

Define your local workspace directory, the host port you want to expose, and the container port that services inside the container will listen on. Replace /path/to/your/workspace with the actual path on your host machine.

LOCAL_WORKDIR=/path/to/your/workspace
HOST_PORT=8000
CONTAINER_PORT=80
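
A typo in any of these values only surfaces later as a docker run error, so a quick check can save a round trip. This is a sketch under the assumption that the three variables above are set in the current shell; valid_port and check_settings are hypothetical helper names.

```bash
# True if $1 is a decimal integer in the TCP port range 1-65535.
valid_port() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or non-digit
    esac
    [ "${#1}" -le 5 ] && [ "$1" -le 65535 ]
}

# Validate the variables used by the docker run command.
check_settings() {
    if [ ! -d "$LOCAL_WORKDIR" ]; then
        echo "LOCAL_WORKDIR is not a directory: $LOCAL_WORKDIR" >&2
        return 1
    fi
    if ! valid_port "$HOST_PORT"; then
        echo "HOST_PORT is not a valid port: $HOST_PORT" >&2
        return 1
    fi
    if ! valid_port "$CONTAINER_PORT"; then
        echo "CONTAINER_PORT is not a valid port: $CONTAINER_PORT" >&2
        return 1
    fi
}

# Example call after setting the variables:
# check_settings && echo "settings look fine"
```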

Pull the image and start a container

Run the following command to pull qwenllm/qwen3-asr:latest and launch an interactive container:

docker run --gpus all --name qwen3-asr \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p "$HOST_PORT:$CONTAINER_PORT" \
    --mount "type=bind,source=$LOCAL_WORKDIR,target=/data/shared/Qwen3-ASR" \
    --shm-size=4gb \
    -it qwenllm/qwen3-asr:latest

After the image is pulled and the container starts, you are dropped into a bash shell inside the container.

Run your code

Your local workspace is mounted at /data/shared/Qwen3-ASR inside the container. You can run any qwen-asr command or Python script from there. See Running the Gradio Demo below for an example.

Port and Volume Mapping

Mapping	Description
`-p $HOST_PORT:$CONTAINER_PORT`	Maps host port `8000` → container port `80`. Services inside must bind to port `80` to be reachable at `http://<host-ip>:8000`.
`--mount type=bind,source=$LOCAL_WORKDIR,target=/data/shared/Qwen3-ASR`	Mounts your local directory inside the container so scripts, downloaded models, and outputs are shared between host and container.
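`--shm-size=4gb`	Raises the container's shared memory (/dev/shm) from Docker's 64 MB default to 4 GB. vLLM and PyTorch use shared memory to pass data between processes, and large batch workloads can exhaust the default.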

Services inside the container must bind to 0.0.0.0, not 127.0.0.1. If a service binds only to the loopback interface, the port mapping will not forward traffic from the host.
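
To confirm that a container was created with the mappings described in this section, you can query Docker from the host. The sketch below uses docker port and docker inspect with Go templates; show_mappings is a hypothetical helper name, and the default container name qwen3-asr is taken from the docker run command in Setup Flow.

```bash
# Hypothetical helper: print port, mount, and shared-memory settings.
# $1 = container name (defaults to qwen3-asr)
show_mappings() {
    local name="${1:-qwen3-asr}"
    echo "Ports:"
    docker port "$name" || return 1
    echo "Mounts:"
    docker inspect -f \
        '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' \
        "$name" || return 1
    echo "Shared memory (bytes):"
    docker inspect -f '{{.HostConfig.ShmSize}}' "$name"
}

# Example call on the host:
# show_mappings qwen3-asr
```

Published ports are listed only while the container is running, so an empty Ports section on a stopped container is expected.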

Re-Entering the Container

If you exit the shell, the container is stopped but not removed. Restart it and open a new shell with:

docker start qwen3-asr
docker exec -it qwen3-asr bash
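
The two commands above can be wrapped so that the container is started only when it is not already running. This is an illustrative sketch; enter_container is a hypothetical helper name, and it relies on docker inspect reporting the container state through the .State.Running field.

```bash
# Hypothetical helper: start the container if needed, then open a shell.
# $1 = container name (defaults to qwen3-asr)
enter_container() {
    local name="${1:-qwen3-asr}" running
    # docker inspect exits non-zero when the container does not exist.
    running="$(docker inspect -f '{{.State.Running}}' "$name" 2>/dev/null)" || {
        echo "container '$name' not found; create it with docker run first" >&2
        return 1
    }
    if [ "$running" != "true" ]; then
        docker start "$name" >/dev/null || return 1
    fi
    docker exec -it "$name" bash
}

# Example call on the host:
# enter_container qwen3-asr
```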

Removing the Container

To delete the container completely (files in the bind-mounted workspace directory on the host are unaffected):

docker rm -f qwen3-asr

Running the Gradio Demo Inside Docker

Once inside the container, launch the Gradio demo bound to port 80 (which is mapped to your chosen host port):

qwen-asr-demo \
  --asr-checkpoint Qwen/Qwen3-ASR-1.7B \
  --backend vllm \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 --port 80

Then open http://<host-ip>:8000 in your browser (substituting the HOST_PORT you chose).
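
The demo may take a while to load the model before it starts answering requests, so polling from the host is a simple way to tell when the page is ready. The sketch below assumes curl is installed on the host; wait_for_http is a hypothetical helper name, and the default retry count and delay are arbitrary choices.

```bash
# Hypothetical helper: poll a URL until it responds or attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 5)
wait_for_http() {
    local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f treats HTTP error statuses as failures; --max-time bounds each try.
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            echo "reachable: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not reachable after $attempts attempts: $url" >&2
    return 1
}

# Example call on the host, using the HOST_PORT chosen earlier:
# wait_for_http "http://localhost:${HOST_PORT}/"
```

If the helper never succeeds even though the demo has finished starting inside the container, check that the demo was launched with --ip 0.0.0.0 and the container port used in the -p mapping.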

If you are in Mainland China and experience slow pulls from Docker Hub, configure a registry mirror to accelerate the image download. Refer to your Docker daemon documentation for mirror configuration.
