The repository ships a docker-compose.yml and a matching Dockerfile that package the entire pipeline into a GPU-enabled container. The container image is built on top of PyTorch’s official CUDA image, exposes the server/client socket ports, and mounts a local directory as a model cache so that downloaded weights persist across restarts. This is the recommended path for deploying Speech to Speech on a headless Linux server with an NVIDIA GPU.
Prerequisites
NVIDIA Container Toolkit
The container uses NVIDIA GPU passthrough, so you must install the NVIDIA Container Toolkit on the host before running docker compose up:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
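Before starting the stack, a quick preflight check can catch a missing driver or toolkit early. The sketch below is not part of the repository: check_host_prereqs and REQUIRED_BINARIES are hypothetical names. It only looks for the docker, nvidia-smi, and nvidia-ctk executables on PATH (nvidia-ctk is the CLI shipped with recent NVIDIA Container Toolkit releases, so older installs may lack it). Finding them does not prove that Docker is configured to hand GPUs to containers.

```python
import shutil
from typing import Callable, Dict, Optional

# Executables whose presence suggests the prerequisites are installed:
# docker (Docker CLI), nvidia-smi (NVIDIA driver), and nvidia-ctk
# (CLI of recent NVIDIA Container Toolkit releases; older installs may lack it).
REQUIRED_BINARIES = ("docker", "nvidia-smi", "nvidia-ctk")


def check_host_prereqs(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Return {executable: found_on_PATH} for each required executable.

    Heuristic only: a binary on PATH does not prove Docker is configured
    to pass GPUs through to containers.
    """
    return {name: which(name) is not None for name in REQUIRED_BINARIES}
```

Running print(check_host_prereqs()) on the server shows which component, if any, still needs installing. The which parameter exists only so the check can be exercised without touching the real PATH.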
Starting the Container
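From the repository root, run docker compose up. On the first run, Compose builds the image from the Dockerfile and then starts the pipeline container with GPU device 0 reserved.
To rebuild the image explicitly (for example, after Dockerfile changes):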
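If you script deployments, these invocations can be wrapped in a small helper. This is a sketch assuming the Docker Compose V2 plugin (docker compose, not the legacy docker-compose binary); compose_argv and deploy are hypothetical names. up --build rebuilds the image before starting, -d detaches, and docker compose build on its own rebuilds without starting anything.

```python
import subprocess
from typing import List


def compose_argv(rebuild: bool = False, detach: bool = True) -> List[str]:
    """Assemble a Docker Compose V2 `up` command for the pipeline stack."""
    argv = ["docker", "compose", "up"]
    if rebuild:
        argv.append("--build")  # rebuild the image before starting the container
    if detach:
        argv.append("-d")  # run in the background instead of streaming logs
    return argv


def deploy(rebuild: bool = False, dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    argv = compose_argv(rebuild=rebuild)
    if not dry_run:
        # Run from the repository root, where docker-compose.yml lives.
        subprocess.run(argv, check=True)
    return " ".join(argv)
```

deploy(rebuild=True, dry_run=False) runs docker compose up --build -d; with the default dry_run=True it only returns the command string for inspection.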
The docker-compose.yml Configuration
Key Configuration Points
| Setting | Value | Description |
|---|---|---|
| ports | 12345:12345, 12346:12346 | TCP socket ports for receiving and sending audio (server/client mode) |
| volumes | ./cache/:/root/.cache/ | Persists Hugging Face model weights across container restarts |
| volumes | ./s2s_pipeline.py | Mounts the pipeline script so you can edit it without rebuilding the image |
| device_ids | ['0'] | Passes GPU 0 to the container; change to ['1'] to use the second GPU |
| --stt_compile_mode | reduce-overhead | Enables torch.compile for the Whisper-based STT model, reducing per-call overhead |
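The ports, cache volume, and GPU rows map onto a Compose service definition roughly as follows. This is a hypothetical reconstruction built only from the values in the table, expressed as a Python dict so it can be inspected and tested. The service name pipeline is an assumption; the s2s_pipeline.py mount and the command list are omitted because their full values are not shown here. The GPU reservation uses the standard Compose deploy.resources.reservations.devices syntax.

```python
import json


def pipeline_service(gpu_id: str = "0") -> dict:
    """Hypothetical Compose service mirroring the ports, cache volume, and GPU rows of the table."""
    return {
        # Host:container TCP ports for the audio socket pair (server/client mode).
        "ports": ["12345:12345", "12346:12346"],
        # Host ./cache/ backs the container's cache directory, /root/.cache/.
        "volumes": ["./cache/:/root/.cache/"],
        # Standard Compose GPU reservation; device_ids selects which GPU the container sees.
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {"driver": "nvidia", "device_ids": [gpu_id], "capabilities": ["gpu"]}
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the structure that would sit under `services:` in docker-compose.yml.
    print(json.dumps({"services": {"pipeline": pipeline_service()}}, indent=2))
```

pipeline_service("1") corresponds to the table's suggestion of switching the container to the second GPU.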
The Dockerfile
The base image, pytorch/pytorch:2.4.0-cuda12.1-cudnn9-devel, provides:
- PyTorch 2.4.0 pre-built for CUDA 12.1 / cuDNN 9
- The full CUDA developer toolkit (needed by some model extensions at runtime)
Dependencies are then installed with uv in two steps: first the locked dependencies from pyproject.toml, then the project package itself. Keeping the heavy dependency install in its own layer lets Docker reuse it from the build cache, so rebuilds after code changes are fast.
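Why the two-step install speeds up rebuilds can be modeled with content hashes. The sketch below is a deliberate simplification of Docker's build cache, not how BuildKit actually computes keys: each layer's key covers its parent key, its instruction, and the files it copies, so editing source files changes only the second layer's key. The lock file name uv.lock is an assumption about what the first step copies alongside pyproject.toml, and layer_key and build_keys are hypothetical helpers.

```python
import hashlib
from typing import Dict

BASE_IMAGE = "pytorch/pytorch:2.4.0-cuda12.1-cudnn9-devel"


def layer_key(parent_key: str, instruction: str, files: Dict[str, bytes]) -> str:
    """Toy cache key: changes whenever the parent, the instruction, or any copied file changes."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(b"\0" + instruction.encode())
    for path in sorted(files):
        # Hash path and content separately so different files cannot collide by concatenation.
        h.update(b"\0" + path.encode() + b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def build_keys(project: Dict[str, bytes]) -> Dict[str, str]:
    """Keys for the two install layers (a simplified model)."""
    # Layer 1: copy only the dependency manifests, then install locked dependencies.
    manifests = {p: project[p] for p in ("pyproject.toml", "uv.lock") if p in project}
    deps = layer_key(BASE_IMAGE, "install locked dependencies", manifests)
    # Layer 2: copy the full source tree, then install the project package itself.
    app = layer_key(deps, "install project package", project)
    return {"deps": deps, "app": app}
```

Editing s2s_pipeline.py leaves build_keys(...)["deps"] unchanged, which corresponds to Docker reusing the cached dependency layer; editing pyproject.toml changes both keys and forces a full reinstall.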
Connecting a Client
Once the container is running, connect from another machine using scripts/listen_and_play.py:
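On the client side, server/client mode comes down to a pair of TCP connections. A minimal sketch, assuming the port roles implied by the ports table (the pipeline receives client audio on 12345 and sends synthesized audio back on 12346) and raw 16-bit PCM chunks; connect and stream_chunks are hypothetical helpers. scripts/listen_and_play.py additionally handles microphone capture, playback, and whatever audio format the server actually expects.

```python
import socket
from typing import Iterable, Tuple

# Assumed port roles, following the order in the ports table.
PIPELINE_RECV_PORT = 12345  # pipeline receives client audio here
PIPELINE_SEND_PORT = 12346  # pipeline sends synthesized audio back here


def connect(
    host: str,
    recv_port: int = PIPELINE_RECV_PORT,
    send_port: int = PIPELINE_SEND_PORT,
    timeout: float = 5.0,
) -> Tuple[socket.socket, socket.socket]:
    """Open (audio_out, audio_in): one socket to push microphone audio, one to read replies."""
    audio_out = socket.create_connection((host, recv_port), timeout=timeout)
    try:
        audio_in = socket.create_connection((host, send_port), timeout=timeout)
    except OSError:
        audio_out.close()  # do not leak the first connection if the second fails
        raise
    return audio_out, audio_in


def stream_chunks(sock: socket.socket, chunks: Iterable[bytes]) -> int:
    """Send raw audio chunks (assumed 16-bit PCM) and return the number of bytes sent."""
    total = 0
    for chunk in chunks:
        sock.sendall(chunk)
        total += len(chunk)
    return total
```

Using two sockets keeps upstream and downstream audio independent, so the client can keep streaming microphone input while replies arrive.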
Customising the Model and Arguments
Edit the command list in docker-compose.yml to change the LLM, STT, or TTS:
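To avoid hard-coding values such as model names, you can also define them in a .env file next to docker-compose.yml and reference them with ${VAR} syntax in the compose file.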
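Compose resolves ${VAR} references from the shell environment and from a .env file in the project directory before starting the container. The sketch below models that substitution for a command list. It handles only KEY=VALUE lines, ${VAR}, and ${VAR:-default}, a subset of what Compose supports, and parse_dotenv, interpolate, and the STT_COMPILE_MODE variable are hypothetical names. The --stt_compile_mode flag and its reduce-overhead value come from the configuration table.

```python
import re
from typing import Dict, List, Mapping

# Matches ${NAME} and ${NAME:-default}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines; blank lines and # comments are ignored."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def interpolate(args: List[str], env: Mapping[str, str]) -> List[str]:
    """Substitute ${VAR} and ${VAR:-default} in each command argument."""

    def sub(m: "re.Match[str]") -> str:
        name, default = m.group(1), m.group(2)
        value = env.get(name)
        if value:  # set and non-empty
            return value
        # ${VAR:-default} falls back when VAR is unset or empty; plain ${VAR} becomes "".
        return default if default is not None else ""

    return [_VAR.sub(sub, arg) for arg in args]
```

To mirror Compose's precedence, where shell variables override .env values, pass {**parse_dotenv(text), **os.environ} as env.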
Model Cache Volume
The ./cache/ directory on the host is mounted to /root/.cache/ inside the container, which is where the Hugging Face Hub caches downloaded model weights. Create it before the first run:
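Creating the directory yourself matters because Docker typically creates a missing bind-mount source owned by root, which can make the cache awkward to manage from the host. The sketch below creates ./cache and computes where a downloaded model would appear on the host. It assumes huggingface_hub's default layout (huggingface/hub/models--{org}--{name} under the cache root) and no HF_HOME or HF_HUB_CACHE override inside the container; prepare_cache and hub_cache_dir are hypothetical helpers.

```python
from pathlib import Path


def prepare_cache(root: Path = Path(".")) -> Path:
    """Create <root>/cache on the host (mounted at /root/.cache/ in the container)."""
    cache = root / "cache"
    cache.mkdir(parents=True, exist_ok=True)  # idempotent: safe to call on every deploy
    return cache


def hub_cache_dir(cache: Path, repo_id: str) -> Path:
    """Host path for a model repo, assuming huggingface_hub's default cache layout."""
    return cache / "huggingface" / "hub" / ("models--" + repo_id.replace("/", "--"))
```

For example, hub_cache_dir(prepare_cache(), "openai/whisper-large-v3") points at ./cache/huggingface/hub/models--openai--whisper-large-v3; the model id is only an illustration.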
ARM64 Support
For NVIDIA Jetson devices and other ARM64 platforms (running L4T / JetPack), use Dockerfile.arm64 instead:
Dockerfile.arm64 is based on nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3, which provides PyTorch 2.0 pre-built for the L4T (Linux for Tegra) environment. The rest of the build steps are identical to the standard Dockerfile.
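When one build script targets both kinds of host, the Dockerfile can be chosen from the machine architecture. A sketch assuming, as this section does, that any aarch64/arm64 host is an L4T/JetPack device; dockerfile_for, build_argv, and the speech-to-speech image tag are hypothetical. The docker build -f option selects a Dockerfile other than the default.

```python
import platform
from typing import List, Optional


def dockerfile_for(machine: Optional[str] = None) -> str:
    """Pick the Dockerfile for the host architecture (defaults to platform.machine())."""
    arch = (machine or platform.machine()).lower()
    # Assumption: ARM64 hosts are Jetson/L4T devices that need the l4t-pytorch base image.
    return "Dockerfile.arm64" if arch in ("aarch64", "arm64") else "Dockerfile"


def build_argv(tag: str = "speech-to-speech", machine: Optional[str] = None) -> List[str]:
    """docker build argv that selects the Dockerfile with -f; the tag name is hypothetical."""
    return ["docker", "build", "-f", dockerfile_for(machine), "-t", tag, "."]
```

On a Jetson, build_argv() yields docker build -f Dockerfile.arm64 -t speech-to-speech . when run from the repository root.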