Speech to Speech ships a single speech-to-speech CLI command that starts the full VAD→STT→LLM→TTS pipeline and, by default, exposes an OpenAI Realtime-compatible WebSocket server on port 8765. This guide goes from a fresh install to a working voice agent in three steps, then shows how to connect a client and explore alternative configurations.
On Linux, if your CUDA version is not 12.8, pre-install the matching qwentts-cpp-python wheel first; see the Installation guide for the exact commands.
The default pipeline routes LLM inference through the OpenAI Responses API. Export your key as the OPENAI_API_KEY environment variable before launching.
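As a small pre-flight sketch in Python (not part of the project itself), the check below verifies that OPENAI_API_KEY is set before handing off to the bare speech-to-speech command. The helper names has_api_key and launch_default are made up for illustration.

```python
import os
import subprocess


def has_api_key(env):
    """Return True when OPENAI_API_KEY is present and non-blank in env."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


def launch_default():
    """Start the default pipeline; blocks until the server process exits."""
    if not has_api_key(os.environ):
        raise RuntimeError("OPENAI_API_KEY is not set; export it first.")
    # The child process inherits the current environment, including the key.
    return subprocess.run(["speech-to-speech"], check=True)
```

Calling launch_default() from a script or REPL should behave like running the command in a shell where the key has already been exported.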
You can also pass it explicitly with --responses_api_api_key if you prefer not to set an environment variable.
What the Default Command Does
The bare speech-to-speech command is equivalent to the following fully-expanded invocation. Every flag shown here is a default; you can override any of them:
The server binds to port 8765 by default and exposes the endpoint at /v1/realtime. Connect any OpenAI Realtime-compatible client to ws://localhost:8765/v1/realtime. Override the port with --ws_port and the bind address with --ws_host.
Alternative Quickstarts
Connect with the OpenAI Realtime Client
Once the server is running in --mode realtime (the default), connect to it from Python using the official openai package. Because Speech to Speech implements the OpenAI Realtime protocol, no special client code is needed: point base_url at your local server.
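For a quick connectivity check that does not depend on the openai client, the sketch below opens the default endpoint with the third-party websockets package and prints the type of the first event it receives. This is a swapped-in, protocol-level approach rather than the openai-based client this page describes. It assumes the server sends a JSON event shortly after the handshake (as OpenAI's Realtime API does with session.created), and realtime_url, print_first_event, and main are illustrative names.

```python
import asyncio
import json


def realtime_url(host="localhost", port=8765):
    """Build the endpoint URL for the host and port the server is reachable on."""
    return f"ws://{host}:{port}/v1/realtime"


async def print_first_event(url, timeout=5.0):
    # Imported here so realtime_url() is usable without the package installed.
    import websockets  # third-party: pip install websockets

    async with websockets.connect(url) as ws:
        # Assumption: the server emits a JSON event soon after connecting.
        raw = await asyncio.wait_for(ws.recv(), timeout=timeout)
        print(json.loads(raw).get("type"))


def main():
    asyncio.run(print_first_event(realtime_url()))
```

Call main() while the server is running. If you changed --ws_port or --ws_host, pass the matching port and a reachable host to realtime_url.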
The api_key value passed to OpenAI() is not validated by the Speech to Speech server, so any non-empty string works. The actual LLM API key is configured server-side via OPENAI_API_KEY or --responses_api_api_key.
Mac Optimal Settings Shortcut
On Apple Silicon, a single flag sets Parakeet TDT for STT, MLX LM for language model inference, Qwen3-TTS via mlx-audio for TTS, and --device mps for all models. No API key is required:
Next Steps
- Explore all CLI flags with speech-to-speech -h or browse the arguments classes in the source.
- Swap in a self-hosted LLM server by passing --responses_api_base_url http://localhost:8000/v1 with vLLM or llama.cpp.
- Use --language auto with --enable_lang_prompt for automatic multilingual conversation (English, French, Spanish, Chinese, Japanese, Korean).
- Run a pool of parallel pipelines with --num_pipelines N (requires --mode realtime) to serve multiple concurrent WebSocket sessions.
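The flag combinations above can also be assembled programmatically. The following Python sketch builds an argument list from the flags named in this list. build_command, as_shell, and their parameters are hypothetical, and the sketch assumes --enable_lang_prompt is a value-less switch, which this page implies but does not state.

```python
import shlex


def build_command(llm_base_url=None, auto_language=False, num_pipelines=None):
    """Return the argv list for a speech-to-speech launch with optional extras."""
    cmd = ["speech-to-speech"]
    if llm_base_url is not None:
        # Point LLM inference at a self-hosted server (e.g. vLLM or llama.cpp).
        cmd += ["--responses_api_base_url", llm_base_url]
    if auto_language:
        # Assumption: --enable_lang_prompt takes no value.
        cmd += ["--language", "auto", "--enable_lang_prompt"]
    if num_pipelines is not None:
        # The pipeline pool requires realtime mode, so set it explicitly.
        cmd += ["--mode", "realtime", "--num_pipelines", str(num_pipelines)]
    return cmd


def as_shell(cmd):
    """Render an argv list as a copy-pasteable shell command line."""
    return shlex.join(cmd)
```

Pass the resulting list to subprocess.run(...) to launch the server, or print as_shell(...) to inspect the command before running it by hand.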