Speech to Speech is published on PyPI as speech-to-speech and requires Python 3.10 or later. The default install covers the standard realtime voice-agent path: Parakeet TDT for STT, the OpenAI Responses API for the LLM, and Qwen3-TTS for speech output. Optional extras install additional backends without affecting the default configuration.
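Before installing, it can help to confirm that the interpreter meets the Python 3.10 minimum. The following is a minimal sketch; the helper name `python_is_supported` is hypothetical and not part of the package.

```python
import sys

# Minimum interpreter version stated in the install notes.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) is at least the required minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.10 or later is required, found:", found)
```

Running the check inside the virtual environment you plan to install into avoids surprises from a different system interpreter.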
Default (pip)
The default install, pip install speech-to-speech, bundles all dependencies for the recommended Parakeet TDT + Responses API + Qwen3-TTS pipeline on your platform.
On macOS, the MLX stack (mlx, mlx-audio, mlx-lm, mlx-metal, misaki, spacy, and related packages) is pulled in automatically via platform markers in pyproject.toml. On Linux and Windows, the Qwen3-TTS GGML backend (faster-qwen3-tts[ggml]) and Parakeet TDT (nano-parakeet) are installed instead.
Linux — CUDA variant
On Linux, faster-qwen3-tts[ggml] ships a qwentts-cpp-python wheel that targets CUDA 12.8 by default. If your machine runs a different CUDA version, install the matching wheel from the Hugging Face wheelhouse before running pip install speech-to-speech.
If you want to use the previous CUDA-graphs (PyTorch) implementation instead of GGML, skip that wheel and pass --qwen3_tts_backend torch at runtime.
Development / source
Clone the repository and use uv to install all dependencies in editable mode. This also makes the speech-to-speech CLI available immediately.
uv sync reads pyproject.toml and resolves platform-specific dependencies automatically, so no separate requirements files are needed. The speech_to_speech package is installed in editable mode, so local changes take effect without reinstalling.
Optional extras extend the pipeline with alternative backends. Install them alongside the base package using pip extras syntax:
# Kokoro-82M TTS — fast, high-quality synthesis (non-macOS platforms)
pip install "speech-to-speech[kokoro]"
# Pocket TTS from Kyutai Labs — streaming TTS with voice cloning
pip install "speech-to-speech[pocket]"
# ChatTTS — multilingual TTS
pip install "speech-to-speech[chattts]"
# Faster Whisper STT — CTranslate2-based Whisper for accelerated CPU/CUDA transcription
pip install "speech-to-speech[faster-whisper]"
# Paraformer STT — FunASR-based Paraformer for Mandarin and multilingual transcription
pip install "speech-to-speech[paraformer]"
# Lightning Whisper MLX STT — fast Whisper on Apple Silicon (macOS only)
pip install "speech-to-speech[whisper-mlx]"
# MLX LM — explicit MLX LLM backend (already bundled on macOS; use on macOS only)
pip install "speech-to-speech[mlx-lm]"
# WebSocket — explicit websockets dependency for the websocket run mode
pip install "speech-to-speech[websocket]"
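Several extras can be requested in one install, because pip accepts a comma-separated list inside the brackets. The sketch below builds such a requirement string from the extra names used in the commands above; the helper `extras_requirement` is hypothetical and only illustrates the syntax.

```python
# Extra names taken from the install commands above.
KNOWN_EXTRAS = {
    "kokoro", "pocket", "chattts", "faster-whisper",
    "paraformer", "whisper-mlx", "mlx-lm", "websocket",
}


def extras_requirement(extras, package="speech-to-speech"):
    """Build a pip requirement string such as 'speech-to-speech[kokoro,websocket]'."""
    names = list(dict.fromkeys(extras))  # drop duplicates, keep first-seen order
    unknown = [name for name in names if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    if not names:
        return package  # no extras requested: plain package name
    return f"{package}[{','.join(names)}]"


print(extras_requirement(["kokoro", "faster-whisper"]))
```

The resulting string can be passed, quoted, to pip install. Platform restrictions still apply, so macOS-only extras should not be combined into an install on Linux or Windows.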
Each extra installs the following packages:
- [kokoro]: kokoro>=0.9.2
- [pocket]: pocket-tts>=0.1.0
- [chattts]: ChatTTS>=0.1.1
- [faster-whisper]: faster-whisper>=1.0.3
- [paraformer]: funasr, modelscope, onnxruntime
- [whisper-mlx]: lightning-whisper-mlx>=0.0.10
- [mlx-lm]: mlx-lm==0.31.1, mlx-vlm (macOS only)
- [websocket]: websockets>=12.0 (for --mode websocket)
The default pipeline uses the OpenAI Responses API for the LLM stage. Export your API key before launching.
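A quick pre-launch check can catch a missing key early. This sketch assumes the key is read from the OPENAI_API_KEY environment variable, which is the name the official OpenAI Python client uses by default; the variable this project expects is not stated here, and the helper `api_key_present` is hypothetical.

```python
import os


def api_key_present(env=os.environ, name="OPENAI_API_KEY"):
    """Return True when the variable is set to a non-blank value.

    The variable name is an assumption (the OpenAI client default).
    """
    value = env.get(name, "")
    return bool(value.strip())


if __name__ == "__main__":
    if api_key_present():
        print("API key found in environment")
    else:
        print("OPENAI_API_KEY is not set; export it before launching")
```

The check only confirms that a value exists in the environment. It does not validate the key against the API.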
Platform Notes
- Linux (CUDA)
- macOS (Apple Silicon)
- Development
The recommended Linux setup uses the GGML backend for Qwen3-TTS and Parakeet TDT for STT. If you are on CUDA 12.8, a plain pip install speech-to-speech is sufficient. For other CUDA versions, pre-install the matching qwentts-cpp-python wheel as described in the Linux — CUDA variant tab above before installing the package.
To verify GPU availability:
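One way to run this check is through PyTorch, assuming it is importable in your environment (the default install is not guaranteed to include it). The sketch below reports the device name and the CUDA version PyTorch was built against; the helper `cuda_status` is hypothetical.

```python
import importlib.util


def cuda_status():
    """Describe CUDA availability as seen by PyTorch, if PyTorch is installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check works without PyTorch

    if not torch.cuda.is_available():
        return "CUDA not available"
    device = torch.cuda.get_device_name(0)
    return f"CUDA available: {device} (CUDA {torch.version.cuda})"


print(cuda_status())
```

The CUDA version reported here is the one PyTorch was built with. It may differ from the version the qwentts-cpp-python wheel targets, so treat it as a hint when picking a wheel.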