Installation
This guide will walk you through installing VibeVoice and its dependencies for real-time text-to-speech generation.

Prerequisites
VibeVoice requires Python 3.9 or higher and is optimized for NVIDIA GPUs with CUDA support. It also supports Apple Silicon (MPS) and CPU inference.
System Requirements
- Python: 3.9 or higher
- Operating System: Linux, macOS, or Windows
- Hardware:
  - Recommended: NVIDIA GPU (T4 or better)
  - Supported: Apple Silicon (M4 Pro or better), CPU
- CUDA: For GPU acceleration (optional but recommended)
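A quick way to confirm your environment meets these requirements is a short check script. This is a sketch, not part of VibeVoice itself; it treats PyTorch as optional since you may run it before installing dependencies:

```python
import sys

def check_python() -> bool:
    """VibeVoice requires Python 3.9 or higher."""
    return sys.version_info >= (3, 9)

def detect_device() -> str:
    """Return the best available device: CUDA GPU, Apple Silicon MPS, or CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed yet
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

print(f"Python 3.9+: {check_python()}, device: {detect_device()}")
```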
Installation Methods
Choose Your Environment
Select the appropriate installation method for your system.
Option A: Using NVIDIA Deep Learning Container (Recommended)
For GPU users, we recommend using an NVIDIA Deep Learning Container to manage the CUDA environment. If flash attention is not included in your Docker environment, install it manually. See the flash-attention project for more details.
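For example (the container tag below is an assumption; pick a current PyTorch release from the NGC catalog):

```shell
# Launch an NVIDIA PyTorch container with GPU access (tag is an example)
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.07-py3

# Inside the container, install flash attention if it is not already present
pip install flash-attn --no-build-isolation
```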
Option B: Local Environment
For local installations without Docker, ensure you have:

- Python 3.9+
- PyTorch with appropriate CUDA support (if using GPU)
- pip package manager
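A local setup covering these prerequisites might look like the following sketch; the CUDA 12.1 wheel index is an example, so adjust it for your CUDA version or drop the `--index-url` flag for a CPU-only build:

```shell
# Create and activate an isolated environment
python -m venv .venv
source .venv/bin/activate

# Install a GPU build of PyTorch (CUDA 12.1 wheels shown as an example)
pip install torch --index-url https://download.pytorch.org/whl/cu121
```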
Install Dependencies
Install VibeVoice and all required dependencies. This will install the following key dependencies:
- `torch` - PyTorch deep learning framework
- `transformers==4.51.3` - Hugging Face Transformers (specific version required)
- `accelerate==1.6.0` - Model acceleration utilities
- `diffusers` - Diffusion model components
- `gradio` - Web UI components
- `librosa`, `scipy`, `numpy` - Audio processing
- `fastapi`, `uvicorn` - Web server for real-time demos
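The install step above typically means an editable install from source; the repository URL here is an assumption:

```shell
# Clone the project (URL assumed) and install it with its dependencies
git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice
pip install -e .
```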
Device-Specific Configuration
- NVIDIA GPU (CUDA)
- Apple Silicon (MPS)
- CPU
For optimal performance with NVIDIA GPUs, the model will automatically use:

- `torch.bfloat16` precision
- `flash_attention_2` implementation
- CUDA device mapping
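Expressed as explicit Hugging Face `from_pretrained`-style keyword arguments, those automatic settings would look roughly like this sketch (the helper name is hypothetical; recent Transformers versions also accept the string form of the dtype):

```python
def cuda_load_kwargs() -> dict:
    """Keyword arguments matching the CUDA defaults listed above."""
    return {
        "torch_dtype": "bfloat16",                   # bfloat16 precision
        "attn_implementation": "flash_attention_2",  # flash attention 2
        "device_map": "cuda",                        # map weights onto the GPU
    }

# Usage sketch: model = SomeModelClass.from_pretrained(model_id, **cuda_load_kwargs())
```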
Download Models
VibeVoice models are hosted on Hugging Face and will be automatically downloaded when you run inference:

VibeVoice-Realtime-0.5B
Real-time TTS model (0.5B parameters)
Model Collection
Browse all VibeVoice models
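If you prefer to pre-download the weights instead of fetching them on the first inference call, the Hugging Face CLI can do so; the repo id below is inferred from the model name above and may differ:

```shell
pip install -U "huggingface_hub[cli]"
huggingface-cli download microsoft/VibeVoice-Realtime-0.5B
```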
Troubleshooting
Flash Attention Installation
If you encounter errors with flash attention:

- Ensure you have a compatible CUDA version
- Try installing without build isolation: `pip install flash-attn --no-build-isolation`
- If flash attention fails, the model will fall back to SDPA (may reduce quality)
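A common cause of flash-attention build failures is a mismatch between the system CUDA toolchain and the CUDA version PyTorch was built against; these two commands let you compare them:

```shell
# CUDA toolkit compiler version
nvcc --version

# CUDA version PyTorch was built with, and whether a GPU is visible
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```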
Transformers Version
If you experience compatibility issues, reinstall the pinned version: `pip install transformers==4.51.3`

MPS Device Issues

On macOS, if MPS is not detected, verify that your PyTorch build supports it: `python -c "import torch; print(torch.backends.mps.is_available())"`

Next Steps
Quickstart Tutorial
Generate your first speech with VibeVoice