
Installation

This guide will walk you through installing VibeVoice and its dependencies for real-time text-to-speech generation.

Prerequisites

VibeVoice requires Python 3.9 or higher and is optimized for NVIDIA GPUs with CUDA support. It also supports Apple Silicon (MPS) and CPU inference.

System Requirements

  • Python: 3.9 or higher
  • Operating System: Linux, macOS, or Windows
  • Hardware:
    • Recommended: NVIDIA GPU (T4 or better)
    • Supported: Apple Silicon (M4 Pro or better), CPU
  • CUDA: For GPU acceleration (optional but recommended)

Installation Methods

Step 1: Choose Your Environment

Select the installation method appropriate for your system.

Option A: Docker (Recommended for GPU)

For GPU users, we recommend the NVIDIA Deep Learning Container to manage the CUDA environment:
# NVIDIA PyTorch Container 24.07 / 24.10 / 24.12 verified
# Later versions are also compatible
sudo docker run --privileged --net=host --ipc=host \
  --ulimit memlock=-1:-1 --ulimit stack=-1:-1 \
  --gpus all --rm -it \
  nvcr.io/nvidia/pytorch:24.07-py3
If flash attention is not included in your docker environment, install it manually:
pip install flash-attn --no-build-isolation
See the flash-attention project for more details.

Option B: Local Environment

For local installations without Docker, ensure you have:
  • Python 3.9+
  • PyTorch with appropriate CUDA support (if using GPU)
  • pip package manager
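A quick sanity check for a local environment can be run before installing anything else. This is a minimal sketch: the PyTorch probe is optional and only reports what is already present.

```python
import sys

# VibeVoice requires Python 3.9 or higher.
assert sys.version_info >= (3, 9), "Python 3.9 or higher is required"

# PyTorch is optional at this stage; `pip install -e .` will pull it in later.
try:
    import torch
    print(f"PyTorch: {torch.__version__}")
    print(f"CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed yet; it will be installed with VibeVoice")
```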

Step 2: Clone the Repository

Clone the VibeVoice repository from GitHub:
git clone https://github.com/microsoft/VibeVoice.git
cd VibeVoice/

Step 3: Install Dependencies

Install VibeVoice and all required dependencies:
pip install -e .
This will install the following key dependencies:
  • torch - PyTorch deep learning framework
  • transformers==4.51.3 - Hugging Face Transformers (specific version required)
  • accelerate==1.6.0 - Model acceleration utilities
  • diffusers - Diffusion model components
  • gradio - Web UI components
  • librosa, scipy, numpy - Audio processing
  • fastapi, uvicorn - Web server for real-time demos
VibeVoice is developed with transformers==4.51.3. Later versions may not be compatible.
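Because of this version pin, it can be useful to confirm what is actually installed. Below is a hedged sketch; the `PINNED` constant and the warning text are this example's own, not part of VibeVoice.

```python
from importlib.metadata import version, PackageNotFoundError

PINNED = "4.51.3"  # the transformers version VibeVoice is developed with

try:
    installed = version("transformers")
    if installed == PINNED:
        print(f"transformers {installed} matches the pinned version")
    else:
        print(f"Warning: transformers {installed} installed; {PINNED} is recommended")
except PackageNotFoundError:
    print("transformers is not installed yet")
```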

Step 4: Verify Installation

Verify that VibeVoice is installed correctly:
import vibevoice
from vibevoice import (
    VibeVoiceStreamingForConditionalGenerationInference,
    VibeVoiceStreamingProcessor
)

print("VibeVoice installed successfully!")

Device-Specific Configuration

For optimal performance with NVIDIA GPUs:
# Install flash-attention for better performance
pip install flash-attn --no-build-isolation
The model will automatically use:
  • torch.bfloat16 precision
  • flash_attention_2 implementation
  • CUDA device mapping
NVIDIA T4 GPUs and newer achieve real-time performance (~300 ms first-chunk latency).
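The CUDA defaults above can also be spelled out explicitly. This is only a sketch: the keyword arguments are standard Hugging Face `from_pretrained` options, not VibeVoice-specific API, and the class name is taken from the verification step earlier in this guide.

```python
# Standard Hugging Face from_pretrained options mirroring the defaults above.
load_kwargs = dict(
    torch_dtype="bfloat16",                   # bf16 precision
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
    device_map="cuda",                        # place weights on the GPU
)

try:
    from vibevoice import VibeVoiceStreamingForConditionalGenerationInference
    model = VibeVoiceStreamingForConditionalGenerationInference.from_pretrained(
        "microsoft/VibeVoice-Realtime-0.5B", **load_kwargs
    )
except ImportError:
    print("vibevoice not installed; the kwargs above show the intended configuration")
```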

Download Models

VibeVoice models are hosted on Hugging Face and will be automatically downloaded when you run inference:
from vibevoice import VibeVoiceStreamingProcessor

# Model will be downloaded automatically
processor = VibeVoiceStreamingProcessor.from_pretrained(
    "microsoft/VibeVoice-Realtime-0.5B"
)

  • VibeVoice-Realtime-0.5B: real-time TTS model (0.5B parameters)
  • Model Collection: browse all VibeVoice models

Troubleshooting

Flash Attention Installation

If you encounter errors with flash attention:
  1. Ensure you have a compatible CUDA version
  2. Try installing without build isolation:
    pip install flash-attn --no-build-isolation
    
  3. If flash attention installation fails, the model will fall back to SDPA (may reduce performance)
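That fallback decision can be sketched as a small helper. `pick_attn` is hypothetical, not part of VibeVoice's API; it just returns the `attn_implementation` string to pass to `from_pretrained`.

```python
def pick_attn(flash_available: bool) -> str:
    """Return the attn_implementation string: flash-attn if present, else SDPA."""
    return "flash_attention_2" if flash_available else "sdpa"

def flash_attn_installed() -> bool:
    """Check whether the flash-attn package can be imported."""
    try:
        import flash_attn  # noqa: F401
        return True
    except ImportError:
        return False

impl = pick_attn(flash_attn_installed())
print(f"Using attention implementation: {impl}")
```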

Transformers Version

If you experience compatibility issues:
# Ensure exact version is installed
pip install transformers==4.51.3 --force-reinstall

MPS Device Issues

On macOS, if MPS is not detected:
import torch
print(f"MPS available: {torch.backends.mps.is_available()}")
print(f"MPS built: {torch.backends.mps.is_built()}")
Ensure you have PyTorch with MPS support installed.
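A common pattern is to fall back through devices in order of preference: CUDA, then MPS, then CPU. `pick_device` below is a hypothetical helper for illustration, not part of VibeVoice.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

try:
    import torch
    device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
except ImportError:
    device = "cpu"  # PyTorch not installed; nothing to accelerate
print(f"Selected device: {device}")
```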

Next Steps

  • Quickstart Tutorial: generate your first speech with VibeVoice
