The WebSocket demo provides a web-based interface for real-time streaming text-to-speech synthesis with VibeVoice.

Quick Start

1. Launch the Server

Start the WebSocket server using the demo launcher:
python demo/vibevoice_realtime_demo.py \
  --port 3000 \
  --model_path microsoft/VibeVoice-Realtime-0.5B \
  --device cuda

2. Open the Web Interface

Navigate to the demo in your browser:
http://localhost:3000

3. Generate Speech

  • Select a voice from the dropdown
  • Enter your text
  • Adjust generation parameters (optional)
  • Click generate to hear the result

Server Configuration

Command-Line Arguments

port (integer, default: 3000)
  Port number for the web server.

model_path (string, default: "default_model")
  Path to the HuggingFace model directory or model ID (e.g., microsoft/VibeVoice-Realtime-0.5B).

device (string, default: "cuda")
  Device for inference. Options: cuda, mps, cpu. The mpx typo is automatically corrected to mps.

reload (boolean, default: false)
  Enable auto-reload for development. Pass the --reload flag to enable it.

Launch Examples

# Default GPU launch
python demo/vibevoice_realtime_demo.py \
  --port 3000 \
  --model_path microsoft/VibeVoice-Realtime-0.5B \
  --device cuda

# CPU-only launch on an alternate port, with auto-reload for development
python demo/vibevoice_realtime_demo.py --port 8080 --device cpu --reload

WebSocket API

The demo exposes a WebSocket endpoint for streaming audio generation.

Endpoint

ws://localhost:3000/stream

Query Parameters

text (string, required)
  The text to synthesize into speech.

voice (string)
  Voice preset name. Available voices are loaded from demo/voices/streaming_model/.

cfg (float, default: 1.5)
  Classifier-Free Guidance scale. Higher values increase prompt adherence.

steps (integer, default: 5)
  Number of diffusion inference steps. More steps may improve quality but increase latency.
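
The query string can also be assembled programmatically. A minimal Python sketch (the build_stream_url helper and its defaults are illustrative, not part of the demo):

```python
from urllib.parse import urlencode

def build_stream_url(text, voice=None, cfg=1.5, steps=5,
                     host="localhost", port=3000):
    """Assemble the /stream WebSocket URL with URL-encoded query parameters."""
    params = {"text": text, "cfg": cfg, "steps": steps}
    if voice is not None:
        params["voice"] = voice
    return f"ws://{host}:{port}/stream?{urlencode(params)}"
```

urlencode takes care of escaping spaces and special characters in the text parameter.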

Connection Example

const ws = new WebSocket(
  'ws://localhost:3000/stream?text=Hello%20world&voice=Wayne&cfg=1.5&steps=5'
);

ws.binaryType = 'arraybuffer';

ws.onmessage = (event) => {
  if (typeof event.data === 'string') {
    // JSON log messages
    const message = JSON.parse(event.data);
    console.log(message.event, message.data);
  } else {
    // Binary audio data (PCM16)
    const audioChunk = new Int16Array(event.data);
    // Play or buffer the audio chunk
  }
};

Audio Format

The WebSocket streams audio in the following format:
  • Sample Rate: 24,000 Hz
  • Encoding: PCM 16-bit signed integer
  • Channels: Mono
  • Byte Order: Little-endian
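
A client in any language can decode these chunks by interpreting each pair of bytes as a little-endian signed 16-bit sample and scaling to [-1.0, 1.0). A standard-library Python sketch (the helper name is illustrative):

```python
import array
import sys

def pcm16_to_float(chunk: bytes) -> list:
    """Decode little-endian PCM16 bytes into floats in [-1.0, 1.0)."""
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()       # the stream is little-endian
    return [s / 32768.0 for s in samples]
```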

Playing Audio in Browser

const audioContext = new AudioContext({ sampleRate: 24000 });
const chunks = [];
ws.binaryType = 'arraybuffer'; // ensure binary frames arrive as ArrayBuffer, not Blob

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    const pcm16 = new Int16Array(event.data);
    
    // Convert PCM16 to Float32 for Web Audio API
    const float32 = new Float32Array(pcm16.length);
    for (let i = 0; i < pcm16.length; i++) {
      float32[i] = pcm16[i] / 32768.0;
    }
    
    chunks.push(float32);
  }
};

ws.onclose = () => {
  // Concatenate all chunks and play
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const audioData = new Float32Array(totalLength);
  
  let offset = 0;
  chunks.forEach(chunk => {
    audioData.set(chunk, offset);
    offset += chunk.length;
  });
  
  const audioBuffer = audioContext.createBuffer(1, audioData.length, 24000);
  audioBuffer.getChannelData(0).set(audioData);
  
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
};
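
If you want to keep the audio rather than play it, the raw PCM stream can be wrapped in a WAV container with Python's standard wave module (a sketch; the helper name is illustrative):

```python
import wave

def save_pcm_to_wav(pcm_bytes: bytes, path: str, sample_rate: int = 24000) -> None:
    """Wrap raw mono little-endian PCM16 audio in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # 24 kHz, per the format above
        wav.writeframes(pcm_bytes)
```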

Log Messages

The WebSocket sends JSON log messages during generation:

Event Types

{
  "type": "log",
  "event": "backend_request_received",
  "data": {
    "text_length": 42,
    "cfg_scale": 1.5,
    "inference_steps": 5,
    "voice": "Wayne"
  },
  "timestamp": "2026-03-03 14:22:15.123"
}
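
Because the socket interleaves JSON text frames with binary audio frames, clients typically branch on the frame type, as the JavaScript example above does. An equivalent Python sketch (handle_frame is an illustrative helper):

```python
import json

def handle_frame(frame):
    """Route a WebSocket frame: text frames are JSON logs, bytes are audio."""
    if isinstance(frame, (bytes, bytearray)):
        return ("audio", bytes(frame))
    message = json.loads(frame)
    return (message.get("event"), message.get("data"))
```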

REST Endpoints

Get Available Voices

curl http://localhost:3000/config
Response:
{
  "voices": [
    "en-WHTest_man",
    "Wayne",
    "Speaker01",
    "Speaker02"
  ],
  "default_voice": "en-WHTest_man"
}
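
A client can validate its voice choice against this response before connecting. A Python sketch using the sample payload above (pick_voice is an illustrative helper, not part of the demo):

```python
import json

def pick_voice(config: dict, requested: str = None) -> str:
    """Use the requested voice if the server offers it, else the default."""
    if requested in config["voices"]:
        return requested
    return config["default_voice"]

# Sample /config response from above
config = json.loads(
    '{"voices": ["en-WHTest_man", "Wayne", "Speaker01", "Speaker02"],'
    ' "default_voice": "en-WHTest_man"}'
)
```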

Serve Static Files

The root endpoint / serves the web interface HTML:
curl http://localhost:3000/

Concurrency Control

The WebSocket server holds a lock so that only one generation request runs at a time.
If a generation is already in progress, new connections receive a backend_busy message and are closed with code 1013 (Try Again Later).
if lock.locked():
    busy_message = {
        "type": "log",
        "event": "backend_busy",
        "data": {"message": "Please wait for the other requests to complete."},
        "timestamp": get_timestamp(),
    }
    await ws.send_text(json.dumps(busy_message))
    await ws.close(code=1013, reason="Service busy")
For production use with multiple concurrent users, consider deploying multiple instances behind a load balancer.
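
A client that receives a 1013 close can simply wait and reconnect. A sketch of an exponential backoff schedule for such retries (the helper and its constants are illustrative):

```python
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0):
    """Delays (in seconds) to sleep before each retry, doubling up to a cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```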

Environment Variables

The server reads configuration from environment variables:
  • MODEL_PATH: Model path (set automatically from --model_path)
  • MODEL_DEVICE: Device for inference (set automatically from --device)
  • VOICE_PRESET: Default voice preset name (optional)
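
Reading these variables follows the usual os.environ pattern. A Python sketch (the function and its fallback defaults are illustrative; variable names are the ones listed above):

```python
import os

def read_server_config(env=os.environ):
    """Collect server settings from an environment mapping."""
    return {
        "model_path": env.get("MODEL_PATH", "default_model"),
        "device": env.get("MODEL_DEVICE", "cuda"),
        "voice_preset": env.get("VOICE_PRESET"),  # optional; None when unset
    }
```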

Troubleshooting

Server Won’t Start

Check if the port is already in use:
lsof -i :3000
Use a different port:
python demo/vibevoice_realtime_demo.py --port 8080
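
You can also check port availability programmatically before launching, which avoids needing lsof. A Python sketch (the helper name is illustrative):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind the port on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```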

MPS Not Available Warning

If you see:
Warning: MPS not available. Falling back to CPU.
This means your system doesn’t support MPS. Use CUDA or CPU instead:
python demo/vibevoice_realtime_demo.py --device cpu

Voice Directory Not Found

If voice files aren’t loading:
RuntimeError: Voices directory not found: /path/to/demo/voices/streaming_model
Ensure voice .pt files exist in demo/voices/streaming_model/.
