
Streaming support is available in AudioSeal 0.2+; it requires Python 3.10 or higher and the einops library.

What is Streaming Mode?

Streaming mode allows you to watermark audio chunks in real-time without processing the entire audio file at once. This is essential for:
  • Live audio generation: Watermark audio as it’s being generated
  • Low-latency applications: Start processing before the full audio is available
  • Memory efficiency: Process large audio files in smaller chunks
  • Real-time systems: Integrate watermarking into streaming pipelines
Streaming mode uses convolutional caching to maintain consistency across audio chunks, ensuring the watermark remains continuous and detectable.
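
To see what convolutional caching buys you, here is a minimal sketch of the idea with a plain Conv1d. It illustrates the concept only, not AudioSeal's actual internals: the tail of each chunk is carried over so the convolution sees continuous context across chunk boundaries, and the chunked output matches processing the whole signal at once.
import torch
import torch.nn as nn

# A single conv layer stands in for the model's convolutional stack
conv = nn.Conv1d(1, 1, kernel_size=3, padding=0)
context = conv.kernel_size[0] - 1  # samples to carry over between chunks

def conv_with_cache(chunk, cache):
    # Prepend the cached tail of the previous chunk so the convolution
    # sees continuous context across the chunk boundary
    x = torch.cat([cache, chunk], dim=-1)
    return conv(x), x[:, :, -context:]

audio = torch.randn(1, 1, 16)
cache = torch.zeros(1, 1, context)  # zero history before the first chunk
outputs = []
with torch.no_grad():
    for chunk in audio.split(8, dim=-1):
        out, cache = conv_with_cache(chunk, cache)
        outputs.append(out)
    # Chunked output matches running the conv over the zero-padded full signal
    streamed = torch.cat(outputs, dim=-1)
    full = conv(torch.cat([torch.zeros(1, 1, context), audio], dim=-1))
print(torch.allclose(streamed, full))  # True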

Requirements

1. Python Version

Python 3.10 or higher is required for streaming support:
python --version  # Should be 3.10+
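
If you prefer to check from inside Python, a quick runtime guard (plain standard library, not an AudioSeal API) looks like this:
import sys

# Fail fast if the interpreter is too old for streaming mode
assert sys.version_info >= (3, 10), "Streaming mode requires Python 3.10+"
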
2. Install einops

The einops library is required for streaming operations:
pip install einops
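
You can confirm the installation directly from the command line:
python -c "import einops; print(einops.__version__)"
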
3. Use Streaming Models

Load models with streaming support enabled:
from audioseal import AudioSeal

model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
Older checkpoints and Python versions below 3.10 do not support streaming; using streaming mode with an incompatible model or Python version raises a NotImplementedError.

Using the Streaming Context Manager

The streaming API uses a context manager to handle convolutional caching:
from audioseal import AudioSeal
import torch

# Load streaming model
model = AudioSeal.load_generator("audioseal_wm_streaming")
model.eval()

# Your audio chunks (list of tensors)
audio_chunks = [...]  # Each chunk: [1, channels, samples]

# Create a secret message (optional)
secret_message = torch.randint(0, 2, (1, 16))

# Process chunks in streaming mode
streaming_watermarked_audio = []

with model.streaming(batch_size=1):
    for chunk in audio_chunks:
        # Watermark each chunk
        watermarked_chunk = model(
            chunk, 
            sample_rate=16000,  # Specify sample rate
            message=secret_message,  # Optional: use same message
            alpha=1.0  # Watermark strength
        )
        streaming_watermarked_audio.append(watermarked_chunk)

# Concatenate all watermarked chunks
full_watermarked = torch.cat(streaming_watermarked_audio, dim=-1)
The context manager (with model.streaming()) is critical: it ensures the convolutional cache is properly initialized before processing and cleaned up afterwards.

Chunk-Based Processing

When processing audio in chunks, consider these guidelines:

Chunk Size Recommendations

  • Minimum: 1-2 seconds (16,000-32,000 samples at 16 kHz)
  • Optimal: 5-10 seconds for balance of latency and quality
  • Maximum: No hard limit, but larger chunks reduce the benefit of streaming
import torch

# Example: Split audio into 5-second chunks
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Full audio tensor [1, 1, total_samples]
full_audio = torch.randn(1, 1, 160000)  # 10 seconds

# Split into chunks
audio_chunks = []
for i in range(0, full_audio.shape[-1], chunk_size):
    # The final chunk may be shorter than chunk_size; that is fine for streaming
    chunk = full_audio[:, :, i:i+chunk_size]
    audio_chunks.append(chunk)

Processing Strategy

# Process chunks one at a time
with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        watermarked = model(chunk, sample_rate=16000, message=secret_message, alpha=1.0)
        # Save or stream the watermarked chunk immediately;
        # save_chunk is a placeholder for your own output handler
        save_chunk(watermarked, index=i)

Detecting Streaming Audio

You can detect watermarks in streaming audio using either:
  1. Chunk-by-chunk: Detect each chunk independently
  2. Full audio: Concatenate chunks and detect once
detector = AudioSeal.load_detector("audioseal_detector_streaming")
detector.eval()

# Option 1: Detect individual chunks
for chunk in streaming_watermarked_audio:
    result, message = detector.detect_watermark(chunk)
    print(f"Chunk detection: {result.item():.3f}")

# Option 2: Detect full concatenated audio
full_audio = torch.cat(streaming_watermarked_audio, dim=-1)
full_result, message = detector.detect_watermark(full_audio)
print(f"Full audio detection: {full_result.item():.3f}")

Complete Streaming Example

Here’s a real-world example adapted from the README:
from audioseal import AudioSeal
import torch

# Load streaming models
model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
model.eval()
detector.eval()

# Configuration
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Simulate streaming audio (in practice, this comes from a live source)
full_audio = torch.randn(1, 1, 160000)  # 10 seconds of audio
audio_chunks = [
    full_audio[:, :, i:i+chunk_size] 
    for i in range(0, full_audio.shape[-1], chunk_size)
]

# Create a consistent secret message for all chunks
secret_message = torch.randint(0, 2, (1, 16))

# Watermark in streaming mode
streaming_watermarked = []

with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        print(f"Processing chunk {i+1}/{len(audio_chunks)}...")
        
        watermarked_chunk = model(
            chunk,
            sample_rate=sample_rate,
            message=secret_message,
            alpha=1.0
        )
        streaming_watermarked.append(watermarked_chunk)

# Concatenate results
full_watermarked = torch.cat(streaming_watermarked, dim=-1)

# Verify watermark detection
detect_prob, decoded_msg = detector.detect_watermark(full_watermarked)

print(f"\nDetection probability: {detect_prob.item():.3f}")
print(f"Original message: {secret_message}")
print(f"Decoded message:  {decoded_msg}")
print(f"Message match: {torch.equal(secret_message, decoded_msg)}")

Key Differences from Batch Processing

Aspect | Batch Mode | Streaming Mode
Processing | Entire audio at once | Chunk-by-chunk
Memory | Requires full audio in memory | Processes small chunks
Latency | High (wait for full audio) | Low (start immediately)
Context | Not needed | Requires with model.streaming()
Cache | No cache needed | Uses convolutional cache
Python Version | 3.8+ | 3.10+

Troubleshooting

NotImplementedError

If you get an error about streaming not being supported:
1. Check Python Version

python --version  # Must be 3.10 or higher
2. Verify einops Installation

pip install einops
3. Use a Streaming Model

Ensure you’re loading a streaming-compatible model:
model = AudioSeal.load_generator("audioseal_wm_streaming")
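
If your code has to run in mixed environments, you can probe for streaming support up front. This sketch assumes, per the note above, that incompatible setups raise NotImplementedError:
def supports_streaming(model):
    # Probe by entering and immediately leaving the streaming context
    try:
        with model.streaming(batch_size=1):
            pass
        return True
    except NotImplementedError:
        return False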

Cache Not Cleared

Always use the context manager to ensure cache is properly cleaned:
# ✅ Correct - cache automatically cleaned
with model.streaming(batch_size=1):
    for chunk in chunks:
        watermarked = model(chunk)

# ❌ Incorrect - cache may not be cleaned
model.encoder.streaming(batch_size=1)
for chunk in chunks:
    watermarked = model(chunk)

Next Steps

Secret Messages

Learn how to embed custom messages in streaming watermarks

Attack Robustness

Understand how streaming watermarks handle audio attacks
