
Streaming support is available in AudioSeal 0.2+; it requires Python 3.10 or higher and the einops library.

What is Streaming Mode?

Streaming mode allows you to watermark audio chunks in real-time without processing the entire audio file at once. This is essential for:
  • Live audio generation: Watermark audio as it’s being generated
  • Low-latency applications: Start processing before the full audio is available
  • Memory efficiency: Process large audio files in smaller chunks
  • Real-time systems: Integrate watermarking into streaming pipelines
Streaming mode uses convolutional caching to maintain consistency across audio chunks, ensuring the watermark remains continuous and detectable.
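
To see what convolutional caching buys you, here is a minimal sketch of the idea with a plain Conv1d. It illustrates the concept only, not AudioSeal's actual internals: the tail of each chunk is carried over so the convolution sees continuous context across chunk boundaries, and the chunked output matches processing the whole signal at once.
import torch
import torch.nn as nn

# A single conv layer stands in for the model's convolutional stack
conv = nn.Conv1d(1, 1, kernel_size=3, padding=0)
context = conv.kernel_size[0] - 1  # samples to carry over between chunks

def conv_with_cache(chunk, cache):
    # Prepend the cached tail of the previous chunk so the convolution
    # sees continuous context across the chunk boundary
    x = torch.cat([cache, chunk], dim=-1)
    return conv(x), x[:, :, -context:]

audio = torch.randn(1, 1, 16)
cache = torch.zeros(1, 1, context)  # zero history before the first chunk
outputs = []
with torch.no_grad():
    for chunk in audio.split(8, dim=-1):
        out, cache = conv_with_cache(chunk, cache)
        outputs.append(out)
    # Chunked output matches running the conv over the zero-padded full signal
    streamed = torch.cat(outputs, dim=-1)
    full = conv(torch.cat([torch.zeros(1, 1, context), audio], dim=-1))
print(torch.allclose(streamed, full))  # True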

Requirements

1. Python Version

Python 3.10 or higher is required for streaming support:
python --version  # Should be 3.10+
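
If you prefer to check from inside Python, a quick runtime guard (plain standard library, not an AudioSeal API) looks like this:
import sys

# Fail fast if the interpreter is too old for streaming mode
assert sys.version_info >= (3, 10), "Streaming mode requires Python 3.10+"
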
2. Install einops

The einops library is required for streaming operations:
pip install einops
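
You can confirm the installation directly from the command line:
python -c "import einops; print(einops.__version__)"
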
3. Use Streaming Models

Load models with streaming support enabled:
from audioseal import AudioSeal

model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
Older checkpoints and Python versions below 3.10 do not support streaming; using streaming mode with an incompatible model or Python version raises a NotImplementedError.

Using the Streaming Context Manager

The streaming API uses a context manager to handle convolutional caching:
from audioseal import AudioSeal
import torch

# Load streaming model
model = AudioSeal.load_generator("audioseal_wm_streaming")
model.eval()

# Your audio chunks (list of tensors)
audio_chunks = [...]  # Each chunk: [1, channels, samples]

# Create a secret message (optional)
secret_message = torch.randint(0, 2, (1, 16))

# Process chunks in streaming mode
streaming_watermarked_audio = []

with model.streaming(batch_size=1):
    for chunk in audio_chunks:
        # Watermark each chunk
        watermarked_chunk = model(
            chunk, 
            sample_rate=16000,  # Specify sample rate
            message=secret_message,  # Optional: use same message
            alpha=1.0  # Watermark strength
        )
        streaming_watermarked_audio.append(watermarked_chunk)

# Concatenate all watermarked chunks
full_watermarked = torch.cat(streaming_watermarked_audio, dim=-1)
The context manager (with model.streaming()) is critical: it ensures the convolutional cache is properly initialized before processing and cleaned up afterwards.

Chunk-Based Processing

When processing audio in chunks, consider these guidelines:

Chunk Size Recommendations

  • Minimum: 1-2 seconds (16,000-32,000 samples at 16 kHz)
  • Optimal: 5-10 seconds for balance of latency and quality
  • Maximum: No hard limit, but larger chunks reduce the benefit of streaming
import torch

# Example: Split audio into 5-second chunks
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Full audio tensor [1, 1, total_samples]
full_audio = torch.randn(1, 1, 160000)  # 10 seconds

# Split into chunks
audio_chunks = []
for i in range(0, full_audio.shape[-1], chunk_size):
    # The final chunk may be shorter than chunk_size; that is fine for streaming
    chunk = full_audio[:, :, i:i+chunk_size]
    audio_chunks.append(chunk)

Processing Strategy

# Process chunks one at a time
with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        watermarked = model(chunk, sample_rate=16000, message=secret_message, alpha=1.0)
        # Save or stream the watermarked chunk immediately;
        # save_chunk is a placeholder for your own output handler
        save_chunk(watermarked, index=i)

Detecting Streaming Audio

You can detect watermarks in streaming audio using either:
  1. Chunk-by-chunk: Detect each chunk independently
  2. Full audio: Concatenate chunks and detect once
detector = AudioSeal.load_detector("audioseal_detector_streaming")
detector.eval()

# Option 1: Detect individual chunks
for chunk in streaming_watermarked_audio:
    result, message = detector.detect_watermark(chunk)
    print(f"Chunk detection: {result.item():.3f}")

# Option 2: Detect full concatenated audio
full_audio = torch.cat(streaming_watermarked_audio, dim=-1)
full_result, message = detector.detect_watermark(full_audio)
print(f"Full audio detection: {full_result.item():.3f}")

Complete Streaming Example

Here’s a real-world example adapted from the README:
from audioseal import AudioSeal
import torch

# Load streaming models
model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
model.eval()
detector.eval()

# Configuration
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Simulate streaming audio (in practice, this comes from a live source)
full_audio = torch.randn(1, 1, 160000)  # 10 seconds of audio
audio_chunks = [
    full_audio[:, :, i:i+chunk_size] 
    for i in range(0, full_audio.shape[-1], chunk_size)
]

# Create a consistent secret message for all chunks
secret_message = torch.randint(0, 2, (1, 16))

# Watermark in streaming mode
streaming_watermarked = []

with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        print(f"Processing chunk {i+1}/{len(audio_chunks)}...")
        
        watermarked_chunk = model(
            chunk,
            sample_rate=sample_rate,
            message=secret_message,
            alpha=1.0
        )
        streaming_watermarked.append(watermarked_chunk)

# Concatenate results
full_watermarked = torch.cat(streaming_watermarked, dim=-1)

# Verify watermark detection
detect_prob, decoded_msg = detector.detect_watermark(full_watermarked)

print(f"\nDetection probability: {detect_prob.item():.3f}")
print(f"Original message: {secret_message}")
print(f"Decoded message:  {decoded_msg}")
print(f"Message match: {torch.equal(secret_message, decoded_msg)}")

Key Differences from Batch Processing

Aspect | Batch Mode | Streaming Mode
Processing | Entire audio at once | Chunk-by-chunk
Memory | Requires full audio in memory | Processes small chunks
Latency | High (wait for full audio) | Low (start immediately)
Context | Not needed | Requires with model.streaming()
Cache | No cache needed | Uses convolutional cache
Python Version | 3.8+ | 3.10+

Troubleshooting

NotImplementedError

If you get an error about streaming not being supported:
1. Check Python Version

python --version  # Must be 3.10 or higher
2. Verify einops Installation

pip install einops
3. Use a Streaming Model

Ensure you’re loading a streaming-compatible model:
model = AudioSeal.load_generator("audioseal_wm_streaming")
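
If your code has to run in mixed environments, you can probe for streaming support up front. This sketch assumes, per the note above, that incompatible setups raise NotImplementedError:
def supports_streaming(model):
    # Probe by entering and immediately leaving the streaming context
    try:
        with model.streaming(batch_size=1):
            pass
        return True
    except NotImplementedError:
        return False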

Cache Not Cleared

Always use the context manager to ensure cache is properly cleaned:
# ✅ Correct - cache automatically cleaned
with model.streaming(batch_size=1):
    for chunk in chunks:
        watermarked = model(chunk)

# ❌ Incorrect - cache may not be cleaned
model.encoder.streaming(batch_size=1)
for chunk in chunks:
    watermarked = model(chunk)

Next Steps

Secret Messages

Learn how to embed custom messages in streaming watermarks

Attack Robustness

Understand how streaming watermarks handle audio attacks
