Streaming support is available in AudioSeal 0.2+ and requires Python 3.10 or higher and the einops library.
## What is Streaming Mode?
Streaming mode allows you to watermark audio chunks in real time without processing the entire audio file at once. This is essential for:

- **Live audio generation**: watermark audio as it's being generated
- **Low-latency applications**: start processing before the full audio is available
- **Memory efficiency**: process large audio files in smaller chunks
- **Real-time systems**: integrate watermarking into streaming pipelines
Streaming mode uses convolutional caching to maintain consistency across audio chunks, ensuring the watermark remains continuous and detectable.
## Requirements
### Python Version

Python 3.10 or higher is required for streaming support:

```bash
python --version  # Should be 3.10+
```
### Install einops

The einops library is required for streaming operations:
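```bash
pip install einops
```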
### Use Streaming Models

Load models with streaming support enabled:

```python
from audioseal import AudioSeal

model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
```
Older checkpoints and Python versions below 3.10 do not support streaming. You'll get a `NotImplementedError` if you try to use streaming mode with incompatible models or Python versions.
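If your code might run under an older interpreter, a simple guard keeps it from crashing (a minimal sketch; `audio_chunks` and `full_audio` are placeholders for your own input, as in the examples below):

```python
import sys

if sys.version_info >= (3, 10):
    # Streaming path: watermark chunk by chunk
    with model.streaming(batch_size=1):
        watermarked = [model(c, sample_rate=16000, alpha=1.0) for c in audio_chunks]
else:
    # Fallback: watermark the full audio in one (batch-mode) call
    watermarked = model(full_audio, sample_rate=16000, alpha=1.0)
```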
## Using the Streaming Context Manager

The streaming API uses a context manager to handle convolutional caching:
```python
from audioseal import AudioSeal
import torch

# Load streaming model
model = AudioSeal.load_generator("audioseal_wm_streaming")
model.eval()

# Your audio chunks (list of tensors)
audio_chunks = [...]  # Each chunk: [1, channels, samples]

# Create a secret message (optional)
secret_message = torch.randint(0, 2, (1, 16))

# Process chunks in streaming mode
streaming_watermarked_audio = []
with model.streaming(batch_size=1):
    for chunk in audio_chunks:
        # Watermark each chunk
        watermarked_chunk = model(
            chunk,
            sample_rate=16000,       # Specify sample rate
            message=secret_message,  # Optional: use the same message for every chunk
            alpha=1.0,               # Watermark strength
        )
        streaming_watermarked_audio.append(watermarked_chunk)

# Concatenate all watermarked chunks
full_watermarked = torch.cat(streaming_watermarked_audio, dim=-1)
```
The context manager (`with model.streaming()`) is critical: it ensures the convolutional cache is properly initialized and cleaned up after processing.
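Since watermarking never needs gradients, you can stack the streaming context with `torch.inference_mode()` for lower overhead (a minimal sketch; both context managers compose in a single `with` statement):

```python
with torch.inference_mode(), model.streaming(batch_size=1):
    for chunk in audio_chunks:
        watermarked = model(chunk, sample_rate=16000, alpha=1.0)
```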
## Chunk-Based Processing

When processing audio in chunks, consider these guidelines:
### Chunk Size Recommendations

- **Minimum**: 1-2 seconds (16,000-32,000 samples at 16 kHz)
- **Optimal**: 5-10 seconds for a balance of latency and quality
- **Maximum**: no hard limit, but larger chunks reduce the benefit of streaming
```python
import torch

# Example: split audio into 5-second chunks
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Full audio tensor [1, 1, total_samples]
full_audio = torch.randn(1, 1, 160000)  # 10 seconds

# Split into chunks
audio_chunks = []
for i in range(0, full_audio.shape[-1], chunk_size):
    chunk = full_audio[:, :, i:i + chunk_size]
    audio_chunks.append(chunk)
```
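If the total length is not an exact multiple of `chunk_size`, the loop above simply keeps the shorter final chunk rather than padding it.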
### Processing Strategy

#### Sequential Processing
```python
# Process chunks one at a time
with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        watermarked = model(chunk, message=secret_message, alpha=1.0)
        # Save or stream the watermarked chunk immediately
        # (save_chunk is a placeholder for your own output handler)
        save_chunk(watermarked, index=i)
```
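#### Batch Processing

The `batch_size` argument suggests the cache can track several independent streams in parallel. The sketch below is an assumption-based illustration, not a documented API pattern: it presumes chunks from different streams can be stacked along the batch dimension and that `message` accepts one row per stream.

```python
import torch

num_streams = 4  # hypothetical: four parallel audio streams
# streams[s] is the list of chunks for stream s; all streams share chunk boundaries
messages = torch.randint(0, 2, (num_streams, 16))  # one message per stream (assumed shape)

with model.streaming(batch_size=num_streams):
    for step in range(len(streams[0])):
        # Stack the current chunk of every stream: [num_streams, channels, samples]
        batch = torch.cat([streams[s][step] for s in range(num_streams)], dim=0)
        watermarked = model(batch, sample_rate=16000, message=messages, alpha=1.0)
        # watermarked[s:s + 1] is the watermarked chunk for stream s
```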
## Detecting Streaming Audio

You can detect watermarks in streaming audio using either:

- **Chunk-by-chunk**: detect each chunk independently
- **Full audio**: concatenate chunks and detect once
```python
detector = AudioSeal.load_detector("audioseal_detector_streaming")
detector.eval()

# Option 1: Detect individual chunks
for chunk in streaming_watermarked_audio:
    result, message = detector.detect_watermark(chunk)
    print(f"Chunk detection: {result.item():.3f}")

# Option 2: Detect full concatenated audio
full_audio = torch.cat(streaming_watermarked_audio, dim=-1)
full_result, message = detector.detect_watermark(full_audio)
print(f"Full audio detection: {full_result.item():.3f}")
```
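Chunk-by-chunk detection is useful for localizing where a watermark starts in a live stream; detecting over the full concatenated audio aggregates more samples and typically yields a more stable score.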
## Complete Streaming Example

Here's a real-world example adapted from the README:
```python
from audioseal import AudioSeal
import torch

# Load streaming models
model = AudioSeal.load_generator("audioseal_wm_streaming")
detector = AudioSeal.load_detector("audioseal_detector_streaming")
model.eval()
detector.eval()

# Configuration
sample_rate = 16000
chunk_duration = 5  # seconds
chunk_size = sample_rate * chunk_duration

# Simulate streaming audio (in practice, this comes from a live source)
full_audio = torch.randn(1, 1, 160000)  # 10 seconds of audio
audio_chunks = [
    full_audio[:, :, i:i + chunk_size]
    for i in range(0, full_audio.shape[-1], chunk_size)
]

# Create a consistent secret message for all chunks
secret_message = torch.randint(0, 2, (1, 16))

# Watermark in streaming mode
streaming_watermarked = []
with model.streaming(batch_size=1):
    for i, chunk in enumerate(audio_chunks):
        print(f"Processing chunk {i + 1}/{len(audio_chunks)}...")
        watermarked_chunk = model(
            chunk,
            sample_rate=sample_rate,
            message=secret_message,
            alpha=1.0,
        )
        streaming_watermarked.append(watermarked_chunk)

# Concatenate results
full_watermarked = torch.cat(streaming_watermarked, dim=-1)

# Verify watermark detection
detect_prob, decoded_msg = detector.detect_watermark(full_watermarked)
print(f"\nDetection probability: {detect_prob.item():.3f}")
print(f"Original message: {secret_message}")
print(f"Decoded message: {decoded_msg}")
print(f"Message match: {torch.equal(secret_message, decoded_msg)}")
```
## Key Differences from Batch Processing

| Aspect | Batch Mode | Streaming Mode |
| --- | --- | --- |
| Processing | Entire audio at once | Chunk-by-chunk |
| Memory | Requires full audio in memory | Processes small chunks |
| Latency | High (wait for full audio) | Low (start immediately) |
| Context | Not needed | Requires `with model.streaming()` |
| Cache | No cache needed | Uses convolutional cache |
| Python version | 3.8+ | 3.10+ |
## Troubleshooting

### NotImplementedError

If you get an error about streaming not being supported:

**1. Check your Python version**

```bash
python --version  # Must be 3.10 or higher
```
**2. Verify the einops installation** by confirming the library is importable:
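```bash
python -c "import einops"  # raises ImportError if einops is missing
```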
**3. Use a streaming model** by ensuring you load a streaming-compatible checkpoint:

```python
model = AudioSeal.load_generator("audioseal_wm_streaming")
```
### Cache Not Cleared

Always use the context manager to ensure the cache is properly cleaned up:
```python
# ✅ Correct: the cache is automatically initialized and cleaned up
with model.streaming(batch_size=1):
    for chunk in chunks:
        watermarked = model(chunk)

# ❌ Incorrect: the cache may never be cleaned up
model.encoder.streaming(batch_size=1)
for chunk in chunks:
    watermarked = model(chunk)
```
## Next Steps

- **Secret Messages**: learn how to embed custom messages in streaming watermarks
- **Attack Robustness**: understand how streaming watermarks handle audio attacks