Quickstart Guide

This guide shows how to watermark audio and detect watermarks using AudioSeal’s core API.

Prerequisites

Make sure you have AudioSeal installed. If not, see the installation guide.
pip install audioseal

Basic Workflow

1. Import AudioSeal

Import PyTorch and the AudioSeal entry point:
import torch
from audioseal import AudioSeal

2. Load your audio

Load your audio file as a PyTorch tensor with shape (batch, channels, samples):
# Example: Load audio using torchaudio or your preferred method
import torchaudio

wav, sample_rate = torchaudio.load("input.wav")

# Add batch dimension if needed
if wav.dim() == 2:
    wav = wav.unsqueeze(0)  # Now shape is (1, channels, samples)
AudioSeal expects audio with a batch dimension. Always ensure your input has shape (batch, channels, samples).

3. Watermark the audio

Load the generator and create a watermarked version:
# Load the generator model
generator = AudioSeal.load_generator("audioseal_wm_16bits")
generator.eval()

# Generate watermark
watermark = generator.get_watermark(wav)

# Add watermark to original audio
watermarked_audio = wav + watermark

4. Detect the watermark

Load the detector and check for watermarks:
# Load the detector model
detector = AudioSeal.load_detector("audioseal_detector_16bits")
detector.eval()

# Detect watermark (high-level API)
result, message = detector.detect_watermark(watermarked_audio)

print(f"Detection probability: {result}")
print(f"Decoded message: {message}")

Complete Example

Here’s a complete example that puts it all together:
import torch
from audioseal import AudioSeal

# Load the generator model
generator = AudioSeal.load_generator("audioseal_wm_16bits")
generator.eval()

# Load your audio (example with random audio)
# In practice, load your actual audio file
wav = torch.randn(1, 1, 16000 * 5)  # 5 seconds of audio at 16kHz

# Generate and apply watermark
watermark = generator.get_watermark(wav)
watermarked_audio = wav + watermark

# Load the detector model
detector = AudioSeal.load_detector("audioseal_detector_16bits")
detector.eval()

# Detect watermark
result, message = detector.detect_watermark(watermarked_audio)

# float() works whether result is a Python float or a one-element tensor
print(f"Detection probability: {float(result):.4f}")
print(f"Message bits: {message}")
Expected output (exact values will vary):
Detection probability: 0.9876
Message bits: tensor([[0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]])
The detection probability should be close to 1.0 for watermarked audio and close to 0.0 for non-watermarked audio.
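As a minimal sketch of how that score might be turned into a yes/no decision: the helper name `is_watermarked` and the 0.5 cut-off below are assumptions made for illustration and are not part of AudioSeal. A suitable threshold is best chosen by measuring false positives on your own audio.

```python
def is_watermarked(detection_prob: float, threshold: float = 0.5) -> bool:
    """Return True when the detection score is at or above the threshold.

    The default threshold is a hypothetical value chosen for illustration only.
    """
    if not 0.0 <= detection_prob <= 1.0:
        raise ValueError(f"expected a probability in [0, 1], got {detection_prob}")
    return detection_prob >= threshold


# Scores like the ones described above: one near 1.0, one near 0.0.
for score in (0.9876, 0.02):
    label = "watermarked" if is_watermarked(score) else "not watermarked"
    print(f"{score:.4f} -> {label}")
```

With the value returned by `detect_watermark`, the call would look like `is_watermarked(float(result))`.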

Using Secret Messages

You can embed a custom 16-bit message in the watermark to identify the source or version:
import torch
from audioseal import AudioSeal

# Create a custom 16-bit message
# Each bit is either 0 or 1
message = torch.tensor([[0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]])

# Load generator
generator = AudioSeal.load_generator("audioseal_wm_16bits")
generator.eval()

# Generate watermark with custom message
wav = torch.randn(1, 1, 16000 * 5)
watermark = generator.get_watermark(wav, message=message)
watermarked_audio = wav + watermark

# Detect and decode message
detector = AudioSeal.load_detector("audioseal_detector_16bits")
detector.eval()

detection_prob, decoded_message = detector.detect_watermark(watermarked_audio)

print(f"Original message:  {message}")
print(f"Decoded message:   {decoded_message}")
# Cast both to the same dtype: torch.equal requires matching dtypes
print(f"Messages match: {torch.equal(message.int(), decoded_message.int())}")
Expected Output:
Original message:  tensor([[0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]])
Decoded message:   tensor([[0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]])
Messages match: True
The secret message is optional and does not affect detection. It can be recovered only from watermarked audio; for non-watermarked audio, the decoded bits are effectively random.
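One way to use the 16 bits is to store a small integer identifier. The sketch below packs an integer into a bit list and back, and measures how many bits survive a round trip. The helper names (`int_to_bits`, `bits_to_int`, `bit_accuracy`) and the most-significant-bit-first ordering are assumptions for illustration; AudioSeal itself only sees the resulting bit tensor.

```python
from typing import List


def int_to_bits(value: int, nbits: int = 16) -> List[int]:
    """Encode a non-negative integer as nbits bits, most significant bit first."""
    if not 0 <= value < 2 ** nbits:
        raise ValueError(f"value must fit in {nbits} bits, got {value}")
    return [(value >> shift) & 1 for shift in range(nbits - 1, -1, -1)]


def bits_to_int(bits: List[int]) -> int:
    """Inverse of int_to_bits: fold a most-significant-first bit list into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    return value


def bit_accuracy(sent: List[int], received: List[int]) -> float:
    """Fraction of positions where the two bit lists agree."""
    if len(sent) != len(received):
        raise ValueError("bit lists must have the same length")
    return sum(s == r for s, r in zip(sent, received)) / len(sent)


source_id = 0x59CA  # hypothetical identifier; yields the 16-bit pattern used in this section
bits = int_to_bits(source_id)
print(bits)               # [0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]
print(bits_to_int(bits))  # 22986
```

To embed the identifier, the list can be wrapped as `torch.tensor([bits])` to get shape `(1, 16)`. After detection, `decoded_message[0].tolist()` gives a list that `bits_to_int` and `bit_accuracy` accept.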

Low-Level Detection API

For frame-level detection (advanced use), you can use the low-level API:
import torch
from audioseal import AudioSeal

# Load models
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")
generator.eval()
detector.eval()

# Create watermarked audio
wav = torch.randn(1, 1, 16000 * 5)
watermarked_audio = generator(wav, alpha=1.0)

# Low-level detection
result, message = detector(watermarked_audio)

# result shape: (batch, 2, frames)
# result[:, 0, :] = probability of NO watermark
# result[:, 1, :] = probability of watermark

print(f"Result shape: {result.shape}")
print(f"Watermark probability per frame: {result[:, 1, :]}")
print(f"Message probability: {message}")
Expected Output:
Result shape: torch.Size([1, 2, 625])
Watermark probability per frame: tensor([[0.98, 0.97, 0.99, ...]])
Message probability: tensor([[0.45, 0.87, 0.23, ...]])  # probability of each bit being 1
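The per-frame and per-bit probabilities can be reduced to a single score, hard 0/1 bits, and the index ranges where a watermark appears to be present. The sketch below works on plain Python lists with made-up values. The function names and the 0.5 thresholds are assumptions, and this is not presented as AudioSeal's internal logic.

```python
from typing import List, Tuple


def detection_score(frame_probs: List[float], threshold: float = 0.5) -> float:
    """Fraction of frames whose watermark probability exceeds the threshold."""
    if not frame_probs:
        raise ValueError("frame_probs must not be empty")
    return sum(p > threshold for p in frame_probs) / len(frame_probs)


def decode_bits(bit_probs: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-bit probabilities of being 1 into hard 0/1 decisions."""
    return [1 if p > threshold else 0 for p in bit_probs]


def watermarked_spans(frame_probs: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges of consecutive frames above the threshold."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(frame_probs):
        if p > threshold and start is None:
            start = i  # a run of watermarked frames begins
        elif p <= threshold and start is not None:
            spans.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        spans.append((start, len(frame_probs)))  # run reaches the end of the audio
    return spans


# Made-up values standing in for result[0, 1, :].tolist() and message[0].tolist().
frame_probs = [0.98, 0.97, 0.10, 0.05, 0.99, 0.96]
bit_probs = [0.45, 0.87, 0.23, 0.91]

print(detection_score(frame_probs))
print(decode_bits(bit_probs))
print(watermarked_spans(frame_probs))
```

Span detection of this kind is the main reason to reach for the low-level API: it can show which part of a clip carries the watermark when only a segment was watermarked or the audio was edited afterwards.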

Adjusting Watermark Strength

You can control the watermark strength using the alpha parameter:
# Stronger watermark (more robust, potentially more audible)
watermarked_strong = generator(wav, alpha=1.5)

# Weaker watermark (less audible, less robust)
watermarked_weak = generator(wav, alpha=0.5)

# Default strength
watermarked_default = generator(wav, alpha=1.0)
alpha=1.0 provides a good balance between imperceptibility and robustness. Adjust based on your specific requirements.
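To get a feel for the trade-off, the sketch below computes a signal-to-watermark ratio in decibels for a few `alpha` values. It assumes the watermark is added as `wav + alpha * watermark`, extending the additive form from the basic workflow; that scaling is an assumption here, and the sample values are toy numbers rather than real audio.

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], watermark: Sequence[float], alpha: float = 1.0) -> float:
    """Signal-to-watermark ratio in dB for an additive watermark scaled by alpha."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((alpha * w) ** 2 for w in watermark)
    if signal_power == 0:
        raise ValueError("signal has zero power")
    if noise_power == 0:
        return math.inf  # no watermark energy at all
    return 10.0 * math.log10(signal_power / noise_power)


# Toy samples, not real audio and not a real AudioSeal watermark.
signal = [0.5, -0.4, 0.3, -0.2]
watermark = [0.01, -0.02, 0.015, 0.005]

for alpha in (0.5, 1.0, 1.5):
    print(f"alpha={alpha}: {snr_db(signal, watermark, alpha):.2f} dB")
```

Under this assumption, halving `alpha` raises the ratio by about 6 dB and doubling it lowers the ratio by the same amount, since the watermark power scales with `alpha` squared.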

Working with Different Sample Rates

AudioSeal handles audio at several sample rates: 16 kHz, 24 kHz, 44.1 kHz, and 48 kHz.
import torch
from audioseal import AudioSeal

# Load models
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

# Example with 48kHz audio
wav_48k = torch.randn(1, 1, 48000 * 5)  # 5 seconds at 48kHz
watermarked = generator(wav_48k)
result, message = detector.detect_watermark(watermarked)

print(f"Detection at 48kHz: {float(result):.4f}")
As of AudioSeal 0.2, audio is no longer resampled internally. Ensure your audio is at an appropriate sample rate (16/24/44.1/48 kHz) before processing.
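A small pre-flight check can catch files at other rates before they reach the model. The sketch below is a hypothetical helper, not part of AudioSeal. The tuple of rates is copied from this guide, and picking the nearest listed rate as a resampling target is an illustrative choice.

```python
# Sample rates listed in this guide, in Hz.
SUPPORTED_SAMPLE_RATES = (16000, 24000, 44100, 48000)


def needs_resampling(sample_rate: int) -> bool:
    """True when the rate is not one of the rates listed in this guide."""
    return sample_rate not in SUPPORTED_SAMPLE_RATES


def nearest_supported_rate(sample_rate: int) -> int:
    """Pick the listed rate closest to the given one (illustrative policy)."""
    return min(SUPPORTED_SAMPLE_RATES, key=lambda rate: abs(rate - sample_rate))


def num_samples(duration_s: float, sample_rate: int) -> int:
    """Number of samples covering duration_s seconds at sample_rate Hz."""
    return round(duration_s * sample_rate)


for rate in (22050, 48000):
    if needs_resampling(rate):
        print(f"{rate} Hz: resample to {nearest_supported_rate(rate)} Hz first")
    else:
        print(f"{rate} Hz: OK, 5 s is {num_samples(5, rate)} samples")
```

The resampling itself can be done with `torchaudio.functional.resample(wav, orig_freq, new_freq)` before the audio is passed to the generator or detector.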

Using GPU Acceleration

For faster processing, use a GPU if one is available:
import torch
from audioseal import AudioSeal

# Check for GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load models and move them to the selected device
generator = AudioSeal.load_generator("audioseal_wm_16bits").to(device)
detector = AudioSeal.load_detector("audioseal_detector_16bits").to(device)

# Move audio to GPU
wav = torch.randn(1, 1, 16000 * 5).to(device)

# Process on GPU
watermarked = generator(wav)
result, message = detector.detect_watermark(watermarked)

print(f"Detection result: {float(result):.4f}")

Saving Watermarked Audio

Save your watermarked audio to a file:
import torch
import torchaudio
from audioseal import AudioSeal

# Load and watermark audio
wav, sr = torchaudio.load("input.wav")
wav = wav.unsqueeze(0)  # Add batch dimension

generator = AudioSeal.load_generator("audioseal_wm_16bits")
watermarked = generator(wav)

# Remove batch dimension for saving
watermarked = watermarked.squeeze(0)

# Save to file (detach from the autograd graph and move to CPU first)
torchaudio.save("output_watermarked.wav", watermarked.detach().cpu(), sr)

print("Watermarked audio saved to output_watermarked.wav")

Common Pitfalls

Missing Batch Dimension: Always ensure your audio has shape (batch, channels, samples). Use wav.unsqueeze(0) if needed.
# Wrong - missing batch dimension
wav = torch.randn(1, 16000)  # Shape: (channels, samples)

# Correct
wav = torch.randn(1, 1, 16000)  # Shape: (batch, channels, samples)
# OR
wav = torch.randn(1, 16000).unsqueeze(0)  # Add batch dimension
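The shape rule above can be written as a small helper that maps a 1-D, 2-D, or 3-D audio shape to the `(batch, channels, samples)` layout. This is a sketch with a hypothetical name (`batched_shape`); it works on plain shape tuples so the logic can be checked without loading a model.

```python
from typing import Sequence, Tuple


def batched_shape(shape: Sequence[int]) -> Tuple[int, int, int]:
    """Map an audio shape to (batch, channels, samples).

    (samples,)                  -> (1, 1, samples)
    (channels, samples)         -> (1, channels, samples)
    (batch, channels, samples)  -> unchanged
    """
    dims = tuple(shape)
    if len(dims) == 1:
        return (1, 1, dims[0])
    if len(dims) == 2:
        return (1, dims[0], dims[1])
    if len(dims) == 3:
        return (dims[0], dims[1], dims[2])
    raise ValueError(f"expected 1 to 3 dimensions, got shape {dims}")


print(batched_shape((16000,)))       # (1, 1, 16000)
print(batched_shape((1, 16000)))     # (1, 1, 16000)
print(batched_shape((4, 1, 16000)))  # (4, 1, 16000)
```

With a tensor, the same rule could be applied as `wav = wav.reshape(batched_shape(wav.shape))`, which leaves already-batched input unchanged.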

Next Steps

Now that you’ve learned the basics, explore more advanced features:
For interactive examples, check out the Colab Notebook with visualizations and audio playback.
