The parakeet_mlx.audio module provides utilities for loading audio files and converting them to log-mel spectrograms for processing by Parakeet models.
Functions
load_audio
Load an audio file and resample it to the target sampling rate.
Parameters
- Audio file path: path to the audio file. Any format FFmpeg can read is supported (WAV, MP3, FLAC, etc.). Example: "audio.wav", Path("/path/to/audio.mp3")
- Target sampling rate: the rate in Hz to which the audio is resampled. Parakeet models typically use 16000 Hz. Example: 16000
- Output dtype: MLX data type of the returned array. Common options are mx.bfloat16 (memory efficient, recommended) and mx.float32 (higher precision). Default: mx.bfloat16
Returns
1D array of audio samples normalized to the range [-1.0, 1.0]. Shape: [num_samples]
Requirements
FFmpeg must be installed and available on your PATH. Install it with your system package manager, for example brew install ffmpeg on macOS or sudo apt install ffmpeg on Debian/Ubuntu.
Example
get_logmel
Convert audio samples to a log-mel spectrogram.
Parameters
- Audio samples: 1D array of audio samples (the output of load_audio). Shape: [num_samples]
- Preprocessing configuration: a PreprocessArgs instance. Use model.preprocessor_config to ensure compatibility with your model. This dataclass contains:
  - sample_rate: audio sample rate
  - features: number of mel filterbanks
  - n_fft: FFT window size
  - window_size: STFT window size in seconds
  - window_stride: STFT hop length in seconds
  - window: window function ("hann", "hamming", "blackman", "bartlett")
  - normalize: normalization strategy ("per_feature" or "global")
  - preemph: pre-emphasis coefficient
  - other parameters (see PreprocessArgs)
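window_size and window_stride are given in seconds, but the STFT works in samples. The sketch below shows that conversion. It assumes the sample counts come from multiplying by sample_rate and rounding; the library's exact rounding is not documented here. window_in_samples is a hypothetical name, and the 25 ms / 10 ms values are illustrative.

```python
def window_in_samples(window_size: float, window_stride: float, sample_rate: int) -> tuple[int, int]:
    """Return (win_length, hop_length) in samples for window settings given in seconds."""
    win_length = round(window_size * sample_rate)    # samples per analysis window
    hop_length = round(window_stride * sample_rate)  # samples between successive frames
    return win_length, hop_length


# A 25 ms window with a 10 ms stride at 16 kHz:
print(window_in_samples(0.025, 0.01, 16000))  # (400, 160)
```

The window length in samples would normally not exceed n_fft. A shorter window is zero-padded up to the FFT size.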
Returns
Log-mel spectrogram ready for model input. Shape: [1, sequence_length, mel_features]
The output has:
- a batch dimension of 1
- normalized log-mel features
- the same data type as the input audio
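sequence_length depends on the input length and the hop. Assume a centered STFT, where the signal is padded by n_fft // 2 on each side, as many log-mel front ends do. Under that assumption the frame count is 1 + num_samples // hop_length. The hypothetical helper below is only for sanity-checking shapes. Confirm it against real output, since the padding behaviour is an assumption.

```python
def expected_logmel_shape(num_samples: int, hop_length: int, features: int) -> tuple[int, int, int]:
    """Shape [1, sequence_length, mel_features] under a centered-STFT assumption."""
    sequence_length = 1 + num_samples // hop_length
    return (1, sequence_length, features)


# One second of 16 kHz audio, 160-sample hop, 80 mel bins (illustrative values):
print(expected_logmel_shape(16000, 160, 80))  # (1, 101, 80)
```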
Processing Steps
get_logmel performs the following operations, in order:
- Pre-emphasis: apply a first-order filter, y[t] = x[t] - preemph * x[t-1] (if enabled)
- STFT: short-time Fourier transform with the configured window
- Power spectrum: compute the squared magnitude of the STFT
- Mel filterbank: apply a mel-scale filterbank
- Logarithm: convert to log scale
- Normalization: normalize features (per-feature or global)
- Reshape: add the batch dimension
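The steps above can be sketched in NumPy as follows. This is an illustrative re-implementation, not the library's code, and get_logmel itself operates on MLX arrays. Several choices are assumptions:
- the HTK mel scale
- a periodic Hann window
- reflect padding
- the log guard eps=1e-5
- every default parameter value

Only per-feature normalization is shown. logmel_sketch and mel_filterbank are hypothetical names.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel scale (assumption; some front ends use the Slaney variant instead)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sample_rate: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters of shape [n_mels, n_fft // 2 + 1]."""
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def logmel_sketch(x, sample_rate=16000, n_fft=512, win_length=400, hop_length=160,
                  n_mels=80, preemph=0.97, eps=1e-5):
    x = np.asarray(x, dtype=np.float64)
    # 1. Pre-emphasis: y[t] = x[t] - preemph * x[t - 1]
    if preemph:
        x = np.concatenate([x[:1], x[1:] - preemph * x[:-1]])
    # 2. STFT: center-pad, cut overlapping frames, apply a periodic Hann window
    #    (zero-padded to n_fft), then take the real FFT of each frame
    x = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (x.size - n_fft) // hop_length
    frames = np.stack([x[i * hop_length : i * hop_length + n_fft] for i in range(n_frames)])
    hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(win_length) / win_length)
    window = np.zeros(n_fft)
    offset = (n_fft - win_length) // 2
    window[offset : offset + win_length] = hann
    spectrum = np.fft.rfft(frames * window, axis=-1)  # [frames, n_fft // 2 + 1]
    # 3. Power spectrum (squared magnitude)
    power = np.abs(spectrum) ** 2
    # 4. Mel filterbank
    mel = power @ mel_filterbank(sample_rate, n_fft, n_mels).T  # [frames, n_mels]
    # 5. Logarithm, with a small guard against log(0)
    logmel = np.log(mel + eps)
    # 6. Per-feature normalization: zero mean, unit variance per mel bin over time
    logmel = (logmel - logmel.mean(axis=0)) / (logmel.std(axis=0) + eps)
    # 7. Add the batch dimension -> [1, sequence_length, mel_features]
    return logmel[None, :, :]


x = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
print(logmel_sketch(x).shape)  # (1, 101, 80)
```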
Example
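A typical call sequence takes both the sample rate and the preprocessing configuration from the model, as recommended above. The model ID is only an example, and from_pretrained is assumed to be the package's model loader. compute_logmel is a hypothetical name. The imports are deferred into the function because parakeet-mlx needs MLX, which runs on Apple silicon.

```python
from pathlib import Path


def compute_logmel(audio_path: str, model_id: str = "mlx-community/parakeet-tdt-0.6b-v2"):
    """Load a file and return a [1, sequence_length, mel_features] log-mel array."""
    # Deferred imports: parakeet-mlx requires MLX (Apple silicon)
    from parakeet_mlx import from_pretrained
    from parakeet_mlx.audio import get_logmel, load_audio

    model = from_pretrained(model_id)
    config = model.preprocessor_config  # PreprocessArgs matching this model
    audio = load_audio(Path(audio_path), config.sample_rate)  # 1D, [num_samples]
    mel = get_logmel(audio, config)  # [1, sequence_length, mel_features]
    return mel
```

Usage: compute_logmel("audio.wav").shape should report a batch dimension of 1 followed by the frame and mel-feature counts.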
Complete Example
Low-Level API Usage
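A sketch of the low-level path, with these assumptions, which you should check against your installed version:
- model is an already loaded Parakeet model, for example from parakeet_mlx.from_pretrained.
- model.generate() accepts the output of get_logmel.
- model.generate() returns a list of results, each with a .text attribute.

transcribe_low_level is a hypothetical name. Taking the model as an argument avoids reloading weights for every file.

```python
from pathlib import Path


def transcribe_low_level(model, audio_path: str) -> list[str]:
    """Preprocess a file by hand, then decode the features with model.generate()."""
    # Deferred import: parakeet-mlx requires MLX, which runs on Apple silicon
    from parakeet_mlx.audio import get_logmel, load_audio

    config = model.preprocessor_config  # keeps preprocessing consistent with the model
    audio = load_audio(Path(audio_path), config.sample_rate)
    mel = get_logmel(audio, config)
    results = model.generate(mel)  # assumed: one result per batch item
    return [result.text for result in results]
```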
Batch Processing
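One common way to batch utterances of different lengths is to right-pad each [1, T_i, F] log-mel array to the longest T along the time axis and record the true lengths. The NumPy sketch below does this. pad_logmel_batch is hypothetical. This page does not cover whether a particular model accepts padded batches or how it masks the padding, so processing files one at a time is the conservative option.

```python
import numpy as np


def pad_logmel_batch(mels: list[np.ndarray], pad_value: float = 0.0) -> tuple[np.ndarray, np.ndarray]:
    """Stack [1, T_i, F] arrays into [B, max_T, F] plus a [B] array of true lengths."""
    if not mels:
        raise ValueError("need at least one array")
    lengths = np.array([m.shape[1] for m in mels])
    features = mels[0].shape[2]
    batch = np.full((len(mels), lengths.max(), features), pad_value, dtype=mels[0].dtype)
    for i, m in enumerate(mels):
        if m.shape[0] != 1 or m.shape[2] != features:
            raise ValueError(f"expected shape [1, T, {features}], got {m.shape}")
        batch[i, : m.shape[1]] = m[0]  # copy real frames; the tail stays padded
    return batch, lengths


a = np.ones((1, 3, 4), dtype=np.float32)
b = np.ones((1, 5, 4), dtype=np.float32)
batch, lengths = pad_logmel_batch([a, b])
print(batch.shape, lengths.tolist())  # (2, 5, 4) [3, 5]
```

With per_feature normalization, a pad value of 0.0 equals each feature's mean, so the padding does not introduce extreme values.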
Custom Audio Processing
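Audio may come from somewhere other than a file, such as a microphone buffer or another decoder. Before it reaches get_logmel, it should match what load_audio would produce: one channel, samples in [-1.0, 1.0], at the model's sample rate. The NumPy sketch below handles the first two requirements for int16 PCM. It refuses mismatched rates rather than resampling, since a naive resampler would degrade the audio. to_model_input is a hypothetical name. Wrapping the result with mx.array(...) before calling get_logmel is assumed.

```python
import numpy as np


def to_model_input(pcm: np.ndarray, source_rate: int, target_rate: int = 16000) -> np.ndarray:
    """Convert int16 PCM of shape [num_samples] or [num_samples, channels] to mono float32."""
    if source_rate != target_rate:
        # Resampling is out of scope here; use FFmpeg or a dedicated resampler.
        raise ValueError(f"expected {target_rate} Hz audio, got {source_rate} Hz")
    if pcm.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {pcm.dtype}")
    audio = pcm.astype(np.float32) / 32768.0  # scale to [-1.0, 1.0)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # downmix channels to mono by averaging
    return audio
```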
Chunking Long Audio
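Long recordings are usually split into fixed-length windows that overlap, so that a word cut at one boundary appears whole in at least one chunk. The sketch below only computes the sample ranges. chunk_ranges is hypothetical. Merging the per-chunk transcripts is the harder part, and the best-practice note recommends leaving it to model.transcribe().

```python
def chunk_ranges(num_samples: int, sample_rate: int,
                 chunk_seconds: float, overlap_seconds: float) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the whole signal."""
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)  # advance between chunk starts
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    ranges = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        ranges.append((start, end))
        if end >= num_samples:  # last chunk reaches the end of the signal
            return ranges
        start += step


# 25 s of 16 kHz audio in 10 s chunks with 2 s of overlap:
print(chunk_ranges(400000, 16000, 10.0, 2.0))
```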
Best Practice: Use model.transcribe() instead of manually calling load_audio and get_logmel. The high-level API handles chunking, overlaps, and merging automatically.
PreprocessArgs
The PreprocessArgs dataclass (from parakeet_mlx.audio) contains all audio preprocessing configuration. Its main fields are listed under get_logmel above.
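For experimenting without a model, the fields documented on this page can be mirrored in a small stand-in. This is a hypothetical illustration, not the library's definition. It covers only the documented fields, and its validation sets come from the documented options. Treating preemph=None as "disabled" and requiring the window to fit within n_fft are assumptions. In real code, use model.preprocessor_config.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreprocessArgsSketch:
    """Hypothetical stand-in for PreprocessArgs (documented fields only)."""
    sample_rate: int            # Hz, e.g. 16000
    features: int               # number of mel filterbanks
    n_fft: int                  # FFT size
    window_size: float          # STFT window length in seconds
    window_stride: float        # STFT hop in seconds
    window: str                 # "hann", "hamming", "blackman" or "bartlett"
    normalize: str              # "per_feature" or "global"
    preemph: Optional[float]    # pre-emphasis coefficient; None disables it (assumption)

    def __post_init__(self):
        if self.window not in {"hann", "hamming", "blackman", "bartlett"}:
            raise ValueError(f"unsupported window: {self.window!r}")
        if self.normalize not in {"per_feature", "global"}:
            raise ValueError(f"unsupported normalize: {self.normalize!r}")
        # Assumption: the analysis window must fit inside one FFT frame
        if round(self.window_size * self.sample_rate) > self.n_fft:
            raise ValueError("window_size * sample_rate exceeds n_fft")


cfg = PreprocessArgsSketch(16000, 80, 512, 0.025, 0.01, "hann", "per_feature", 0.97)
```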
Related
- BaseParakeet.transcribe() - High-level API that uses these utilities
- Low-Level API Guide - Learn when to use audio utilities directly