Overview
BaseParakeet is the abstract base class that defines the common interface for all Parakeet model variants. It provides three core methods for transcription:
- transcribe() - Transcribe audio files
- transcribe_stream() - Real-time streaming transcription
- generate() - Low-level mel-spectrogram to text
All concrete model variants (ParakeetTDT, ParakeetRNNT, ParakeetCTC, ParakeetTDTCTC) inherit from this class.
Class Definition
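The full class definition is not reproduced here. The sketch below outlines the interface using the methods documented on this page; the exact signatures and default values are illustrative assumptions, not the authoritative definition:

```python
from abc import ABC, abstractmethod

class BaseParakeet(ABC):
    """Sketch of the shared Parakeet interface; real signatures may differ."""

    @abstractmethod
    def transcribe(self, path, *, chunk_duration=None, overlap_duration=15.0,
                   chunk_callback=None):
        """Transcribe an audio file, optionally split into overlapping chunks."""

    @abstractmethod
    def transcribe_stream(self, context_size=(256, 256), depth=1,
                          keep_original_attention=False):
        """Create a context manager for real-time streaming transcription."""

    @abstractmethod
    def generate(self, mel):
        """Low-level decoding: mel-spectrogram(s) in, aligned results out."""
```

Subclasses implement all three methods; code written against BaseParakeet works with any variant.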
Properties
time_ratio
Seconds of audio represented by one encoder output frame, derived from the preprocessor hop length and the encoder subsampling factor. Useful for converting encoder frame indices into timestamps.
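As a concrete illustration, with typical preprocessor and encoder settings (the numeric values below are assumptions for the sketch, not values stated on this page), time_ratio works out as:

```python
# Assumed typical configuration values (see Configuration Properties):
sample_rate = 16000        # model.preprocessor_config.sample_rate, in Hz
hop_length = 160           # model.preprocessor_config.hop_length, samples per mel frame
subsampling_factor = 8     # model.encoder_config.subsampling_factor

# One encoder frame spans hop_length * subsampling_factor audio samples,
# so each frame represents this many seconds:
time_ratio = hop_length * subsampling_factor / sample_rate
print(time_ratio)  # 0.08 -> 80 ms of audio per encoder frame

# Converting a token's encoder frame index to a timestamp:
frame_index = 125
timestamp_s = frame_index * time_ratio  # 10.0 seconds
```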
Methods
transcribe()
Transcribe an audio file, with optional chunking for long files.

Parameters

path
Path to the audio file. Supports WAV, MP3, FLAC, and other formats supported by audiofile.

dtype
Data type for audio processing. Should match the model's dtype.

decoding_config
Configuration for decoding behavior and sentence splitting. See DecodingConfig.

chunk_duration
If provided, splits audio into chunks of this duration (in seconds). When None, the entire file is processed at once. Use chunking for:
- Very long audio files (> 5 minutes)
- Memory-constrained environments
- Processing audio that exceeds available RAM

overlap_duration
Overlap between consecutive chunks in seconds. Only used when chunk_duration is specified. Higher overlap improves accuracy at chunk boundaries but increases computation time.

chunk_callback
Callback function called after processing each chunk. Receives (current_position, total_length) in samples. Useful for progress tracking.

Returns

Transcription result with aligned tokens and sentences. See AlignedResult.
Examples
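A basic call, plus a chunked call with a progress callback, might look like the sketch below. The model-loading step is omitted, and the file names and parameter defaults are illustrative:

```python
def report_progress(current_position, total_length):
    # chunk_callback signature: both positions are given in samples.
    percent = 100.0 * current_position / total_length
    print(f"processed {percent:.0f}%")
    return percent

# Illustrative usage (requires a model loaded via from_pretrained):
# result = model.transcribe("interview.wav")       # whole file at once
# print(result.sentences)
#
# result = model.transcribe(
#     "long_lecture.wav",
#     chunk_duration=120.0,        # 2-minute chunks for long audio
#     overlap_duration=10.0,       # overlap to smooth chunk boundaries
#     chunk_callback=report_progress,
# )
```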
Basic transcription: call transcribe() with the path to an audio file.

transcribe_stream()
Create a streaming context for real-time transcription.

Parameters

context_size
A pair (left_context, right_context) specifying attention context windows in encoder frames.
- left_context: How many past frames to attend to
- right_context: How many future frames to attend to (lookahead)

depth
Number of encoder layers that preserve exact computation across chunks.
- depth=1 (default): Only the first layer's cache matches exactly
- depth=2: First two layers match exactly
- depth=N: All N layers match (full equivalence to non-streaming)

keep_original_attention
Whether to preserve the original attention mechanism.
- False (default): Switches to local attention for streaming
- True: Keeps the original attention (less suitable for streaming)

decoding_config
Configuration for decoding behavior and sentence splitting. See DecodingConfig.
Returns
A context manager for streaming inference. Use it with Python's with statement.

Examples
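A basic streaming loop might look like the sketch below. The add_audio() call and the result attribute on the context are assumptions about the streaming API, and the model-loading step is omitted:

```python
def audio_chunks(samples, chunk_size):
    # Yield fixed-size chunks of raw audio samples to feed the stream.
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]

# Illustrative usage:
# with model.transcribe_stream(context_size=(256, 256), depth=1) as stream:
#     for chunk in audio_chunks(audio, chunk_size=16000):   # ~1 s at 16 kHz
#         stream.add_audio(chunk)                           # assumed method
#         print(stream.result.text)                         # assumed attribute
```

Larger right_context improves accuracy but adds latency, since the stream must wait for that much lookahead audio before emitting a frame.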
Basic streaming: open the context in a with block, feed audio chunks as they arrive, and read the incremental result.

generate()
Generate transcription from mel-spectrogram input. This is the low-level interface used by transcribe().
Parameters
mel
Mel-spectrogram input with shape [batch, sequence, mel_dim] for batch processing, or [sequence, mel_dim] for a single input.

decoding_config
Configuration object controlling decoding behavior and sentence splitting. See DecodingConfig.
Returns
List of transcription results with aligned tokens and sentences, one for each input in the batch.
Examples
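Since generate() returns one result per batch item, a single 2-D input yields a one-element list. A small sketch of the shape handling (the helper name is hypothetical):

```python
def ensure_batched(mel_shape):
    # generate() accepts [batch, sequence, mel_dim] or [sequence, mel_dim];
    # a 2-D input is effectively a batch of one.
    if len(mel_shape) == 2:
        return (1,) + tuple(mel_shape)
    return tuple(mel_shape)

# Illustrative usage:
# results = model.generate(mel)    # -> list of AlignedResult
# first = results[0]               # single input -> one-element list
```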
Single input: pass a [sequence, mel_dim] mel-spectrogram to generate(); the returned list contains a single result.

Configuration Properties
These properties provide access to model configuration:
- Sample rate: model.preprocessor_config.sample_rate
- Hop length: model.preprocessor_config.hop_length
- Subsampling factor: model.encoder_config.subsampling_factor
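These values underpin chunking and timing arithmetic. For example (the numeric values are assumptions; the attribute paths are the ones listed above):

```python
# Assumed values read from a loaded model:
sample_rate = 16000        # model.preprocessor_config.sample_rate
hop_length = 160           # model.preprocessor_config.hop_length
subsampling_factor = 8     # model.encoder_config.subsampling_factor

# chunk_callback reports positions in samples; convert to seconds:
def samples_to_seconds(n_samples, sample_rate=16000):
    return n_samples / sample_rate

# Mel frames, then encoder frames, produced by 30 s of audio:
n_samples = 30 * sample_rate
n_mel = n_samples // hop_length            # 3000 mel frames
n_encoder = n_mel // subsampling_factor    # 375 encoder frames
```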
Related
- from_pretrained - Loading models
- ParakeetTDT - TDT-specific methods
- ParakeetRNNT - RNNT-specific methods
- ParakeetCTC - CTC-specific methods
- DecodingConfig - Decoding configuration
- AlignedResult - Result structure