Why Use Chunking?
Memory Efficiency
Process arbitrarily long audio files without running out of memory
Parallelization
Chunks can be processed with better memory locality
Progress Tracking
Monitor progress through long transcription jobs
Reliability
Handle very long recordings that might otherwise fail
Basic Usage
CLI
Python API
Parameters
chunk_duration
Duration of each audio chunk in seconds. Set to None or 0 to disable chunking and process the entire file at once.
Recommended values:
- 60-120 seconds for most use cases
- 180-300 seconds for high-memory systems
- 30-60 seconds for low-memory systems
overlap_duration
Overlap between consecutive chunks in seconds. Overlap is used to merge chunk boundaries smoothly and avoid cutting words.
Recommended values:
- 10-15 seconds for most use cases
- 20-30 seconds for dense speech
- 5-10 seconds for sparse speech
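The sketch below shows how the two parameters interact by computing chunk boundaries in samples. It assumes that each chunk starts chunk_duration - overlap_duration seconds after the previous one and that the last chunk is cut at the end of the file. The function name chunk_bounds is hypothetical, and Parakeet MLX's own stepping and end-of-file handling may differ in detail. The 16 kHz sample rate in the second example is an illustrative assumption.

```python
from typing import List, Optional, Tuple


def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_duration: Optional[float],
                 overlap_duration: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks.

    Sketch under assumptions: each chunk starts (chunk_duration - overlap_duration)
    seconds after the previous one, and the last chunk is cut at the end of the audio.
    """
    if not chunk_duration:
        return [(0, num_samples)]  # None or 0: chunking disabled, one pass over the file
    chunk_samples = int(chunk_duration * sample_rate)
    step = chunk_samples - int(overlap_duration * sample_rate)
    if step <= 0:
        raise ValueError("overlap_duration must be smaller than chunk_duration")

    bounds = []
    start = 0
    while True:
        end = min(start + chunk_samples, num_samples)
        bounds.append((start, end))
        if end >= num_samples:  # this chunk reaches the end of the audio
            break
        start += step
    return bounds


# 250 s of audio at a toy rate of 1 sample per second: 120 s chunks, 15 s overlap.
print(chunk_bounds(250, 1, 120, 15))  # [(0, 120), (105, 225), (210, 250)]
# One hour of 16 kHz audio with the same settings.
print(len(chunk_bounds(3600 * 16000, 16000, 120, 15)))  # 35
```

With the common 120 s / 15 s setting, each chunk contributes 105 s of new audio, so under these assumptions an hour-long file needs 35 chunks.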
How Chunking Works
Split audio into chunks
The audio file is split into overlapping chunks based on chunk_duration and overlap_duration.
Transcribe each chunk
Each chunk is transcribed independently. Timestamps are produced relative to the chunk start and are then shifted by the chunk's start time, so they refer to the whole file.
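Moving a chunk's timestamps onto the full-file timeline is a constant shift by the chunk's start time, where the start time is the chunk's start sample divided by the sample rate. The following is a minimal sketch that uses a hypothetical TimedToken class rather than Parakeet MLX's own result types:

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TimedToken:
    """Hypothetical stand-in for a transcribed token with timing."""
    text: str
    start: float     # seconds
    duration: float  # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def to_file_time(tokens: List[TimedToken], chunk_start: float) -> List[TimedToken]:
    """Shift chunk-relative timestamps by the chunk's start time (in seconds)."""
    return [replace(t, start=t.start + chunk_start) for t in tokens]


# Tokens from a chunk that starts 105 s into the file.
chunk_tokens = [TimedToken("hello", 2.0, 0.5), TimedToken("world", 2.5, 0.5)]
shifted = to_file_time(chunk_tokens, chunk_start=105.0)
print([(t.text, t.start, t.end) for t in shifted])
# [('hello', 107.0, 107.5), ('world', 107.5, 108.0)]
```

Returning new tokens instead of mutating the originals keeps the chunk-relative values available, which can help when debugging a merge.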
Merge overlapping regions
Parakeet MLX uses two strategies to merge overlapping regions:
- Longest Contiguous Subsequence (default): Finds the longest matching sequence of tokens between overlapping regions
- Longest Common Subsequence (fallback): Uses dynamic programming to find the best alignment
Progress Tracking
CLI Progress Bar
The CLI automatically shows a progress bar when processing chunks.
Python Callback
Use a callback function to track progress.
Detailed Progress Tracking
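The callback's exact signature is defined by transcribe(). This sketch assumes it is called as callback(current, total), for example with the number of samples processed so far and the total number of samples, so check that assumption against the Python API reference. make_progress_printer is a hypothetical helper that keeps rewriting a single status line:

```python
import io
import sys
from typing import Callable, TextIO


def make_progress_printer(stream: TextIO = sys.stderr) -> Callable[[int, int], None]:
    """Build a progress callback that keeps rewriting one status line.

    Assumption: the callback is invoked as callback(current, total), e.g. with the
    number of audio samples processed so far and the total number of samples.
    """
    def on_progress(current: int, total: int) -> None:
        percent = 100.0 * current / total if total else 100.0
        stream.write(f"\rTranscribing: {percent:5.1f}% ({current}/{total})")
        if current >= total:
            stream.write("\n")  # end the status line once everything is processed
        stream.flush()

    return on_progress


# Simulate the calls a 250 s file might trigger (chunk ends at 120, 225 and 250 s),
# capturing the output in a StringIO so it is easy to inspect.
buffer = io.StringIO()
progress = make_progress_printer(buffer)
for end in (120, 225, 250):
    progress(end, 250)
print(repr(buffer.getvalue()))
```

In real use you would keep the default stream (stderr) and pass the returned function as the progress callback. The StringIO is only there to make the output easy to inspect.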
Merge Strategies
Parakeet MLX uses two merge strategies to handle overlapping regions:
Longest Contiguous Subsequence (Primary)
Finds the longest contiguous matching sequence between overlapping regions:
- ✅ Fast and efficient
- ✅ Works well when overlap matches closely
- ❌ May fail if overlap regions differ significantly
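Here is a minimal sketch of the contiguous strategy on plain token strings. It runs a longest-common-substring search for the longest run shared by the tail of the text so far and the head of the next chunk, then splices at the end of that run. The window size, the min_match threshold, and the MergeFailed exception are assumptions of this sketch. Parakeet MLX's own merge works on its token objects and overlap timing.

```python
from typing import List, Sequence


class MergeFailed(Exception):
    """Raised when the overlap regions share no run long enough to trust."""


def merge_longest_contiguous(prev: Sequence[str], nxt: Sequence[str],
                             window: int = 20, min_match: int = 2) -> List[str]:
    """Join two token lists at the longest contiguous run they share.

    Sketch: only the last `window` tokens of `prev` and the first `window`
    tokens of `nxt` are compared, as a stand-in for "the overlap region".
    Tokens are compared by text only.
    """
    offset = max(0, len(prev) - window)
    tail, head = prev[offset:], nxt[:window]

    best_len = best_i = best_j = 0
    # run[j] = length of the common run ending at tail[i - 1] and head[j - 1]
    run = [0] * (len(head) + 1)
    for i in range(1, len(tail) + 1):
        new_run = [0] * (len(head) + 1)
        for j in range(1, len(head) + 1):
            if tail[i - 1] == head[j - 1]:
                new_run[j] = run[j - 1] + 1
                if new_run[j] > best_len:
                    best_len, best_i, best_j = new_run[j], i, j
        run = new_run

    if best_len < min_match:
        raise MergeFailed(f"longest shared run is only {best_len} token(s)")
    # Keep prev up to the end of the shared run, then continue with nxt after it.
    return list(prev[:offset + best_i]) + list(nxt[best_j:])


prev = "the quick brown fox jumps over".split()
nxt = "fox jumps over the lazy dog".split()
print(" ".join(merge_longest_contiguous(prev, nxt)))  # the quick brown fox jumps over the lazy dog
```

Raising instead of guessing when no run is long enough is what makes a fallback strategy possible.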
Longest Common Subsequence (Fallback)
Uses dynamic programming to find the best alignment:
- ✅ More robust to differences
- ✅ Handles insertions/deletions
- ⚠️ Slightly slower
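The fallback can be sketched with the classic longest-common-subsequence dynamic program. Mismatched tokens, such as "mill" vs. "hill" in the example, are skipped instead of breaking the alignment. The function names, the token window, and the choice to splice at the middle matched token are assumptions of this sketch and not necessarily what Parakeet MLX does.

```python
from typing import List, Sequence, Tuple


def lcs_pairs(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """Index pairs (i, j) with a[i] == b[j] forming one longest common subsequence."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table forward to recover the matched positions.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def merge_lcs(prev: Sequence[str], nxt: Sequence[str], window: int = 20) -> List[str]:
    """Align the overlap regions with LCS, then splice at the middle matched token.

    Splicing mid-overlap (an assumption of this sketch) keeps each side's tokens
    where that chunk had the most surrounding audio context.
    """
    offset = max(0, len(prev) - window)
    pairs = lcs_pairs(prev[offset:], nxt[:window])
    if not pairs:
        return list(prev) + list(nxt)  # nothing aligns: plain concatenation
    i, j = pairs[len(pairs) // 2]
    return list(prev[:offset + i + 1]) + list(nxt[j + 1:])


prev = "we met at the old mill by the river".split()
nxt = "the old hill by the river and then".split()
print(" ".join(merge_lcs(prev, nxt)))  # we met at the old mill by the river and then
```

The extra cost compared with the contiguous search comes from building and walking the full table, which matches the "slightly slower" trade-off listed above.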
The merge strategy is selected automatically: longest contiguous subsequence is tried first, and longest common subsequence is used as a fallback if the first strategy fails.
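The selection logic itself is a plain try/except. In this sketch, a strategy signals failure by raising a hypothetical MergeFailed exception. The exception type that Parakeet MLX actually uses is not stated on this page, and the two toy strategies exist only to make the example runnable.

```python
from typing import Callable, List, Sequence

Merge = Callable[[Sequence[str], Sequence[str]], List[str]]


class MergeFailed(Exception):
    """Raised by a merge strategy that cannot find a trustworthy alignment."""


def merge_with_fallback(prev: Sequence[str], nxt: Sequence[str],
                        primary: Merge, fallback: Merge) -> List[str]:
    """Try the fast primary strategy first; use the fallback only if it fails."""
    try:
        return primary(prev, nxt)
    except MergeFailed:
        return fallback(prev, nxt)


# Toy strategies for illustration only.
def always_fails(prev: Sequence[str], nxt: Sequence[str]) -> List[str]:
    raise MergeFailed("no contiguous match")


def concatenate(prev: Sequence[str], nxt: Sequence[str]) -> List[str]:
    return list(prev) + list(nxt)


print(merge_with_fallback(["a", "b"], ["c"], always_fails, concatenate))  # ['a', 'b', 'c']
```

Catching only the dedicated exception, rather than every error, keeps genuine bugs in a strategy from being silently hidden by the fallback.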
Optimizing Chunk Parameters
For Maximum Accuracy
- ✅ Better context for the model
- ✅ Better overlap merging
- ❌ Higher memory usage
- ❌ Slower processing
For Maximum Speed
- ✅ Lower memory usage
- ✅ Faster processing
- ❌ Less context for the model
- ❌ Potential boundary issues
For Memory-Constrained Systems
Recommended Configurations
- Balanced (Default)
- High Accuracy
- Low Memory
- Fast Processing
When to Disable Chunking
Disable chunking (chunk_duration=None or chunk_duration=0) when:
Short Audio
Audio files under 2-3 minutes can typically be processed without chunking
Maximum Accuracy
Full audio context may improve accuracy for some use cases
Sufficient Memory
High-memory systems can process longer audio without chunking
Batch Processing
When using custom batching strategies
CLI Examples
Chunking vs. Streaming
| Feature | Chunking | Streaming |
|---|---|---|
| Purpose | Process long files | Real-time transcription |
| Context | Full chunk context | Limited by window |
| Memory | Per-chunk peak | Bounded by cache |
| Latency | High (batch) | Low (real-time) |
| Accuracy | Higher | Slightly lower |
| Use Case | Batch processing | Live captioning |
| Implementation | transcribe() with chunks | transcribe_stream() |
Troubleshooting
Out of memory errors
Solutions:
- Reduce chunk_duration: --chunk-duration 60
- Use BFloat16: --bf16
- Reduce overlap: --overlap-duration 10
- Use local attention: --local-attention
Words cut off at boundaries
Solutions:
- Increase overlap_duration: --overlap-duration 20
- Use longer chunks: --chunk-duration 180
- The overlap merging should handle this automatically, but more overlap helps
Slow processing
Solutions:
- Reduce chunk_duration for better memory locality
- Reduce overlap_duration so less audio is transcribed twice and merged
- Use greedy decoding: --decoding greedy
- Use BFloat16: --bf16
Inconsistent timestamps
This shouldn’t happen, because timestamps are adjusted automatically. If you do see issues:
- Check that the audio file isn’t corrupted
- Try different overlap settings
- Report as a bug if persistent
Implementation Details
From the source code (parakeet_mlx/parakeet.py:166-221):
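The following is a simplified, hypothetical sketch of that flow, not the library's source. It splits the audio into overlapping chunks, transcribes each one, shifts the timestamps by the chunk start, and merges the result into the running output. Every name is an assumption: transcribe_chunk and merge stand in for the model call and the merge strategies, and fake_asr and drop_repeats are toys that exist only to make the sketch runnable.

```python
from typing import Callable, List, Optional, Sequence, Tuple

TimedToken = Tuple[str, float]  # (text, start time in seconds); hypothetical representation


def transcribe_chunked(
    audio: Sequence,
    sample_rate: int,
    chunk_duration: Optional[float],
    overlap_duration: float,
    transcribe_chunk: Callable[[Sequence], List[TimedToken]],
    merge: Callable[[List[TimedToken], List[TimedToken]], List[TimedToken]],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> List[TimedToken]:
    """Split, transcribe, shift timestamps, and merge (hypothetical names throughout).

    `transcribe_chunk` must return timestamps relative to the start of its input.
    """
    total = len(audio)
    chunk = int((chunk_duration or 0) * sample_rate)
    if chunk <= 0 or total <= chunk:
        return transcribe_chunk(audio)  # chunking disabled, or the audio fits in one chunk
    step = chunk - int(overlap_duration * sample_rate)
    if step <= 0:
        raise ValueError("overlap_duration must be smaller than chunk_duration")

    merged: List[TimedToken] = []
    start = 0
    while True:
        end = min(start + chunk, total)
        offset = start / sample_rate  # chunk start in seconds
        tokens = [(text, t + offset) for text, t in transcribe_chunk(audio[start:end])]
        merged = merge(merged, tokens) if merged else tokens
        if on_progress is not None:
            on_progress(end, total)
        if end >= total:
            break
        start += step
    return merged


# Toy stand-ins: each "sample" is one spoken word at 1 sample per second.
def fake_asr(chunk: Sequence) -> List[TimedToken]:
    return [(word, float(i)) for i, word in enumerate(chunk)]


def drop_repeats(kept: List[TimedToken], new: List[TimedToken]) -> List[TimedToken]:
    # Keep only new tokens that start after the last token already kept.
    return kept + [tok for tok in new if tok[1] > kept[-1][1]]


words = [f"w{i}" for i in range(10)]
result = transcribe_chunked(words, 1, 4, 1, fake_asr, drop_repeats)
print([word for word, _ in result])
# ['w0', 'w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8', 'w9']
```

Passing the merge as a parameter lets the same loop use a single strategy or a contiguous-then-LCS fallback wrapper without changes.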
Next Steps
Python API
Learn more about the transcribe() method
Streaming
Real-time transcription alternative
Output Formats
Export transcriptions in different formats
Local Attention
Optimize memory usage for long audio