Installation
Make sure you have ffmpeg installed on your system first, otherwise the CLI won’t work properly.
Basic Usage
The basic syntax for the CLI is:
Quick Examples
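As a rough illustration of how an invocation is put together, the sketch below assembles an argument list from audio files and options. The executable name `parakeet-mlx`, the files-then-options ordering, and the helper `build_command` are assumptions made for this example; the flag names come from the option reference on this page.

```python
import shlex

# Assumption: the executable is named "parakeet-mlx"; adjust to your install.
CLI_NAME = "parakeet-mlx"


def build_command(audio_files, **options):
    """Build an argv list: executable, audio files, then --flag value pairs."""
    argv = [CLI_NAME, *map(str, audio_files)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)  # boolean switch such as --fp32
        elif value is not None and value is not False:
            argv.extend([flag, str(value)])
    return argv


cmd = build_command(["audio.mp3"], output_format="srt", decoding="beam", beam_size=5)
print(shlex.join(cmd))
# parakeet-mlx audio.mp3 --output-format srt --decoding beam --beam-size 5
```

To actually run the command, the list can be handed to `subprocess.run(cmd, check=True)`.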
Core Options
Model Selection
--model: Hugging Face repository of the model to use. Available models can be found in the mlx-community/parakeet collection.
Output Configuration
Directory to save transcription outputs.
--output-format: Format for output files. Options: txt, srt, vtt, json, all
--output-template: Template for output filenames. Available variables:
- {filename}: Original filename without extension
- {parent}: Parent directory path
- {date}: Current date in YYYYMMDD format
- {index}: File index (1-based)
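The template variables can be pictured with a small sketch. It assumes plain `str.format`-style substitution, which may differ from how the CLI expands templates internally; `render_output_name` is a hypothetical helper, not part of the tool.

```python
from datetime import date
from pathlib import Path


def render_output_name(template, audio_path, index, today=None):
    """Expand {filename}, {parent}, {date} and {index} for one input file."""
    path = Path(audio_path)
    today = today or date.today()
    return template.format(
        filename=path.stem,               # original filename without extension
        parent=str(path.parent),          # parent directory path
        date=today.strftime("%Y%m%d"),    # current date as YYYYMMDD
        index=index,                      # 1-based file index
    )


name = render_output_name(
    "{date}_{index}_{filename}", "talks/keynote.mp3", 1, today=date(2025, 1, 31)
)
print(name)  # 20250131_1_keynote
```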
Decoding Options
Decoding Method
--decoding: Decoding method to use: greedy (fast) or beam (accurate).
Beam Search Parameters
These parameters only apply when using --decoding beam:
--beam-size: Number of beams to maintain during search. Higher values increase accuracy but reduce speed.
--length-penalty: Length penalty for beam search. Set to 0.0 to disable. Higher values favor longer hypotheses.
--patience: Patience multiplier for beam search. Set to 1.0 to disable. Higher values keep more candidates.
--duration-reward: TDT-specific. Balance between token and duration logprobs (0.0-1.0).
- < 0.5: Favor token probabilities
- > 0.5: Favor duration probabilities
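One way to picture the duration reward is as a weight that interpolates between the two log-probabilities. The linear blend below is an assumption chosen to match the description above, not a statement of the decoder's exact scoring; `combined_score` and the candidate values are made up for illustration.

```python
def combined_score(token_logprob, duration_logprob, duration_reward=0.5):
    """Blend token and duration log-probabilities (assumed linear interpolation)."""
    if not 0.0 <= duration_reward <= 1.0:
        raise ValueError("duration_reward must be between 0.0 and 1.0")
    return (1.0 - duration_reward) * token_logprob + duration_reward * duration_logprob


# Two hypothetical candidates as (token_logprob, duration_logprob).
candidates = {"A": (-0.2, -1.5), "B": (-0.9, -0.1)}


def best(duration_reward):
    """Return the candidate with the highest blended score."""
    return max(candidates, key=lambda k: combined_score(*candidates[k], duration_reward))


print(best(0.2), best(0.8))  # A B
```

With a low reward the candidate with the stronger token probability wins; with a high reward the one with the stronger duration probability wins.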
Sentence Splitting Options
Control how the transcription is split into sentences:
--max-words: Maximum number of words per sentence.
--silence-gap: Split sentences at silence gaps longer than this duration (in seconds).
--max-duration: Maximum sentence duration in seconds.
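The three limits can be sketched as a single pass over timestamped words. This is a minimal illustration that assumes each word carries start and end times in seconds; it covers only these three limits, and `split_sentences` is a hypothetical helper rather than the tool's implementation.

```python
def split_sentences(words, max_words=None, silence_gap=None, max_duration=None):
    """Group (text, start, end) words into sentences under the given limits."""
    sentences, current = [], []
    for word in words:
        _, start, end = word
        if current:
            too_many = max_words is not None and len(current) >= max_words
            gap = silence_gap is not None and start - current[-1][2] > silence_gap
            too_long = max_duration is not None and end - current[0][1] > max_duration
            if too_many or gap or too_long:
                sentences.append(current)  # close the sentence before this word
                current = []
        current.append(word)
    if current:
        sentences.append(current)
    return [" ".join(w[0] for w in s) for s in sentences]


words = [
    ("hello", 0.0, 0.4), ("world", 0.5, 0.9),
    ("this", 2.5, 2.8), ("is", 2.9, 3.0), ("a", 3.1, 3.2), ("test", 3.3, 3.7),
]
print(split_sentences(words, silence_gap=1.0))  # ['hello world', 'this is a test']
print(split_sentences(words, max_words=2))      # ['hello world', 'this is', 'a test']
```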
Performance Options
Precision
Choose floating-point precision:
- --bf16: BFloat16 precision (default, faster, lower memory)
- --fp32: Float32 precision (slower, higher memory, potentially more accurate)
Attention Mechanism
Attention mechanism to use:
- --full-attention: Standard full attention (default)
- --local-attention: Local attention (reduces memory for long audio)
--local-attention-context-size: Context window size for local attention (in frames).
Local attention is most useful when transcribing long audio files without chunking.
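To get a feel for why local attention saves memory, the sketch below counts how many frame pairs receive an attention score. It assumes a symmetric window of `context_size` frames on each side of every frame, which is an assumption about how the context size is interpreted; `attention_pairs` is a hypothetical helper.

```python
def attention_pairs(num_frames, context_size=None):
    """Count (query, key) frame pairs scored under full or local attention."""
    if context_size is None:
        return num_frames * num_frames  # full attention: every frame sees every frame
    total = 0
    for i in range(num_frames):
        lo = max(0, i - context_size)                # window clipped at the start
        hi = min(num_frames - 1, i + context_size)   # window clipped at the end
        total += hi - lo + 1
    return total


full = attention_pairs(10_000)
local = attention_pairs(10_000, context_size=256)
print(f"full: {full:,}  local: {local:,}  ratio: {full / local:.1f}x")
```

Full attention grows quadratically with the number of frames, while the windowed count grows roughly linearly, so the gap widens as audio gets longer.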
Cache Directory
--cache-dir: Directory for the Hugging Face model cache. Defaults to ~/.cache/huggingface or the value of HF_HOME/HF_HUB_CACHE.
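The fallback chain can be sketched as follows. The precedence used here (explicit value, then HF_HUB_CACHE, then HF_HOME, then the home-directory default) is an assumption for illustration, and `resolve_cache_dir` is a hypothetical helper, not part of the CLI.

```python
import os
from pathlib import Path


def resolve_cache_dir(cli_value=None, environ=None):
    """Pick a cache directory; the precedence order is an assumption."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return Path(cli_value).expanduser()      # explicit --cache-dir value
    for var in ("HF_HUB_CACHE", "HF_HOME"):
        if environ.get(var):
            return Path(environ[var]).expanduser()
    return Path("~/.cache/huggingface").expanduser()


print(resolve_cache_dir("/path/to/cache", environ={}))
print(resolve_cache_dir(environ={"HF_HOME": "/data/hf"}))
```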
Subtitle Features
Word-Level Timestamps
Generate word-level timestamps in SRT/VTT outputs. Each word appears highlighted as it’s spoken.
Verbose Mode
Print detailed progress information including:
- Model loading status
- Output directory and format
- Per-file processing progress
- Sentence-level timestamps and confidence scores
Environment Variables
All options can be set via environment variables:
| Option | Environment Variable |
|---|---|
| --model | PARAKEET_MODEL |
| --output-format | PARAKEET_OUTPUT_FORMAT |
| --output-template | PARAKEET_OUTPUT_TEMPLATE |
| --decoding | PARAKEET_DECODING |
| --chunk-duration | PARAKEET_CHUNK_DURATION |
| --overlap-duration | PARAKEET_OVERLAP_DURATION |
| --beam-size | PARAKEET_BEAM_SIZE |
| --length-penalty | PARAKEET_LENGTH_PENALTY |
| --patience | PARAKEET_PATIENCE |
| --duration-reward | PARAKEET_DURATION_REWARD |
| --max-words | PARAKEET_MAX_WORDS |
| --silence-gap | PARAKEET_SILENCE_GAP |
| --max-duration | PARAKEET_MAX_DURATION |
| --fp32 | PARAKEET_FP32 |
| --local-attention | PARAKEET_LOCAL_ATTENTION |
| --local-attention-context-size | PARAKEET_LOCAL_ATTENTION_CTX |
| --cache-dir | PARAKEET_CACHE_DIR |
Example: Set default model
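A minimal sketch of setting a default through the environment when launching the CLI from a script. The variable names are copied from the table above; the model name is only an example value (pick one from the mlx-community/parakeet collection), and `env_with_defaults` is a hypothetical helper.

```python
import os

# Subset of the option-to-variable mapping from the table above.
ENV_VARS = {
    "--model": "PARAKEET_MODEL",
    "--output-format": "PARAKEET_OUTPUT_FORMAT",
    "--decoding": "PARAKEET_DECODING",
    "--beam-size": "PARAKEET_BEAM_SIZE",
}


def env_with_defaults(overrides, base=None):
    """Return a copy of the environment with option defaults set via variables."""
    env = dict(os.environ if base is None else base)
    for flag, value in overrides.items():
        env[ENV_VARS[flag]] = str(value)
    return env


# Assumption: example model name; base={} keeps the printed output short.
env = env_with_defaults({"--model": "mlx-community/parakeet-tdt-0.6b-v2"}, base={})
print(env)  # {'PARAKEET_MODEL': 'mlx-community/parakeet-tdt-0.6b-v2'}
```

In real use, omit `base` so the current environment (including PATH) is inherited, then pass the result as `env=` to `subprocess.run`.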
Common Workflows
Batch Processing
transcripts directory.
Long Audio Processing
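The interaction between --chunk-duration and --overlap-duration can be sketched by listing the time windows a long file would be cut into. This assumes consecutive chunks start every (chunk duration minus overlap) seconds, which may not match the tool's exact chunking; `chunk_windows` is a hypothetical helper and the 15-second overlap is an illustrative value.

```python
def chunk_windows(total_duration, chunk_duration, overlap_duration):
    """List (start, end) windows in seconds covering the whole file."""
    if overlap_duration >= chunk_duration:
        raise ValueError("overlap_duration must be smaller than chunk_duration")
    step = chunk_duration - overlap_duration  # assumed distance between chunk starts
    windows, start = [], 0.0
    while True:
        end = min(start + chunk_duration, float(total_duration))
        windows.append((start, end))
        if end >= total_duration:
            break
        start += step
    return windows


print(chunk_windows(150, chunk_duration=60, overlap_duration=15))
# [(0.0, 60.0), (45.0, 105.0), (90.0, 150.0)]
```

Each window shares its last 15 seconds with the next one, which gives the merge step some shared context at the boundaries.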
Troubleshooting
FFmpeg not found error
Install FFmpeg:
Out of memory errors
Try these solutions:
- Use BFloat16 precision (default): --bf16
- Enable chunking: --chunk-duration 60
- Use local attention: --local-attention
- Reduce beam size: --beam-size 3
Model download issues
Check your internet connection and Hugging Face access. You can also:
- Pre-download the model using huggingface-cli
- Set a custom cache directory: --cache-dir /path/to/cache
Next Steps
- Chunking Guide: Learn how to efficiently process long audio files
- Output Formats: Understand the different output format options
- Python API: Use Parakeet MLX programmatically in your code
- Streaming: Real-time transcription with streaming inference