CLI Usage

The Parakeet MLX CLI transcribes audio files from the command line using NVIDIA’s Parakeet ASR models on Apple Silicon.

Installation

Make sure ffmpeg is installed on your system first; the CLI won’t work properly without it.

uv tool install parakeet-mlx -U
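Before the first run, it can help to confirm that both tools are on your PATH. The snippet below is a minimal sketch of such a check; check_tools is a hypothetical helper name used for illustration and is not part of Parakeet MLX.

```shell
# check_tools is a hypothetical helper, not part of parakeet-mlx.
# It prints "ok" if every named command is on PATH, otherwise it lists
# the missing ones and returns a non-zero status.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

check_tools ffmpeg parakeet-mlx || echo "Install the missing tools before transcribing."
```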

Basic Usage

The basic syntax for the CLI is:

parakeet-mlx <audio_files> [OPTIONS]

Quick Examples

parakeet-mlx audio.mp3

Core Options

Model Selection

--model

string

default:"mlx-community/parakeet-tdt-0.6b-v3"

Hugging Face repository of the model to use. Available models can be found in the mlx-community/parakeet collection.

parakeet-mlx audio.mp3 --model mlx-community/parakeet-tdt-1.1b

Set the PARAKEET_MODEL environment variable to avoid specifying the model every time:

export PARAKEET_MODEL=mlx-community/parakeet-tdt-1.1b

Output Configuration

--output-dir

path

default:"."

Directory to save transcription outputs.

--output-format

string

default:"srt"

Format for output files. Options: txt, srt, vtt, json, all

--output-template

string

default:"{filename}"

Template for output filenames. Available variables:

{filename} - Original filename without extension
{parent} - Parent directory path
{date} - Current date in YYYYMMDD format
{index} - File index (1-based)

parakeet-mlx audio.mp3 --output-dir ./transcriptions
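To get a feel for how the template variables combine, the sketch below mimics the documented substitutions for simple file names. It is an illustration of the descriptions above, not the tool's implementation: expand_template is a hypothetical helper, the input file names are made up, and {parent} is left out because its exact form is not specified here.

```shell
# Illustration only: mimics the documented template variables for simple
# file names. It is not the tool's implementation, and {parent} is left out.
expand_template() {
  # $1 = template, $2 = input path, $3 = 1-based file index, $4 = date (YYYYMMDD)
  file=$(basename "$2")
  printf '%s\n' "$1" | sed \
    -e "s|{filename}|${file%.*}|g" \
    -e "s|{index}|$3|g" \
    -e "s|{date}|$4|g"
}

# Preview the names a template would produce for two hypothetical inputs.
i=0
for f in interview.mp3 lecture.wav; do
  i=$((i + 1))
  expand_template '{date}_{index}_{filename}' "$f" "$i" "$(date +%Y%m%d)"
done
```

If the CLI follows the descriptions above, passing --output-template "{date}_{filename}" should yield names of the form YYYYMMDD_audio plus the format's extension; treat that as an expectation to verify rather than a guarantee.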

Decoding Options

Decoding Method

--decoding

string

default:"greedy"

Decoding method to use: greedy or beam

Beam decoding is only available for TDT models and is slower but potentially more accurate.

Greedy decoding (fast)

parakeet-mlx audio.mp3 --decoding greedy

Beam decoding (accurate)

parakeet-mlx audio.mp3 --decoding beam --beam-size 5

Beam Search Parameters

These parameters only apply when using --decoding beam:

--beam-size

integer

default:"5"

Number of beams to maintain during search. Higher values increase accuracy but reduce speed.

--length-penalty

float

default:"0.013"

Length penalty for beam search. Set to 0.0 to disable. Higher values favor longer hypotheses.

--patience

float

default:"3.5"

Patience multiplier for beam search. Set to 1.0 to disable. Higher values keep more candidates.

--duration-reward

float

default:"0.67"

TDT-specific: Balance between token and duration logprobs (0.0-1.0).

< 0.5: Favor token probabilities
> 0.5: Favor duration probabilities

Fine-tuned beam search

parakeet-mlx audio.mp3 \
  --decoding beam \
  --beam-size 10 \
  --length-penalty 0.02 \
  --patience 4.0 \
  --duration-reward 0.7

Sentence Splitting Options

Control how the transcription is split into sentences:

--max-words

integer

default:"None"

Maximum number of words per sentence.

--silence-gap

float

default:"None"

Split sentences at silence gaps longer than this duration (in seconds).

--max-duration

float

default:"None"

Maximum sentence duration in seconds.

Limit sentence length

parakeet-mlx audio.mp3 --max-words 20 --max-duration 10.0

Split on silence

parakeet-mlx audio.mp3 --silence-gap 2.0

Performance Options

Precision

--fp32 / --bf16

boolean

default:"bf16"

Choose floating-point precision:

--bf16: BFloat16 precision (default, faster, lower memory)
--fp32: Float32 precision (slower, higher memory, potentially more accurate)

Use FP32 precision

parakeet-mlx audio.mp3 --fp32

Attention Mechanism

--local-attention / --full-attention

boolean

default:"full-attention"

Attention mechanism to use:

--full-attention: Standard full attention (default)
--local-attention: Local attention (reduces memory for long audio)

--local-attention-context-size

integer

default:"256"

Context window size for local attention (in frames).

Use local attention for long audio

parakeet-mlx long_audio.mp3 \
  --local-attention \
  --local-attention-context-size 512 \
  --chunk-duration 0

Local attention is most useful when transcribing long audio files without chunking, as in the example above, where --chunk-duration 0 turns chunking off.

Cache Directory

--cache-dir

path

default:"None"

Directory for the Hugging Face model cache. Defaults to ~/.cache/huggingface, or to the value of HF_HOME or HF_HUB_CACHE if set.

Custom cache location

parakeet-mlx audio.mp3 --cache-dir /path/to/cache

Subtitle Features

Word-Level Timestamps

--highlight-words

boolean

default:"false"

Generate word-level timestamps in SRT/VTT outputs. Each word appears highlighted as it’s spoken.

parakeet-mlx audio.mp3 --highlight-words --output-format srt

Verbose Mode

--verbose / -v

boolean

default:"false"

Print detailed progress information including:

Model loading status
Output directory and format
Per-file processing progress
Sentence-level timestamps and confidence scores

Enable verbose output

parakeet-mlx audio.mp3 -v

Example verbose output:

Loading model: mlx-community/parakeet-tdt-0.6b-v3...
Model loaded successfully.
Output directory: /current/directory
Output format(s): srt
Transcribing 1 file(s)...

Processing file 1/1: audio.mp3
[00:00:00,000 --> 00:00:02,340] (confidence: 95.32%) Hello world.
[00:00:02,340 --> 00:00:05,120] (confidence: 93.18%) This is a test.

Saved SRT: /current/directory/audio.srt

parakeet-tdt-0.6b-v3 transcription complete. Outputs saved in '/current/directory'.
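Because each sentence line carries a confidence score, the verbose output can be filtered to find passages worth reviewing by hand. The following is a rough sketch that assumes the line format shown in the example above; low_confidence is a hypothetical helper, not something the CLI provides.

```shell
# low_confidence is a hypothetical helper: it reads verbose-style lines on
# stdin and prints those whose confidence is below the given percentage.
low_confidence() {
  awk -v t="$1" '
    match($0, /\(confidence: [0-9.]+%\)/) {
      # Skip the 13-character "(confidence: " prefix and the trailing "%)".
      c = substr($0, RSTART + 13, RLENGTH - 15)
      if (c + 0 < t) print
    }'
}

# Demo on the two sample lines from the example output.
low_confidence 94 <<'EOF'
[00:00:00,000 --> 00:00:02,340] (confidence: 95.32%) Hello world.
[00:00:02,340 --> 00:00:05,120] (confidence: 93.18%) This is a test.
EOF
```

With a real run, something like parakeet-mlx audio.mp3 -v 2>&1 | low_confidence 90 should work; the 2>&1 is there because this page does not state whether verbose lines go to stdout or stderr.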

Environment Variables

The following options can also be set via environment variables:

Option	Environment Variable
`--model`	`PARAKEET_MODEL`
`--output-format`	`PARAKEET_OUTPUT_FORMAT`
`--output-template`	`PARAKEET_OUTPUT_TEMPLATE`
`--decoding`	`PARAKEET_DECODING`
`--chunk-duration`	`PARAKEET_CHUNK_DURATION`
`--overlap-duration`	`PARAKEET_OVERLAP_DURATION`
`--beam-size`	`PARAKEET_BEAM_SIZE`
`--length-penalty`	`PARAKEET_LENGTH_PENALTY`
`--patience`	`PARAKEET_PATIENCE`
`--duration-reward`	`PARAKEET_DURATION_REWARD`
`--max-words`	`PARAKEET_MAX_WORDS`
`--silence-gap`	`PARAKEET_SILENCE_GAP`
`--max-duration`	`PARAKEET_MAX_DURATION`
`--fp32`	`PARAKEET_FP32`
`--local-attention`	`PARAKEET_LOCAL_ATTENTION`
`--local-attention-context-size`	`PARAKEET_LOCAL_ATTENTION_CTX`
`--cache-dir`	`PARAKEET_CACHE_DIR`

Example: Set defaults for model, output format, and decoding

export PARAKEET_MODEL=mlx-community/parakeet-tdt-1.1b
export PARAKEET_OUTPUT_FORMAT=vtt
export PARAKEET_DECODING=beam

parakeet-mlx audio.mp3  # Uses environment defaults
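An exported default does not have to apply to every run. Standard shell syntax lets you override a variable for a single command by prefixing the assignment. The sketch below demonstrates the scoping with sh -c standing in for parakeet-mlx audio.mp3, so that it runs without any audio file.

```shell
# Session-wide default, as in the example above.
export PARAKEET_OUTPUT_FORMAT=vtt

# One-off override: the assignment applies only to this single command.
# `sh -c` stands in for `parakeet-mlx audio.mp3` so the effect is visible.
PARAKEET_OUTPUT_FORMAT=json sh -c 'echo "this command sees: $PARAKEET_OUTPUT_FORMAT"'

# The exported default is unchanged afterwards.
echo "shell default is still: $PARAKEET_OUTPUT_FORMAT"
```

In practice the same prefix form would read PARAKEET_OUTPUT_FORMAT=json parakeet-mlx audio.mp3. Passing --output-format json explicitly is the other way to deviate from the default for one run.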

Common Workflows

Basic Transcription

parakeet-mlx audio.mp3

Generates audio.srt in the current directory.

Batch Processing

parakeet-mlx *.mp3 --output-dir ./transcripts --output-format all

Transcribes all MP3 files and generates all output formats in the transcripts directory.
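When a batch is rerun, it may be worth skipping files that already have a transcript. The loop below is one possible sketch: needs_transcription is a hypothetical helper, and it assumes the default {filename} template with SRT output, so that audio.mp3 maps to audio.srt in the output directory.

```shell
out_dir=./transcripts

# needs_transcription is a hypothetical helper. It assumes the default
# {filename} template and SRT output, i.e. audio.mp3 -> audio.srt.
needs_transcription() {
  # $1 = audio file, $2 = output directory
  base=$(basename "$1")
  [ ! -e "$2/${base%.*}.srt" ]
}

for f in *.mp3; do
  [ -e "$f" ] || continue   # no .mp3 files: the glob stays unexpanded
  if needs_transcription "$f" "$out_dir"; then
    parakeet-mlx "$f" --output-dir "$out_dir" --output-format srt
  else
    echo "skipping $f (already transcribed)"
  fi
done
```

Running one file per invocation reloads the model each time, so for large batches it may be faster to collect the pending files first and pass them to a single parakeet-mlx call.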

High-Quality Subtitles

parakeet-mlx video.mp4 \
  --output-format vtt \
  --highlight-words \
  --decoding beam \
  --beam-size 10 \
  --max-duration 8.0

Generates VTT subtitles with word-level highlighting, using beam search for potentially higher accuracy.

Long Audio Processing

parakeet-mlx podcast.mp3 \
  --chunk-duration 120 \
  --overlap-duration 15 \
  --output-format json \
  -v

Processes long audio with chunking and verbose output. See Chunking Guide for details.

Troubleshooting

FFmpeg not found error

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

Out of memory errors

Try these solutions:

Use BFloat16 precision (default): --bf16
Enable chunking: --chunk-duration 60
Use local attention: --local-attention
Reduce beam size: --beam-size 3

Model download issues

Check your internet connection and Hugging Face access. You can also:

Pre-download the model using huggingface-cli
Set a custom cache directory: --cache-dir /path/to/cache

Next Steps

Chunking Guide

Learn how to efficiently process long audio files

Output Formats

Understand the different output format options

Python API

Use Parakeet MLX programmatically in your code

Streaming

Real-time transcription with streaming inference
