Let’s transcribe an audio file in just a few steps. This guide assumes you’ve already installed Parakeet MLX.

Quick Start

Transcribe an audio file with a single command:
parakeet-mlx audio.mp3
This creates audio.srt in the current directory containing a timestamped transcription.
By default, the CLI uses the mlx-community/parakeet-tdt-0.6b-v3 model and writes SRT subtitle output.
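SRT is a plain-text format: numbered cues with `HH:MM:SS,mmm` start/end timestamps. If you want a feel for what the CLI writes, here is a minimal, library-independent sketch of building one cue (the helper names are our own, not part of Parakeet MLX):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue block."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.5, "Hello, world."))
```

A full .srt file is just such cues separated by blank lines.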

Common Use Cases

Batch Processing Multiple Files

Pass several files (or a shell glob) to transcribe them in one run; here each file gets a WebVTT subtitle file:
parakeet-mlx *.mp3 --output-format vtt

Long Audio with Chunking

For audio longer than a few minutes, enable chunking to manage memory:
parakeet-mlx long_podcast.mp3 --chunk-duration 120 --overlap-duration 15
The values shown are also the defaults: 120-second chunks with 15 seconds of overlap, which works well for most speech content.
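Conceptually, each chunk starts `chunk - overlap` seconds after the previous one, and the overlapping region lets adjacent transcriptions be stitched together. A small, illustrative sketch of the window arithmetic (not the library's internals):

```python
def chunk_windows(total: float, chunk: float = 120.0, overlap: float = 15.0):
    """Yield (start, end) times in seconds for overlapping chunks covering the audio."""
    step = chunk - overlap  # each window advances by chunk minus overlap
    start = 0.0
    while start < total:
        yield (start, min(start + chunk, total))
        if start + chunk >= total:
            break  # last window already reaches the end
        start += step

print(list(chunk_windows(300.0)))
# windows: (0, 120), (105, 225), (210, 300)
```

A 5-minute file therefore needs three windows at the default settings, with 15-second overlaps at 105–120 s and 210–225 s.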

Real-Time Streaming

For live audio transcription:
from parakeet_mlx import from_pretrained
from parakeet_mlx.audio import load_audio

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Create streaming context
with model.transcribe_stream(context_size=(256, 256)) as transcriber:
    # Simulate real-time audio chunks
    audio_data = load_audio("audio.mp3", model.preprocessor_config.sample_rate)
    chunk_size = model.preprocessor_config.sample_rate  # 1 second chunks
    
    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i:i+chunk_size]
        transcriber.add_audio(chunk)
        
        # Get current transcription
        print(f"Current: {transcriber.result.text}")

Beam Search for Higher Accuracy

Trade speed for accuracy with beam search:
parakeet-mlx audio.mp3 --decoding beam --beam-size 5
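Beam search keeps the `--beam-size` best partial hypotheses at every decoding step instead of committing to the single most likely token, which is slower but can recover from locally greedy mistakes. A toy, framework-free sketch of the idea (not the library's actual decoder):

```python
import math

def beam_search(step_log_probs, beam_size=2):
    """step_log_probs: one {token: log_prob} dict per decoding step.
    Returns the highest-scoring token sequence."""
    beams = [((), 0.0)]  # (partial sequence, cumulative log-probability)
    for dist in step_log_probs:
        # Extend every surviving hypothesis by every candidate token
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        # Keep only the beam_size best partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(beams, key=lambda b: b[1])[0]

steps = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"x": math.log(0.9), "y": math.log(0.1)},
]
print(beam_search(steps, beam_size=2))  # ('a', 'x')
```

With `beam_size=1` this degenerates to greedy decoding; larger beams explore more alternatives at proportionally higher cost.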

Custom Sentence Splitting

Control how text is split into sentences for subtitles:
from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(
        max_words=30,          # Max 30 words per subtitle
        silence_gap=5.0,       # Split on 5+ second silence
        max_duration=40.0      # Max 40 second duration
    )
)

result = model.transcribe("audio.mp3", decoding_config=config)

# Each sentence now follows these constraints
for sentence in result.sentences:
    print(f"[{sentence.start:.2f}s - {sentence.end:.2f}s] {sentence.text}")

Performance Tips

BFloat16 is 2x faster than FP32 with minimal accuracy loss on Apple Silicon:
from mlx.core import bfloat16

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3", dtype=bfloat16)
Reduce memory usage for very long audio files:
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
model.encoder.set_attention_model("rel_pos_local_attn", (256, 256))

result = model.transcribe("very_long_audio.mp3")
Choose a model family based on your speed/accuracy trade-off:
  • TDT models: Best accuracy, beam search support (recommended)
  • RNNT models: Good balance of speed and accuracy
  • CTC models: Fastest, simpler architecture
Load the model once and reuse it for multiple transcriptions:
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

for audio_file in audio_files:
    result = model.transcribe(audio_file)
    # Process result...

Advanced Examples

Low-Level API with Mel Spectrograms

For custom preprocessing pipelines:
from parakeet_mlx import from_pretrained, DecodingConfig
from parakeet_mlx.audio import load_audio, get_logmel

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Load and preprocess audio manually
audio = load_audio("audio.mp3", model.preprocessor_config.sample_rate)
mel = get_logmel(audio, model.preprocessor_config)

# Generate transcription
alignments = model.generate(mel, decoding_config=DecodingConfig())

# alignments is a list of AlignedResult
for result in alignments:
    print(result.text)

Streaming with Custom Context Size

Fine-tune streaming performance:
from parakeet_mlx import from_pretrained

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

with model.transcribe_stream(
    context_size=(512, 512),  # Larger context = better accuracy, more memory
    depth=2                   # More layers preserve computation accuracy
) as transcriber:
    # Process audio chunks...
    pass

Troubleshooting

If you installed with uv tool, make sure uv's bin directory is on your PATH:
export PATH="$HOME/.local/bin:$PATH"
Or reinstall with pip:
pip install parakeet-mlx -U
Verify installation in your active Python environment:
pip show parakeet-mlx
python -c "import parakeet_mlx; print(parakeet_mlx.__file__)"
Install FFmpeg for audio file support:
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg
If transcription runs out of memory on long files, try these solutions:
  1. Enable chunking: --chunk-duration 120
  2. Use local attention: --local-attention
  3. Close other applications to free up RAM
  4. Choose a smaller model variant
The first transcription downloads the model (~600 MB) from Hugging Face and caches it locally, so subsequent runs are much faster. You can pre-download models:
from parakeet_mlx import from_pretrained
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

Next Steps

Python API Guide

Learn about advanced Python API features

CLI Usage

Explore all CLI options and workflows

Streaming

Set up real-time audio transcription

Output Formats

Learn about SRT, VTT, JSON, and custom formats
