This creates `audio.srt` in the current directory containing a timestamped transcription.
By default, the CLI uses the mlx-community/parakeet-tdt-0.6b-v3 model and generates SRT subtitle format.
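SRT is a plain-text format of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range. As a standalone illustration (independent of the library — the segment data below is made up), this is how timestamped segments map onto SRT:

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(cues)

print(to_srt([(0.0, 2.5, "Hello world."), (2.5, 5.0, "This is a subtitle.")]))
```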
Transcribe programmatically with just a few lines:
```python
from parakeet_mlx import from_pretrained

# Load the model
model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Transcribe
result = model.transcribe("audio.mp3")

# Print the transcription
print(result.text)
```
The first run downloads the model (~600MB) and caches it locally. Subsequent runs are much faster.
Control how text is split into sentences for subtitles:
```python
from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(
        max_words=30,       # Max 30 words per subtitle
        silence_gap=5.0,    # Split on 5+ second silence
        max_duration=40.0,  # Max 40 second duration
    )
)

result = model.transcribe("audio.mp3", decoding_config=config)

# Each sentence now follows these constraints
for sentence in result.sentences:
    print(f"[{sentence.start:.2f}s - {sentence.end:.2f}s] {sentence.text}")
```
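To make the three constraints concrete, here is a standalone sketch of how such splitting rules can be applied to word timings. This is not the library's implementation, just an illustration of the logic: a new chunk starts when adding a word would exceed the word limit, exceed the duration limit, or follow a long enough silence.

```python
def split_words(words, max_words=30, silence_gap=5.0, max_duration=40.0):
    """Group (word, start, end) tuples into subtitle-sized chunks.

    A new chunk starts when the current one already holds max_words words,
    when the gap since the previous word is at least silence_gap seconds,
    or when adding the word would exceed max_duration seconds.
    """
    chunks, current = [], []
    for word, start, end in words:
        if current:
            chunk_start = current[0][1]
            gap = start - current[-1][2]
            if (len(current) >= max_words
                    or gap >= silence_gap
                    or end - chunk_start > max_duration):
                chunks.append(current)
                current = []
        current.append((word, start, end))
    if current:
        chunks.append(current)
    return chunks

# Tiny example: a 6.1-second silence forces a split before "friend".
words = [("Hello", 0.0, 0.4), ("there", 0.5, 0.9), ("friend", 7.0, 7.4)]
for chunk in split_words(words, max_words=2, silence_gap=5.0):
    text = " ".join(w for w, _, _ in chunk)
    print(f"[{chunk[0][1]:.2f}s - {chunk[-1][2]:.2f}s] {text}")
```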
For lower-level control, you can run the preprocessing and generation steps yourself:

```python
from parakeet_mlx import from_pretrained, DecodingConfig
from parakeet_mlx.audio import load_audio, get_logmel

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Load and preprocess audio manually
audio = load_audio("audio.mp3", model.preprocessor_config.sample_rate)
mel = get_logmel(audio, model.preprocessor_config)

# Generate transcription
alignments = model.generate(mel, decoding_config=DecodingConfig())

# alignments is a list of AlignedResult
for result in alignments:
    print(result.text)
```
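The log-mel step windows the waveform, takes a short-time Fourier transform, and projects the power spectrum onto a mel filterbank before taking the log. To demystify it, here is a rough NumPy sketch of that standard computation. The parameter values (`n_fft`, `hop`, `n_mels`) are illustrative defaults, not the model's actual preprocessor config:

```python
import numpy as np

def log_mel(audio, sample_rate=16000, n_fft=512, hop=160, n_mels=80):
    """Rough standalone log-mel spectrogram: (n_frames, n_mels)."""
    # Slice the waveform into overlapping windowed frames
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] for i in range(n_frames)])
    window = np.hanning(n_fft)

    # Power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

    # Triangular mel filterbank between 0 Hz and Nyquist
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                                    n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        if center > left:
            fb[m - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[m - 1, center:right] = (right - np.arange(center, right)) / (right - center)

    # Project onto the filterbank and take the log (epsilon avoids log(0))
    return np.log(spec @ fb.T + 1e-10)
```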
The first transcription downloads the model (~600MB) from Hugging Face and caches it locally; subsequent runs are much faster. You can pre-download the model ahead of time:
```python
from parakeet_mlx import from_pretrained

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")
```