SentenceConfig

SentenceConfig is a dataclass that controls how transcribed text is split into sentences based on punctuation, silence gaps, word count, or duration.

Class Definition

from parakeet_mlx import SentenceConfig

config = SentenceConfig(
    max_words=None,
    silence_gap=None,
    max_duration=None
)

Fields

max_words

int | None

Default: None

Maximum number of words allowed in a single sentence. When the next token would exceed this limit, a sentence break is created. Set to None to disable word-based splitting.

Example: max_words=30

silence_gap

float | None

Default: None

Minimum silence duration (in seconds) that triggers a sentence split. When the gap between tokens exceeds this threshold, a new sentence is created. Set to None to disable silence-based splitting.

Example: silence_gap=5.0

max_duration

float | None

Default: None

Maximum duration (in seconds) allowed for a single sentence. When a sentence reaches this duration, it is split even if no other conditions are met. Set to None to disable duration-based splitting.

Example: max_duration=40.0
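
To make the shape of the three fields concrete, here is a minimal sketch of an equivalent dataclass. `SentenceConfigSketch` is a hypothetical stand-in written for illustration only; it mirrors the field names, types, and defaults documented above and is not the library's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceConfigSketch:
    """Hypothetical stand-in mirroring the documented fields (not library code)."""

    max_words: Optional[int] = None       # None disables word-based splitting
    silence_gap: Optional[float] = None   # seconds; None disables silence-based splitting
    max_duration: Optional[float] = None  # seconds; None disables duration-based splitting


# With no arguments, every optional limit is disabled.
default = SentenceConfigSketch()

# Fields that are not passed keep their None default.
subtitle = SentenceConfigSketch(max_words=15, max_duration=5.0)
```

Because each limit defaults to None, setting one field leaves the other two disabled.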

Splitting Behavior

Sentences are always split at sentence-ending punctuation marks (., !, ?, 。, ？, ！). The optional limits add further split points. A sentence break is created when any of the following conditions is met:

Punctuation: Token contains sentence-ending punctuation
Word limit: Next token would exceed max_words (if set)
Silence gap: Gap between current and next token exceeds silence_gap (if set)
Duration limit: Sentence duration exceeds max_duration (if set)
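
The four conditions can be pictured with a small, self-contained sketch. This is an illustration under stated assumptions, not the library's implementation: `Token` and `split_tokens` are hypothetical names, a token containing a space is assumed to start a new word, and the exact comparison operators (`>` versus `>=`) are guesses based on the wording above.

```python
from dataclasses import dataclass
from typing import List, Optional

PUNCTUATION = (".", "!", "?", "。", "？", "！")


@dataclass
class Token:
    """Hypothetical token: text plus start and end times in seconds."""

    text: str
    start: float
    end: float


def split_tokens(
    tokens: List[Token],
    max_words: Optional[int] = None,
    silence_gap: Optional[float] = None,
    max_duration: Optional[float] = None,
) -> List[List[Token]]:
    """Group tokens into sentences using the four documented conditions."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None

        # Assumption: a token that contains a space starts a new word.
        words = sum(1 for t in current if " " in t.text)

        hit_punct = any(p in tok.text for p in PUNCTUATION)
        hit_words = (
            max_words is not None
            and nxt is not None
            and words + (1 if " " in nxt.text else 0) > max_words
        )
        hit_gap = (
            silence_gap is not None
            and nxt is not None
            and nxt.start - tok.end > silence_gap
        )
        hit_duration = (
            max_duration is not None
            and tok.end - current[0].start >= max_duration
        )

        # The last token always closes the sentence in progress.
        if hit_punct or hit_words or hit_gap or hit_duration or nxt is None:
            sentences.append(current)
            current = []
    return sentences


demo = [
    Token(" Hello", 0.0, 0.5),
    Token(" world.", 0.5, 1.0),
    Token(" Next", 1.0, 1.5),
    Token(" one", 8.0, 8.5),
]
for sentence in split_tokens(demo, silence_gap=5.0):
    print("".join(t.text for t in sentence).strip())
```

In the demo, the first break comes from the period and the second from the 6.5-second gap, which is longer than the 5-second threshold.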

Examples

Default Configuration

By default, sentences are only split at punctuation:

from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

# Default: only split on punctuation
config = DecodingConfig(sentence=SentenceConfig())
result = model.transcribe("audio.wav", decoding_config=config)

for sentence in result.sentences:
    print(sentence.text)

Limit Words per Sentence

Split long sentences to keep them under 30 words:

from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(max_words=30)
)

result = model.transcribe("audio.wav", decoding_config=config)

for sentence in result.sentences:
    # Approximate word count: tokens containing a space are treated as word starts
    word_count = len([t for t in sentence.tokens if " " in t.text])
    print(f"{word_count} words: {sentence.text}")

Split on Long Silences

Create sentence breaks when silence exceeds 5 seconds:

from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(silence_gap=5.0)
)

result = model.transcribe("audio.wav", decoding_config=config)

for sentence in result.sentences:
    print(f"[{sentence.start:.1f}s - {sentence.end:.1f}s] {sentence.text}")

Limit Sentence Duration

Keep sentences under 40 seconds:

from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(max_duration=40.0)
)

result = model.transcribe("audio.wav", decoding_config=config)

for sentence in result.sentences:
    print(f"Duration: {sentence.duration:.1f}s - {sentence.text}")

Combine Multiple Constraints

Use all splitting criteria together:

from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("mlx-community/parakeet-tdt-0.6b-v3")

config = DecodingConfig(
    sentence=SentenceConfig(
        max_words=30,       # Max 30 words
        silence_gap=5.0,    # Split on 5+ second silences
        max_duration=40.0   # Max 40 second duration
    )
)

result = model.transcribe("audio.wav", decoding_config=config)

for i, sentence in enumerate(result.sentences, 1):
    print(f"Sentence {i}: [{sentence.start:.1f}s - {sentence.end:.1f}s]")
    print(f"  {sentence.text}")
    print()

Use Cases

Subtitle Generation

For video subtitles, limit words and duration to fit on screen:

config = DecodingConfig(
    sentence=SentenceConfig(
        max_words=15,       # Short lines for readability
        max_duration=5.0    # Quick screen changes
    )
)
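
Once sentences are short enough for the screen, they still need to be written in a subtitle format. The sketch below formats (start, end, text) triples as SubRip (SRT) cues. `srt_timestamp` and `to_srt` are hypothetical helpers written for this page and are not part of parakeet_mlx; they assume times are given in seconds.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start, end, text) triples, numbering cues from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, 1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, " Hello world."), (1.5, 4.25, " This is a test.")]))
```

With a transcription result, the triples could be produced as `(s.start, s.end, s.text) for s in result.sentences`, using the sentence attributes shown in the examples above.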

Meeting Transcription

For meeting notes, split on natural pauses:

config = DecodingConfig(
    sentence=SentenceConfig(
        silence_gap=3.0,    # Split on speaker pauses
        max_duration=30.0   # Keep paragraphs manageable
    )
)
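
For meeting notes it is often handy to prefix each sentence with the time at which it starts. The helper below is a small sketch of that idea; `notes_line` is a hypothetical function, not a library API, and it assumes the start time is given in seconds.

```python
def notes_line(start: float, text: str) -> str:
    """Format one sentence as '[MM:SS] text', truncating to whole seconds."""
    minutes, seconds = divmod(int(start), 60)
    return f"[{minutes:02d}:{seconds:02d}] {text.strip()}"


print(notes_line(125.7, " Let's review the action items."))
```

Applied to a transcription result, this could be called as `notes_line(s.start, s.text)` for each sentence in `result.sentences`.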

Podcast Transcription

For long-form content with natural flow:

config = DecodingConfig(
    sentence=SentenceConfig(
        max_words=40,       # Allow longer sentences
        silence_gap=8.0     # Only split on significant pauses
    )
)

Related

DecodingConfig - Parent configuration class
AlignedSentence - Sentence objects created by this config
Sentence Splitting Guide - Learn more about sentence splitting strategies
