DecodingConfig is a dataclass that controls how the model decodes audio and splits text into sentences.
## Class Definition
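The class definition itself is not reproduced on this page. A minimal sketch of what it might look like, assuming a plain dataclass with the field names used in the examples below; the `sentence` field name, the stand-in `SentenceConfig`, and all defaults are assumptions, not the library's actual definition:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SentenceConfig:
    """Stand-in for the real SentenceConfig; see its own documentation."""
    pass


@dataclass
class DecodingConfig:
    """Sketch of a decoding configuration; defaults are assumptions."""
    # Beam-search fields documented below.
    beam_size: int = 5
    length_penalty: float = 1.0     # 1.0 = no length penalty
    patience: float = 1.0           # candidates capped at beam_size * patience
    duration_reward: float = 0.0    # TDT-only parameter
    # Sentence splitting; the field name is an assumption.
    sentence: Optional[SentenceConfig] = None
```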
## Fields
Configuration for how to split transcribed text into sentences. See SentenceConfig for configuration options.
### Greedy
Greedy decoding selects the most likely token at each step. This is the fastest decoding method.

### Beam
Beam search decoding explores multiple hypotheses to find better transcriptions. Currently only available for TDT models.

#### Fields
Number of hypotheses to explore simultaneously. Larger values may improve accuracy but increase computation time.

Example: `beam_size=5`

Penalty applied based on sequence length. Higher values favor longer sequences.
- `1.0`: No penalty
- `> 1.0`: Favor longer sequences
- `< 1.0`: Favor shorter sequences
Example: `length_penalty=0.013`

Controls how many candidate hypotheses to explore. Higher values allow more exploration. The maximum number of candidates is `beam_size * patience`.

Example: `patience=3.5`

TDT-only parameter. Controls the balance between token logprobs and duration logprobs.
- `0.0`: Only consider token logprobs
- `0.5`: Equal weight to both
- `1.0`: Only consider duration logprobs
Example: `duration_reward=0.67`

## Examples
### Greedy Decoding
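The code for this example is missing from the page. A minimal sketch, assuming a `from_pretrained` loader, a `transcribe` method, and a `decoding_config` keyword; all of these names are assumptions, as is greedy being the default strategy:

```python
# Hypothetical usage sketch: the import path, loader, and
# transcribe(...) signature are assumptions, not confirmed here.
from parakeet_mlx import from_pretrained, DecodingConfig

model = from_pretrained("model-name")  # placeholder model identifier
config = DecodingConfig()  # greedy decoding assumed to be the default

result = model.transcribe("audio.wav", decoding_config=config)
print(result.text)
```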
### Beam Search Decoding
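The code for this example is also missing. A sketch using the beam-search fields documented above; whether they are passed directly to `DecodingConfig` or via a nested beam config is an assumption here, as are the import and method names:

```python
# Hypothetical usage sketch; field values mirror the examples above.
from parakeet_mlx import from_pretrained, DecodingConfig

model = from_pretrained("model-name")  # placeholder model identifier
config = DecodingConfig(
    beam_size=5,           # explore 5 hypotheses at each step
    length_penalty=1.0,    # 1.0 = no length penalty
    patience=3.5,          # at most beam_size * patience candidates
    duration_reward=0.67,  # TDT-only: weight on duration logprobs
)

result = model.transcribe("audio.wav", decoding_config=config)
print(result.text)
```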
### With Sentence Configuration
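A sketch combining decoding with sentence splitting; the `sentence` field name and the `result.sentences` attribute are assumptions (see SentenceConfig for the actual options):

```python
# Hypothetical usage sketch; the `sentence` field name is assumed.
from parakeet_mlx import from_pretrained, DecodingConfig, SentenceConfig

model = from_pretrained("model-name")  # placeholder model identifier
config = DecodingConfig(sentence=SentenceConfig())  # defaults; see SentenceConfig

result = model.transcribe("audio.wav", decoding_config=config)
for sentence in result.sentences:  # assumed attribute on the result
    print(sentence.text)
```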
## Related
- SentenceConfig - Configure sentence splitting
- BaseParakeet - Use DecodingConfig with model methods
- Beam Decoding Guide - Learn more about beam search