Overview
The TranscriptionResult class represents the output of a transcription operation. It contains the transcribed text, detailed segment information, language detection results, and performance timing data.
Class Definition
Initializer
Complete transcribed text
Array of transcription segments with timestamps and metadata
Detected or specified language code
Performance timing information
Seek time offset in seconds (for chunked audio)
Properties
The complete transcribed text. All segments are concatenated together.
Array of transcription segments, each containing:
- Text content
- Start and end timestamps
- Token information
- Quality metrics (log probabilities, compression ratio)
- Optional word-level timestamps
ISO 639-1 language code (e.g., “en” for English, “es” for Spanish) detected or specified for this transcription.
Detailed performance metrics including:
- Model loading time
- Audio processing time
- Encoding time
- Decoding time
- Total pipeline duration
- Real-time factor
Seek time offset in seconds when this result is part of a chunked transcription.
Computed Properties
Flat array of all word-level timings across all segments. Only populated when wordTimestamps is enabled in DecodingOptions.
Methods
logSegments()
Logs all segments with timestamps and text to the console.
logTimings()
Logs detailed performance timing information to the console, including:
- Audio loading time
- Audio processing time
- Mel spectrogram computation time
- Encoding time
- Decoding time breakdown
- Total pipeline duration
- Tokens per second
- Real-time factor
TranscriptionSegment
Each segment in the segments array contains detailed information:
Properties
Unique identifier for the segment
Seek position in the audio (in samples)
Start timestamp in seconds
End timestamp in seconds
Transcribed text for this segment
Token IDs generated for this segment
Log probabilities for each token
Sampling temperature used for this segment
Average log probability of all tokens (quality indicator)
Text compression ratio (detects repetitive output)
Probability that this segment contains no speech
Optional array of word-level timings (only when wordTimestamps is enabled)
Computed duration of the segment (end - start)
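The quality metrics listed above (average log probability, compression ratio, no-speech probability) are commonly used to screen out unreliable segments. The following is a minimal sketch of that idea, not the library's own logic: SegmentSketch and its field names are hypothetical stand-ins for the segment type, and the default thresholds are borrowed from the reference Whisper defaults as a starting point only.

```swift
// Hypothetical stand-in for a transcription segment. The field names are
// assumptions for illustration, not a confirmed API.
struct SegmentSketch {
    let start: Float
    let end: Float
    let text: String
    let avgLogprob: Float
    let compressionRatio: Float
    let noSpeechProb: Float

    // Computed duration of the segment (end - start).
    var duration: Float { end - start }
}

// Keep only segments that pass three heuristic checks. The thresholds are
// assumptions and should be tuned for the audio being transcribed.
func reliableSegments(
    _ segments: [SegmentSketch],
    minAvgLogprob: Float = -1.0,
    maxCompressionRatio: Float = 2.4,
    maxNoSpeechProb: Float = 0.6
) -> [SegmentSketch] {
    segments.filter { segment in
        segment.avgLogprob >= minAvgLogprob
            && segment.compressionRatio <= maxCompressionRatio
            && segment.noSpeechProb <= maxNoSpeechProb
    }
}

let sampleSegments = [
    SegmentSketch(start: 0.0, end: 2.5, text: "Hello there.",
                  avgLogprob: -0.2, compressionRatio: 1.1, noSpeechProb: 0.01),
    // Highly repetitive output: rejected by the compression ratio check.
    SegmentSketch(start: 2.5, end: 5.0, text: "la la la la la la",
                  avgLogprob: -0.4, compressionRatio: 3.2, noSpeechProb: 0.05),
    // Likely silence: rejected by the log probability and no-speech checks.
    SegmentSketch(start: 5.0, end: 6.0, text: "",
                  avgLogprob: -1.6, compressionRatio: 1.0, noSpeechProb: 0.9),
]

let keptSegments = reliableSegments(sampleSegments)
for segment in keptSegments {
    print("[\(segment.start) - \(segment.end)] \(segment.text)")
}
```

A simple filter like this drops segments outright. Depending on the application, it may be preferable to flag them for review instead.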
WordTiming
When word-level timestamps are enabled, each word includes:
Properties
The word text
Token IDs that comprise this word
Start timestamp in seconds
End timestamp in seconds
Confidence probability for this word
Computed duration (end - start)
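The flat word list described under Computed Properties can be thought of as a concatenation of each segment's optional word array. The sketch below illustrates that relationship with hypothetical stand-in types (WordSketch, SegmentWithWords); the names and fields are assumptions for illustration, not a confirmed API.

```swift
// Hypothetical stand-in for a word-level timing entry.
struct WordSketch {
    let word: String
    let start: Float
    let end: Float
    let probability: Float

    // Computed duration (end - start).
    var duration: Float { end - start }
}

// Hypothetical stand-in for a segment that may carry word timings.
struct SegmentWithWords {
    let text: String
    let words: [WordSketch]?  // nil when word timestamps are disabled
}

// Flatten the per-segment word arrays into one list, skipping segments
// that have no word timings.
func flattenWords(_ segments: [SegmentWithWords]) -> [WordSketch] {
    segments.flatMap { $0.words ?? [] }
}

let wordSegments = [
    SegmentWithWords(text: "Hello world", words: [
        WordSketch(word: "Hello", start: 0.0, end: 0.5, probability: 0.98),
        WordSketch(word: "world", start: 0.5, end: 1.0, probability: 0.91),
    ]),
    SegmentWithWords(text: "(no word timings)", words: nil),
    SegmentWithWords(text: "again", words: [
        WordSketch(word: "again", start: 1.25, end: 1.75, probability: 0.40),
    ]),
]

let flatWords = flattenWords(wordSegments)
// Words below an arbitrary confidence cutoff, e.g. to highlight in a UI.
let uncertainWords = flatWords.filter { $0.probability < 0.5 }

for word in flatWords {
    print("\(word.word): \(word.start)s to \(word.end)s")
}
```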
TranscriptionTimings
Detailed performance metrics:
Properties
Total time spent loading models
Time spent loading and converting audio
Time spent processing audio samples
Time spent computing mel spectrograms
Time spent in audio encoder
Time spent in text decoder predictions
Total time spent in decoding loop
Total end-to-end pipeline duration
Computed: tokens generated per second
Computed: ratio of processing time to audio duration (< 1.0 means faster than real-time)
Computed: inverse of real-time factor (> 1.0 means faster than real-time)
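To make the three computed metrics concrete, here is a small worked sketch. TimingsSketch and its field names are hypothetical, and the choice of the decoding loop time as the denominator for tokens per second is an assumption; the actual type may compute it differently.

```swift
// Hypothetical stand-in for the timing data. All durations are in seconds.
struct TimingsSketch {
    let fullPipeline: Double        // total end-to-end pipeline duration
    let decodingLoop: Double        // total time spent in the decoding loop
    let inputAudioSeconds: Double   // length of the transcribed audio
    let tokenCount: Int             // number of tokens generated

    // Tokens generated per second (assumed here: relative to decoding time).
    var tokensPerSecond: Double {
        decodingLoop > 0 ? Double(tokenCount) / decodingLoop : 0
    }

    // Processing time divided by audio duration; below 1.0 is faster
    // than real-time.
    var realTimeFactor: Double {
        inputAudioSeconds > 0 ? fullPipeline / inputAudioSeconds : 0
    }

    // Inverse of the real-time factor; above 1.0 is faster than real-time.
    var speedFactor: Double {
        fullPipeline > 0 ? inputAudioSeconds / fullPipeline : 0
    }
}

// Example: 30 seconds of audio processed in 3 seconds, with 120 tokens
// produced during 2 seconds of decoding.
let exampleTimings = TimingsSketch(
    fullPipeline: 3.0,
    decodingLoop: 2.0,
    inputAudioSeconds: 30.0,
    tokenCount: 120
)

print("Tokens per second:", exampleTimings.tokensPerSecond)
print("Real-time factor:", exampleTimings.realTimeFactor)
print("Speed factor:", exampleTimings.speedFactor)
```

In this example the real-time factor is 3 / 30 = 0.1 and the speed factor is 30 / 3 = 10, so the audio is transcribed ten times faster than it plays back.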