Overview
Paragrafs can format processed segments into timestamped transcripts. These are readable text in which each line is prefixed with the time at which it starts. This is useful for subtitle generation, video transcription, and accessibility features.
Basic Timestamped Output
The formatSegmentsToTimestampedTranscript function converts marked segments into a newline-separated transcript with timestamps:
```typescript
import {
  markAndCombineSegments,
  formatSegmentsToTimestampedTranscript,
} from 'paragrafs';

const segments = [
  {
    start: 0,
    end: 6.5,
    text: 'The quick brown fox!',
    tokens: [
      { start: 0, end: 1, text: 'The' },
      { start: 1, end: 2, text: 'quick' },
      { start: 2, end: 3, text: 'brown' },
      { start: 3, end: 6.5, text: 'fox!' },
    ],
  },
  {
    start: 8,
    end: 13,
    text: 'Jumps right over the',
    tokens: [
      { start: 8, end: 9, text: 'Jumps' },
      { start: 9, end: 10, text: 'right' },
      { start: 10, end: 11, text: 'over' },
      { start: 12, end: 13, text: 'the' },
    ],
  },
];

const options = {
  fillers: ['uh', 'umm', 'hmmm'],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3,
};

const combinedSegments = markAndCombineSegments(segments, options);
// The second argument is maxSecondsPerLine
const transcript = formatSegmentsToTimestampedTranscript(combinedSegments, 10);
console.log(transcript);
// Output:
// 0:00: The quick brown fox!
// 0:08: Jumps right over the
```
How It Works
1. Process segments: First, segments are marked and combined using markAndCombineSegments to identify natural break points.
2. Split by duration: Once a line reaches the maxSecondsPerLine threshold, it is split at the next natural break point. A line can therefore run somewhat past that duration.
3. Format timestamps: Each line's start time is formatted as m:ss (e.g., “1:05”). Once the line is an hour or more into the recording, the format is h:mm:ss (e.g., “1:02:05”).
4. Combine into transcript: All lines are joined with newlines to create the final timestamped transcript.
Timestamps are formatted according to how far into the recording each line starts:
- Under 1 hour: m:ss format (e.g., 0:00, 1:05, 12:45)
- 1 hour or more: h:mm:ss format (e.g., 1:02:05, 2:30:15)
The formatting is handled by the formatSecondsToTimestamp utility:
```typescript
import { formatSecondsToTimestamp } from 'paragrafs';

formatSecondsToTimestamp(65); // "1:05"
formatSecondsToTimestamp(3725); // "1:02:05"
formatSecondsToTimestamp(45); // "0:45"
```
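The m:ss / h:mm:ss rule is simple enough to sketch directly. The function below, toTimestamp, is a hypothetical re-implementation for illustration only, not paragrafs' source. It assumes fractional seconds are truncated, which the library may handle differently.

```typescript
// Hypothetical re-implementation of the m:ss / h:mm:ss rule described above.
// Assumption: fractional seconds are truncated (paragrafs may round instead).
function toTimestamp(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const ss = String(whole % 60).padStart(2, '0');

  if (hours > 0) {
    // 1 hour or more: h:mm:ss, with minutes zero-padded to two digits
    return `${hours}:${String(minutes).padStart(2, '0')}:${ss}`;
  }
  // Under 1 hour: m:ss, minutes left unpadded
  return `${minutes}:${ss}`;
}

console.log(toTimestamp(45)); // 0:45
console.log(toTimestamp(65)); // 1:05
console.log(toTimestamp(3725)); // 1:02:05
```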
You can provide a custom formatter function to control how each line is rendered:
```typescript
import {
  markAndCombineSegments,
  formatSegmentsToTimestampedTranscript,
  formatSecondsToTimestamp,
} from 'paragrafs';

// `buffer` describes one output line: its start/end times (seconds) and text
const customFormatter = (buffer) => {
  const timestamp = formatSecondsToTimestamp(buffer.start);
  const duration = (buffer.end - buffer.start).toFixed(1);
  return `[${timestamp}] (${duration}s) ${buffer.text}`;
};

// segments and options as defined in the basic example above
const combinedSegments = markAndCombineSegments(segments, options);
const transcript = formatSegmentsToTimestampedTranscript(
  combinedSegments,
  10,
  customFormatter,
);
console.log(transcript);
// Output:
// [0:00] (6.5s) The quick brown fox!
// [0:08] (5.0s) Jumps right over the
```
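To make the role of the formatter argument concrete, here is a minimal sketch of a renderer that formats each line with a callback and joins the results with newlines. renderLines, Line, LineFormatter, and the compact ts helper are hypothetical names. The sketch approximates the behavior described on this page and is not the library's implementation.

```typescript
// Hypothetical sketch: render lines with an optional formatter callback,
// mirroring the third argument of formatSegmentsToTimestampedTranscript.
type Line = { start: number; end: number; text: string };
type LineFormatter = (line: Line) => string;

// Compact m:ss / h:mm:ss helper (assumed to match formatSecondsToTimestamp)
const ts = (t: number): string => {
  const s = Math.floor(t);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = String(s % 60).padStart(2, '0');
  return h > 0 ? `${h}:${String(m).padStart(2, '0')}:${sec}` : `${m}:${sec}`;
};

// Default rendering "m:ss: text", as in the basic example's output
const defaultFormatter: LineFormatter = (line) => `${ts(line.start)}: ${line.text}`;

function renderLines(lines: Line[], formatter: LineFormatter = defaultFormatter): string {
  // Format every line independently, then join with newlines
  return lines.map((line) => formatter(line)).join('\n');
}

const lines: Line[] = [
  { start: 0, end: 6.5, text: 'The quick brown fox!' },
  { start: 8, end: 13, text: 'Jumps right over the' },
];
console.log(renderLines(lines));
// 0:00: The quick brown fox!
// 0:08: Jumps right over the
console.log(
  renderLines(lines, (l) => `[${ts(l.start)}] (${(l.end - l.start).toFixed(1)}s) ${l.text}`),
);
// [0:00] (6.5s) The quick brown fox!
// [0:08] (5.0s) Jumps right over the
```

Keeping the formatter a plain function of one line means it can be swapped out without touching the splitting or joining steps.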
Line Duration Control
The maxSecondsPerLine parameter controls when lines are split:
```typescript
// Shorter lines (5-second threshold)
const shortTranscript = formatSegmentsToTimestampedTranscript(
  combinedSegments,
  5, // maxSecondsPerLine
);

// Longer lines (15-second threshold)
const longTranscript = formatSegmentsToTimestampedTranscript(
  combinedSegments,
  15, // maxSecondsPerLine
);
```
Lines are only split at natural break points (punctuation or segment boundaries). The maxSecondsPerLine parameter sets a threshold: once a line reaches it, the split happens at the next natural break. Individual lines can therefore run longer than the threshold.
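The threshold-then-break rule can be sketched as follows. splitAtNaturalBreaks is a hypothetical helper that assumes a break point is any token ending in `.`, `!`, or `?`. Segment boundaries are not modelled, and paragrafs' own break detection may differ.

```typescript
// Hypothetical sketch of threshold-then-break splitting; not paragrafs' code.
// Assumption: a "natural break" is a token whose text ends in . ! or ?
type Token = { start: number; end: number; text: string };
type Line = { start: number; end: number; text: string };

const isBreak = (token: Token): boolean => /[.!?]$/.test(token.text);

function toLine(tokens: Token[]): Line {
  return {
    start: tokens[0].start,
    end: tokens[tokens.length - 1].end,
    text: tokens.map((t) => t.text).join(' '),
  };
}

function splitAtNaturalBreaks(tokens: Token[], maxSecondsPerLine: number): Line[] {
  const lines: Line[] = [];
  let current: Token[] = [];

  for (const token of tokens) {
    current.push(token);
    const duration = token.end - current[0].start;
    // Split only when the threshold is reached AND this token is a break,
    // so a line may run past maxSecondsPerLine
    if (duration >= maxSecondsPerLine && isBreak(token)) {
      lines.push(toLine(current));
      current = [];
    }
  }
  if (current.length > 0) lines.push(toLine(current)); // flush the remainder
  return lines;
}

const result = splitAtNaturalBreaks(
  [
    { start: 0, end: 2, text: 'One.' },
    { start: 2, end: 4, text: 'two' },
    { start: 4, end: 7, text: 'three.' },
    { start: 7, end: 9, text: 'four' },
    { start: 9, end: 10, text: 'five.' },
  ],
  3,
);
console.log(result);
// First line is "One. two three." (0 to 7 s), overshooting the 3 s threshold
// because "two" is not a break point; second line is "four five." (7 to 10 s)
```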
If you need the segments as structured data rather than plain text, use mapSegmentsIntoFormattedSegments:
```typescript
import {
  markAndCombineSegments,
  mapSegmentsIntoFormattedSegments,
} from 'paragrafs';

const combinedSegments = markAndCombineSegments(segments, options);
const formattedSegments = mapSegmentsIntoFormattedSegments(
  combinedSegments,
  10, // optional maxSecondsPerLine
);

// Each formatted segment has clean text and timing info
formattedSegments.forEach((segment) => {
  console.log(`${segment.start}s - ${segment.end}s: ${segment.text}`);
});
```
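Structured segments suit interactive transcripts that highlight the current line during playback. The sketch below makes three assumptions: each formatted segment exposes numeric start and end times in seconds, the array is sorted by start, and segments do not overlap. findActiveSegment is a hypothetical helper that uses binary search.

```typescript
// Hypothetical helper: find the segment covering a playback time.
// Assumes segments are sorted by start and do not overlap.
type TimedSegment = { start: number; end: number; text: string };

function findActiveSegment(segments: TimedSegment[], time: number): TimedSegment | undefined {
  let lo = 0;
  let hi = segments.length - 1;
  let candidate = -1;
  // Binary search for the last segment whose start is <= time
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (segments[mid].start <= time) {
      candidate = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (candidate === -1) return undefined;
  const seg = segments[candidate];
  // End is exclusive; times in a gap between segments have no active segment
  return time < seg.end ? seg : undefined;
}

const timed: TimedSegment[] = [
  { start: 0, end: 6.5, text: 'The quick brown fox!' },
  { start: 8, end: 13, text: 'Jumps right over the' },
];
console.log(findActiveSegment(timed, 3)?.text); // The quick brown fox!
console.log(findActiveSegment(timed, 7)); // undefined (gap between segments)
```

In a browser player, this lookup would typically run from a video element's `timeupdate` handler using its `currentTime`.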
Use Cases
- Subtitle Files: Generate SRT or VTT files for video subtitles
- Transcript Documents: Create readable transcript documents with timestamps
- Video Players: Build interactive transcripts that sync with video playback
- Accessibility: Provide accessible transcripts for audio/video content
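For the subtitle use case, segments with start and end times in seconds can be serialized to SRT. The sketch below assumes that `{ start, end, text }` shape, which matches what the mapSegmentsIntoFormattedSegments example logs. toSrt and toSrtTime are hypothetical helpers, not paragrafs exports.

```typescript
// Hypothetical SRT writer for { start, end, text } segments in seconds.
type Cue = { start: number; end: number; text: string };

// SRT timestamps are HH:MM:SS,mmm (comma before the milliseconds)
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width = 2): string => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

function toSrt(cues: Cue[]): string {
  // Each cue: 1-based index, time range, text; cues separated by a blank line
  return cues
    .map((cue, i) => `${i + 1}\n${toSrtTime(cue.start)} --> ${toSrtTime(cue.end)}\n${cue.text}\n`)
    .join('\n');
}

console.log(
  toSrt([
    { start: 0, end: 6.5, text: 'The quick brown fox!' },
    { start: 8, end: 13, text: 'Jumps right over the' },
  ]),
);
```

WebVTT uses the same overall structure with three differences: a `WEBVTT` header line, a `.` instead of `,` before the milliseconds, and optional cue identifiers.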
Complete Example
Here’s a full example combining all the concepts:
```typescript
import {
  estimateSegmentFromToken,
  markAndCombineSegments,
  formatSegmentsToTimestampedTranscript,
} from 'paragrafs';

// Raw token from a transcription API (one coarse token spanning 30 seconds)
const rawToken = {
  start: 0,
  end: 30,
  text: 'Welcome to the tutorial. Today we will learn about TypeScript. It is a powerful language that adds types to JavaScript.',
};

// Convert the raw token into a segment
const segment = estimateSegmentFromToken(rawToken);

// Process with options
const options = {
  fillers: [],
  gapThreshold: 2,
  maxSecondsPerSegment: 10,
  minWordsPerSegment: 5,
};
const marked = markAndCombineSegments([segment], options);

// Create a timestamped transcript with an 8-second line threshold
const transcript = formatSegmentsToTimestampedTranscript(marked, 8);
console.log(transcript);
// Prints newline-separated lines in the form "m:ss: text"
```
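The example above starts from a single coarse token, so word timings have to be estimated. The sketch below shows one common heuristic: spread the token's time span across its words in proportion to word length. This is an assumption made for illustration. The documented algorithm of estimateSegmentFromToken is not described here and may differ, and estimateWordTokens is a hypothetical name.

```typescript
// Hypothetical sketch: distribute a coarse token's duration over its words,
// proportionally to character count. Not paragrafs' documented algorithm.
type Token = { start: number; end: number; text: string };

function estimateWordTokens(token: Token): Token[] {
  const words = token.text.split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  const span = token.end - token.start;

  const result: Token[] = [];
  let cursor = token.start;
  words.forEach((word, i) => {
    // The last word ends exactly at token.end to avoid floating-point drift
    const end = i === words.length - 1 ? token.end : cursor + (span * word.length) / totalChars;
    result.push({ start: cursor, end, text: word });
    cursor = end;
  });
  return result;
}

console.log(estimateWordTokens({ start: 0, end: 11, text: 'one two three' }));
// one: 0 to 3, two: 3 to 6, three: 6 to 11 (3 + 3 + 5 characters over 11 seconds)
```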
Next Steps
Ground Truth Alignment: Learn how to align AI-generated tokens with human-edited text for improved accuracy.