estimateSegmentFromToken

Builds a segment with estimated word-level tokens from a single token whose text contains multiple words. The text is split on whitespace, and an approximate start and end time is calculated for each word.
function estimateSegmentFromToken(token: Token): Segment

Parameters

token
Token
required
The source token containing text with multiple words

Returns

segment
Segment
A segment with the original text and estimated word-level tokens

Example

import { estimateSegmentFromToken } from 'paragrafs';

const token = {
  start: 0,
  end: 5,
  text: 'The quick brown fox'
};

const segment = estimateSegmentFromToken(token);
console.log(segment);
// {
//   start: 0,
//   end: 5,
//   text: 'The quick brown fox',
//   tokens: [
//     { start: 0, end: 1.25, text: 'The' },
//     { start: 1.25, end: 2.5, text: 'quick' },
//     { start: 2.5, end: 3.75, text: 'brown' },
//     { start: 3.75, end: 5, text: 'fox' }
//   ]
// }
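
The timings in the example output are consistent with dividing the token's duration evenly across its words. The sketch below reproduces that arithmetic in isolation. It is not the library's implementation, and the names SketchToken, SketchSegment, and estimateEvenly are hypothetical.

```typescript
// Hypothetical local types mirroring the shapes shown in the example above.
type SketchToken = { start: number; end: number; text: string };
type SketchSegment = SketchToken & { tokens: SketchToken[] };

// Illustrative only: assumes every word gets an equal share of the duration.
function estimateEvenly(token: SketchToken): SketchSegment {
  const words = token.text.split(/\s+/).filter((word) => word.length > 0);
  const step = words.length > 0 ? (token.end - token.start) / words.length : 0;
  const tokens = words.map((text, index) => ({
    start: token.start + index * step,
    end: token.start + (index + 1) * step,
    text,
  }));
  return { ...token, tokens };
}

const estimated = estimateEvenly({ start: 0, end: 5, text: 'The quick brown fox' });
console.log(estimated.tokens.map((t) => `${t.start}-${t.end} ${t.text}`));
// [ '0-1.25 The', '1.25-2.5 quick', '2.5-3.75 brown', '3.75-5 fox' ]
```

An even split ignores word length, so a long word and a short word receive the same duration. That is acceptable for rough alignment but not for precise captions.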

markTokensWithDividers

Marks tokens with segment dividers based on various criteria including filler words, hints, time gaps, and punctuation.
function markTokensWithDividers(
  tokens: Token[],
  options: MarkTokensWithDividersOptions
): MarkedToken[]

Parameters

tokens
Token[]
required
Array of tokens to process
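options
MarkTokensWithDividersOptions
required
Options that control where breaks are placed, such as fillers and gapThreshold (see the example below)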

Returns

markedTokens
MarkedToken[]
Tokens with segment break markers (SEGMENT_BREAK or ALWAYS_BREAK) inserted

Example

import { markTokensWithDividers } from 'paragrafs';

const tokens = [
  { start: 0, end: 1, text: 'Hello' },
  { start: 1, end: 2, text: 'world.' },
  { start: 5, end: 6, text: 'How' },  // 3-second gap
  { start: 6, end: 7, text: 'are' },
  { start: 7, end: 8, text: 'you?' }
];

const marked = markTokensWithDividers(tokens, {
  fillers: ['umm', 'uh'],
  gapThreshold: 3
});
// Returns tokens with SEGMENT_BREAK markers inserted after punctuation and gaps
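
To make the gap and punctuation criteria concrete, here is a minimal standalone sketch. It assumes that markers are plain string values interleaved with tokens and that a gap counts only when it is strictly greater than the threshold. Both are assumptions, as are the names SketchToken, BREAK, and markBreaks. Fillers and hints are left out.

```typescript
type SketchToken = { start: number; end: number; text: string };
const BREAK = 'SEGMENT_BREAK' as const; // assumed marker representation
type SketchMarked = SketchToken | typeof BREAK;

// Illustrative only: marks breaks after sentence-ending punctuation and before long pauses.
function markBreaks(tokens: SketchToken[], gapThreshold: number): SketchMarked[] {
  const out: SketchMarked[] = [];
  tokens.forEach((token, index) => {
    const previous = tokens[index - 1];
    const gap = previous ? token.start - previous.end : 0;
    // Break before a token that follows a long pause, unless a break is already there.
    if (gap > gapThreshold && out[out.length - 1] !== BREAK) {
      out.push(BREAK);
    }
    out.push(token);
    // Break after sentence-ending punctuation, except after the final token.
    if (/[.?!]$/.test(token.text) && index < tokens.length - 1) {
      out.push(BREAK);
    }
  });
  return out;
}

const sketchMarked = markBreaks(
  [
    { start: 0, end: 1, text: 'Hello' },
    { start: 1, end: 2, text: 'world.' },
    { start: 5, end: 6, text: 'How' },
    { start: 6, end: 7, text: 'are' },
    { start: 7, end: 8, text: 'you?' },
  ],
  2,
);
console.log(sketchMarked.map((item) => (typeof item === 'string' ? item : item.text)));
// [ 'Hello', 'world.', 'SEGMENT_BREAK', 'How', 'are', 'you?' ]
```

The punctuation break and the pause before 'How' coincide here, so the sketch emits a single marker rather than two.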

groupMarkedTokensIntoSegments

Groups marked tokens into segments, splitting whenever a segment's duration exceeds the specified maximum.
function groupMarkedTokensIntoSegments(
  markedTokens: MarkedToken[],
  maxSecondsPerSegment: number
): MarkedSegment[]

Parameters

markedTokens
MarkedToken[]
required
Array of tokens with segment break markers
maxSecondsPerSegment
number
required
Maximum duration (in seconds) for a segment

Returns

segments
MarkedSegment[]
Array of marked segments

Example

import { markTokensWithDividers, groupMarkedTokensIntoSegments } from 'paragrafs';

const tokens = [/* ... */];
const marked = markTokensWithDividers(tokens, { gapThreshold: 3 });
const segments = groupMarkedTokensIntoSegments(marked, 12);
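
One plausible reading of "splitting when the duration exceeds the specified maximum" is that a break marker is honored only once the running segment is already longer than the limit. The sketch below implements that reading. It is an assumption rather than the library's documented behavior, it drops the markers instead of keeping them inside the segment, and groupByDuration and the types are hypothetical.

```typescript
type SketchToken = { start: number; end: number; text: string };
const BREAK = 'SEGMENT_BREAK' as const; // assumed marker representation
type SketchMarked = SketchToken | typeof BREAK;

// Illustrative only: closes the current group at a break once it is longer than maxSeconds.
function groupByDuration(marked: SketchMarked[], maxSeconds: number): SketchToken[][] {
  const groups: SketchToken[][] = [];
  let current: SketchToken[] = [];
  for (const item of marked) {
    if (item === BREAK) {
      const duration = current.length > 0 ? current[current.length - 1].end - current[0].start : 0;
      if (duration > maxSeconds) {
        groups.push(current);
        current = [];
      }
      continue;
    }
    current.push(item);
  }
  // Flush whatever remains after the last marker.
  if (current.length > 0) {
    groups.push(current);
  }
  return groups;
}

const grouped = groupByDuration(
  [
    { start: 0, end: 4, text: 'first' },
    BREAK,
    { start: 4, end: 9, text: 'second' },
    BREAK,
    { start: 9, end: 14, text: 'third' },
    { start: 14, end: 15, text: 'fourth' },
  ],
  6,
);
console.log(grouped.map((group) => group.map((t) => t.text).join(' ')));
// [ 'first second', 'third fourth' ]
```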

mergeShortSegmentsWithPrevious

Merges segments with fewer than the specified minimum words into the previous segment. This helps avoid very short segments that might break the flow of text.
function mergeShortSegmentsWithPrevious(
  segments: MarkedSegment[],
  minWordsPerSegment: number
): MarkedSegment[]

Parameters

segments
MarkedSegment[]
required
Array of marked segments to process
minWordsPerSegment
number
required
Minimum number of words required for a segment to stand alone

Returns

merged
MarkedSegment[]
Array of merged segments (segments with ALWAYS_BREAK are never merged)

Example

import { mergeShortSegmentsWithPrevious } from 'paragrafs';

const segments = [/* marked segments */];
const merged = mergeShortSegmentsWithPrevious(segments, 3);
// Segments with fewer than 3 words are merged into the previous segment
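
The merge rule can be sketched on a simplified segment shape. This is a minimal illustration under stated assumptions: SketchSegment and mergeShort are hypothetical, words are stored as a plain string array, and the ALWAYS_BREAK exception is not modeled.

```typescript
type SketchSegment = { start: number; end: number; words: string[] };

// Illustrative only: folds any segment shorter than minWords into the one before it.
function mergeShort(segments: SketchSegment[], minWords: number): SketchSegment[] {
  const result: SketchSegment[] = [];
  for (const segment of segments) {
    const previous = result[result.length - 1];
    if (previous && segment.words.length < minWords) {
      // Extend the previous segment to cover the short one.
      result[result.length - 1] = {
        start: previous.start,
        end: segment.end,
        words: [...previous.words, ...segment.words],
      };
    } else {
      // The first segment has nothing to merge into, so it always stands alone.
      result.push(segment);
    }
  }
  return result;
}

const mergedSketch = mergeShort(
  [
    { start: 0, end: 4, words: ['The', 'quick', 'brown', 'fox'] },
    { start: 4, end: 5, words: ['Yes.'] },
    { start: 5, end: 9, words: ['It', 'jumps', 'over', 'dogs'] },
  ],
  3,
);
console.log(mergedSketch.map((s) => `${s.start}-${s.end}: ${s.words.join(' ')}`));
// [ '0-5: The quick brown fox Yes.', '5-9: It jumps over dogs' ]
```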

markAndCombineSegments

Convenience function that processes segments through all steps: marking tokens with dividers, grouping into segments, and merging short segments.
function markAndCombineSegments(
  segments: Segment[],
  options: MarkAndCombineSegmentsOptions
): MarkedSegment[]

Parameters

segments
Segment[]
required
Array of input segments to process
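options
MarkAndCombineSegmentsOptions
required
Combined options for all steps, such as fillers, gapThreshold, maxSecondsPerSegment, and minWordsPerSegment (see the example below)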

Returns

segments
MarkedSegment[]
Array of processed and marked segments

Example

import { markAndCombineSegments } from 'paragrafs';

const segments = [
  {
    start: 0,
    end: 10,
    text: 'Hello world',
    tokens: [
      { start: 0, end: 5, text: 'Hello' },
      { start: 5, end: 10, text: 'world' }
    ]
  }
];

const processed = markAndCombineSegments(segments, {
  fillers: ['uh', 'umm'],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3
});

mapSegmentsIntoFormattedSegments

Maps marked segments into formatted segments with clean text representation. Combines the tokens into properly formatted text, respecting segment breaks and optional maximum line duration.
function mapSegmentsIntoFormattedSegments(
  segments: MarkedSegment[],
  maxSecondsPerLine?: number
): Segment[]

Parameters

segments
MarkedSegment[]
required
Array of marked segments to format
maxSecondsPerLine
number
Optional maximum duration (in seconds) for a single line

Returns

formatted
Segment[]
Array of formatted segments with clean text (multiple lines separated by newlines)

Example

import { markAndCombineSegments, mapSegmentsIntoFormattedSegments } from 'paragrafs';

const segments = [/* ... */];
const marked = markAndCombineSegments(segments, {
  fillers: [],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3
});

const formatted = mapSegmentsIntoFormattedSegments(marked, 10);
console.log(formatted[0].text);
// Clean text with newlines where appropriate

formatSegmentsToTimestampedTranscript

Formats segments into a timestamped transcript with timestamps at the beginning of each line. Lines are split based on segment breaks and maximum line duration.
function formatSegmentsToTimestampedTranscript(
  segments: MarkedSegment[],
  maxSecondsPerLine: number,
  formatTokens?: (buffer: Token) => string
): string

Parameters

segments
MarkedSegment[]
required
Array of marked segments to format
maxSecondsPerLine
number
required
Maximum duration (in seconds) for a single line
formatTokens
(buffer: Token) => string
Optional formatter that receives the buffered token range and returns the formatted line. When omitted, the function emits timestamp-prefixed strings.

Returns

transcript
string
Formatted transcript with timestamps (newline-separated)

Example

import { formatSegmentsToTimestampedTranscript } from 'paragrafs';

const segments = [/* marked segments */];
const transcript = formatSegmentsToTimestampedTranscript(segments, 10);
console.log(transcript);
// 0:00: The quick brown fox
// 0:05: jumps over the lazy dog

// Custom formatter: receives the buffered token range for each line
const custom = formatSegmentsToTimestampedTranscript(segments, 10, (buffer) => {
  return `[${buffer.start.toFixed(2)}s] ${buffer.text}`;
});
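
A formatter that mimics the m:ss prefix from the sample output could look like the sketch below. The m:ss layout is inferred from that sample only and may differ from what the library emits for recordings longer than an hour. SketchToken, formatTimestamp, and formatLine are hypothetical names.

```typescript
// Hypothetical local type mirroring the token shape used on this page.
type SketchToken = { start: number; end: number; text: string };

// Formats a number of seconds as m:ss. Hours are not handled in this sketch.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const minutes = Math.floor(whole / 60);
  const seconds = whole % 60;
  return `${minutes}:${seconds < 10 ? '0' : ''}${seconds}`;
}

// Has the same shape as the formatTokens callback: (buffer: Token) => string.
const formatLine = (buffer: SketchToken): string =>
  `${formatTimestamp(buffer.start)}: ${buffer.text}`;

const sketchLine = formatLine({ start: 65, end: 70, text: 'jumps over the lazy dog' });
console.log(sketchLine);
// 1:05: jumps over the lazy dog
```

A function with this shape could be passed as the third argument to formatSegmentsToTimestampedTranscript.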

cleanupIsolatedTokens

Cleans up marked tokens by removing unnecessary segment breaks that would cause individual tokens to appear on their own lines.
function cleanupIsolatedTokens(markedTokens: MarkedToken[]): MarkedToken[]

Parameters

markedTokens
MarkedToken[]
required
The array of marked tokens to clean up

Returns

cleaned
MarkedToken[]
A new array with unnecessary breaks removed

Example

import { markTokensWithDividers, cleanupIsolatedTokens } from 'paragrafs';

const tokens = [/* ... */];
const marked = markTokensWithDividers(tokens, { gapThreshold: 3 });
const cleaned = cleanupIsolatedTokens(marked);
// Redundant breaks that would isolate single words are removed
