Types

Paragrafs is fully typed with TypeScript. This page documents all exported types and interfaces.

Core Types

Token

Represents a single token (word or phrase) with timing information. This is the basic unit of transcribed text.

type Token = {
  start: number;  // Start time in seconds
  end: number;    // End time in seconds
  text: string;   // The transcribed text
};

Example:

const token: Token = {
  start: 0,
  end: 1.5,
  text: 'Hello'
};
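
As a minimal sketch of how the timing fields can be used, the helper below computes a token's duration and rejects tokens whose end precedes their start. `tokenDuration` is a hypothetical name, not a paragrafs export, and the `Token` type is re-declared locally only to keep the snippet self-contained.

```typescript
// Local copy of the documented shape; in a project, use
// `import type { Token } from 'paragrafs'` instead.
type Token = { start: number; end: number; text: string };

// Hypothetical helper (not part of paragrafs): duration in seconds.
function tokenDuration(token: Token): number {
  if (token.end < token.start) {
    throw new RangeError(`Token "${token.text}" ends before it starts`);
  }
  return token.end - token.start;
}

const hello: Token = { start: 0, end: 1.5, text: 'Hello' };
const helloDuration = tokenDuration(hello); // 1.5
```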

Segment

Represents a segment of text with timing information and its word-level tokens. A segment is a higher-level structure that groups a sequence of related tokens.

type Segment = Token & {
  tokens: Token[];  // Word-by-word breakdown of the transcription
};

Example:

const segment: Segment = {
  start: 0,
  end: 5,
  text: 'Hello world',
  tokens: [
    { start: 0, end: 2, text: 'Hello' },
    { start: 2, end: 5, text: 'world' }
  ]
};
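
Because a segment's timing and text mirror its tokens, one can be derived from the other. The sketch below assumes the segment spans from the first token's start to the last token's end and that words are joined with single spaces. `segmentFromTokens` is a hypothetical helper, not a paragrafs export.

```typescript
// Local copies of the documented shapes, to keep the snippet self-contained.
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

// Hypothetical helper: build a Segment that spans the given tokens.
function segmentFromTokens(tokens: Token[]): Segment {
  const first = tokens[0];
  const last = tokens[tokens.length - 1];
  if (!first || !last) {
    throw new Error('Cannot build a segment from an empty token list');
  }
  return {
    start: first.start,
    end: last.end,
    text: tokens.map((t) => t.text).join(' '),
    tokens,
  };
}

const built = segmentFromTokens([
  { start: 0, end: 2, text: 'Hello' },
  { start: 2, end: 5, text: 'world' },
]);
// built.start === 0, built.end === 5, built.text === 'Hello world'
```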

Marked Types

MarkedToken

Represents either a token or a break marker (soft or hard). Used while processing text to identify natural break points.

type MarkedToken = Token | AlwaysBreakMarker | SegmentBreakMarker;

The special markers are:

SEGMENT_BREAK - Soft break marker (can be ignored if duration constraints allow)
ALWAYS_BREAK - Hard break marker (must create a new segment/line)

These markers are inserted automatically by markTokensWithDividers and other processing functions. You don’t typically need to import or create them manually.

Example:

import { markTokensWithDividers } from 'paragrafs';

const tokens = [
  { start: 0, end: 1, text: 'Hello' },
  { start: 1, end: 2, text: 'world.' }
];

// The function inserts markers automatically
const marked = markTokensWithDividers(tokens, {
  gapThreshold: 1.0
});
// marked now contains tokens with SEGMENT_BREAK markers after punctuation
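
When consuming a MarkedToken array, it is often useful to separate real tokens from markers. The sketch below uses a type guard for that. It assumes, purely for illustration, that the markers are string constants; how paragrafs actually represents them is not stated on this page, and `isToken` is a hypothetical helper.

```typescript
type Token = { start: number; end: number; text: string };

// Assumption: markers modelled as string constants for this sketch only.
const SEGMENT_BREAK = 'SEGMENT_BREAK' as const;
const ALWAYS_BREAK = 'ALWAYS_BREAK' as const;
type MarkedToken = Token | typeof SEGMENT_BREAK | typeof ALWAYS_BREAK;

// Hypothetical type guard: true for real tokens, false for markers.
function isToken(item: MarkedToken): item is Token {
  return typeof item === 'object';
}

const markedTokens: MarkedToken[] = [
  { start: 0, end: 1, text: 'Hello' },
  { start: 1, end: 2, text: 'world.' },
  SEGMENT_BREAK,
  ALWAYS_BREAK,
];

// Keep only the spoken words, dropping every marker.
const words = markedTokens.filter(isToken).map((t) => t.text);
```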

MarkedSegment

Represents a segment during the marking and processing stage. Contains an array of tokens that may include segment break markers.

type MarkedSegment = {
  start: number;          // Start time of the segment in seconds
  end: number;            // End time of the segment in seconds
  tokens: MarkedToken[];  // Array of tokens and segment break markers
};

Example:

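// SEGMENT_BREAK is the soft break marker described under MarkedToken;
// it must be in scope for this snippet to compile.
const markedSegment: MarkedSegment = {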
  start: 0,
  end: 5,
  tokens: [
    { start: 0, end: 1, text: 'Hello' },
    SEGMENT_BREAK,
    { start: 1, end: 2, text: 'world.' },
    SEGMENT_BREAK
  ]
};
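
To illustrate how markers partition a marked segment, the sketch below groups consecutive tokens and starts a new group at every marker. It treats soft and hard breaks alike and ignores the duration constraints that can cause soft breaks to be skipped, so it is not the library's combining logic. `splitAtBreaks` and the string-valued markers are assumptions made for this example.

```typescript
type Token = { start: number; end: number; text: string };

// Assumption: markers modelled as string constants for this sketch only.
const SEGMENT_BREAK = 'SEGMENT_BREAK' as const;
const ALWAYS_BREAK = 'ALWAYS_BREAK' as const;
type MarkedToken = Token | typeof SEGMENT_BREAK | typeof ALWAYS_BREAK;

// Hypothetical helper: split a marked token list into runs of tokens.
function splitAtBreaks(tokens: MarkedToken[]): Token[][] {
  const groups: Token[][] = [];
  let current: Token[] = [];
  for (const item of tokens) {
    if (typeof item === 'object') {
      current.push(item); // a real token extends the current run
    } else if (current.length > 0) {
      groups.push(current); // any marker closes a non-empty run
      current = [];
    }
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const groups = splitAtBreaks([
  { start: 0, end: 1, text: 'Hello' },
  SEGMENT_BREAK,
  { start: 1, end: 2, text: 'world.' },
  SEGMENT_BREAK,
]);
// groups holds two runs: ['Hello'] and ['world.']
```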

Ground Truth Types

GroundedToken

Represents a token after syncing with the ground truth text. Tokens that could not be matched during syncing are flagged with isUnknown.

type GroundedToken = Token & {
  isUnknown?: boolean;  // If true, this token was not matched during ground truth syncing
};

Example:

const groundedToken: GroundedToken = {
  start: 0,
  end: 1,
  text: 'corrected',
  isUnknown: true  // This word was interpolated, not matched
};

GroundedSegment

Represents a segment that was updated with ground truth values.

type GroundedSegment = Omit<Segment, 'tokens'> & {
  tokens: GroundedToken[];
};

Example:

const groundedSegment: GroundedSegment = {
  start: 0,
  end: 5,
  text: 'The quick brown fox',
  tokens: [
    { start: 0, end: 1, text: 'The' },
    { start: 1, end: 2, text: 'quick', isUnknown: true },
    { start: 2, end: 4, text: 'brown', isUnknown: true },
    { start: 4, end: 5, text: 'fox' }
  ]
};
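
One practical use of isUnknown is to gauge how well a segment aligned with its ground truth. The sketch below computes the fraction of unmatched tokens. `unknownRatio` is a hypothetical helper, and the types are re-declared locally to keep the snippet self-contained.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };
type GroundedToken = Token & { isUnknown?: boolean };
type GroundedSegment = Omit<Segment, 'tokens'> & { tokens: GroundedToken[] };

// Hypothetical helper: share of tokens that were not matched (0 to 1).
function unknownRatio(segment: GroundedSegment): number {
  if (segment.tokens.length === 0) return 0;
  const unknown = segment.tokens.filter((t) => t.isUnknown === true).length;
  return unknown / segment.tokens.length;
}

const ratio = unknownRatio({
  start: 0,
  end: 5,
  text: 'The quick brown fox',
  tokens: [
    { start: 0, end: 1, text: 'The' },
    { start: 1, end: 2, text: 'quick', isUnknown: true },
    { start: 2, end: 4, text: 'brown', isUnknown: true },
    { start: 4, end: 5, text: 'fox' },
  ],
});
// ratio === 0.5 (two of four tokens are unmatched)
```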

Hint Types

Hints

Contains a map of normalized hints and the normalization options used.

type Hints = {
  map: HintMap;                              // Map of hints organized by first word
  normalization: Required<ArabicNormalizationOptions>;  // Normalization settings
};

Example:

import { createHints } from 'paragrafs';
import type { Hints } from 'paragrafs';

const hints: Hints = createHints('hello world', 'good morning');

HintMap

Organizes hints by their first normalized word for efficient matching.

type HintMap = Record<string, string[][]>;

Each key is the first normalized word of a hint phrase. Each value is an array of word arrays, one per hint that starts with that word.
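
To make the shape concrete, the sketch below builds a HintMap from plain phrases by splitting on whitespace and grouping by first word. It skips normalization entirely, so it is only an illustration of the data structure, not a substitute for createHints. `buildHintMap` is a hypothetical name.

```typescript
type HintMap = Record<string, string[][]>;

// Hypothetical helper: group phrases by their first word (no normalization).
function buildHintMap(phrases: string[]): HintMap {
  const map: HintMap = {};
  for (const phrase of phrases) {
    const words = phrase.trim().split(/\s+/).filter((w) => w.length > 0);
    const first = words[0];
    if (first === undefined) continue; // skip empty phrases
    const bucket = map[first] ?? [];
    bucket.push(words);
    map[first] = bucket;
  }
  return map;
}

const hintMap = buildHintMap(['hello world', 'hello there', 'good morning']);
// hintMap.hello -> [['hello', 'world'], ['hello', 'there']]
// hintMap.good  -> [['good', 'morning']]
```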

GeneratedHint

Represents a hint candidate discovered by the hint generation functions.

type GeneratedHint = {
  phrase: string;               // The most common surface form
  normalizedPhrase: string;     // The normalized version
  count: number;                // Number of occurrences
  length: number;               // Number of words in the phrase
  firstOccurrenceIndex?: number;  // Token index of first occurrence
  topSurfaceForms?: string[];   // Up to 3 most common variations
};

Example:

const hint: GeneratedHint = {
  phrase: 'أحسن الله إليكم',
  normalizedPhrase: 'احسن الله اليكم',
  count: 5,
  length: 3,
  firstOccurrenceIndex: 0,
  topSurfaceForms: ['أحسن الله إليكم', 'أَحْسَنَ الله إليكم']
};
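
Generated hints are candidates, so callers will usually want to rank them before use. The sketch below orders them by count and breaks ties in favor of longer phrases. This ordering is an assumption chosen for illustration, not necessarily the order the library returns, and `rankHints` is a hypothetical helper.

```typescript
type GeneratedHint = {
  phrase: string;
  normalizedPhrase: string;
  count: number;
  length: number;
  firstOccurrenceIndex?: number;
  topSurfaceForms?: string[];
};

// Hypothetical helper: most frequent first, longer phrases win ties.
function rankHints(hints: GeneratedHint[]): GeneratedHint[] {
  return hints.slice().sort((a, b) => b.count - a.count || b.length - a.length);
}

const ranked = rankHints([
  { phrase: 'good morning', normalizedPhrase: 'good morning', count: 3, length: 2 },
  { phrase: 'thank you very much', normalizedPhrase: 'thank you very much', count: 5, length: 4 },
  { phrase: 'thank you', normalizedPhrase: 'thank you', count: 5, length: 2 },
]);
// ranked phrases: 'thank you very much', 'thank you', 'good morning'
```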

Option Types

ArabicNormalizationOptions

Configuration for Arabic text normalization.

type ArabicNormalizationOptions = {
  normalizeAlef?: boolean;   // Convert أإآ → ا (default: true)
  normalizeHamza?: boolean;  // Normalize hamza variations (default: false)
  normalizeYa?: boolean;     // Convert ى → ي (default: true)
  removeTatweel?: boolean;   // Remove tatweel ـ (default: true)
};

Example:

const options: ArabicNormalizationOptions = {
  normalizeAlef: true,
  normalizeYa: true,
  removeTatweel: true,
  normalizeHamza: false
};
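
The Hints type stores these options as Required<ArabicNormalizationOptions>, which implies that omitted fields are filled with defaults. The sketch below resolves the defaults listed above and applies the three documented character rules. `resolveOptions` and `normalizeArabic` are hypothetical names; hamza normalization is left out because its exact mapping is not specified here.

```typescript
type ArabicNormalizationOptions = {
  normalizeAlef?: boolean;
  normalizeHamza?: boolean;
  normalizeYa?: boolean;
  removeTatweel?: boolean;
};

// Hypothetical helper: fill in the documented defaults.
function resolveOptions(
  options: ArabicNormalizationOptions = {},
): Required<ArabicNormalizationOptions> {
  return {
    normalizeAlef: options.normalizeAlef ?? true,
    normalizeHamza: options.normalizeHamza ?? false,
    normalizeYa: options.normalizeYa ?? true,
    removeTatweel: options.removeTatweel ?? true,
  };
}

// Hypothetical sketch of the documented rules (hamza handling omitted).
function normalizeArabic(text: string, options: ArabicNormalizationOptions = {}): string {
  const resolved = resolveOptions(options);
  let result = text;
  // U+0623, U+0625, U+0622 (alef variants) -> U+0627 (bare alef)
  if (resolved.normalizeAlef) result = result.replace(/[\u0623\u0625\u0622]/g, '\u0627');
  // U+0649 (alef maqsura) -> U+064A (ya)
  if (resolved.normalizeYa) result = result.replace(/\u0649/g, '\u064A');
  // U+0640 (tatweel) is removed
  if (resolved.removeTatweel) result = result.replace(/\u0640/g, '');
  return result;
}

const resolvedDefaults = resolveOptions({ normalizeHamza: true });
```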

MarkTokensWithDividersOptions

Options for the markTokensWithDividers function.

type MarkTokensWithDividersOptions = {
  fillers?: string[];      // Filler words to mark as breaks
  gapThreshold: number;    // Minimum time gap for a break (seconds)
  hints?: Hints;           // Multi-word hints for hard breaks
};
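
To show what gapThreshold measures, the sketch below finds the positions where the silence between two consecutive tokens reaches the threshold. It assumes the comparison is inclusive (gap >= threshold), which is not confirmed on this page, and it is not the library's implementation. `findGapBreaks` is a hypothetical helper.

```typescript
type Token = { start: number; end: number; text: string };

// Hypothetical helper: indices of tokens that follow a long enough pause.
function findGapBreaks(tokens: Token[], gapThreshold: number): number[] {
  const breaks: number[] = [];
  for (let i = 1; i < tokens.length; i++) {
    const prev = tokens[i - 1];
    const cur = tokens[i];
    // Assumption: a gap equal to the threshold counts as a break.
    if (prev && cur && cur.start - prev.end >= gapThreshold) {
      breaks.push(i);
    }
  }
  return breaks;
}

const gapBreaks = findGapBreaks(
  [
    { start: 0, end: 1, text: 'Hello' },
    { start: 1, end: 2, text: 'world.' },
    { start: 4, end: 5, text: 'Next' },
  ],
  1.0,
);
// gapBreaks is [2]: only 'Next' follows a pause of at least one second
```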

MarkAndCombineSegmentsOptions

Options for the markAndCombineSegments function.

type MarkAndCombineSegmentsOptions = MarkTokensWithDividersOptions & {
  maxSecondsPerSegment: number;  // Maximum segment duration
  minWordsPerSegment: number;    // Minimum words to avoid merging
};

Example:

const options: MarkAndCombineSegmentsOptions = {
  fillers: ['uh', 'umm'],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3
};
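
The two extra fields describe limits on a finished segment: a maximum duration and a minimum word count. The sketch below checks a segment against both, as a way to reason about what values to pick. `classifySegment` is a hypothetical helper and does not reproduce how markAndCombineSegments merges or splits segments.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };
type SegmentLimits = { maxSecondsPerSegment: number; minWordsPerSegment: number };

// Hypothetical helper: report which limit, if any, a segment violates.
function classifySegment(
  segment: Segment,
  limits: SegmentLimits,
): 'ok' | 'too-long' | 'too-short' {
  if (segment.end - segment.start > limits.maxSecondsPerSegment) return 'too-long';
  if (segment.tokens.length < limits.minWordsPerSegment) return 'too-short';
  return 'ok';
}

const verdict = classifySegment(
  {
    start: 0,
    end: 5,
    text: 'Hello world',
    tokens: [
      { start: 0, end: 2, text: 'Hello' },
      { start: 2, end: 5, text: 'world' },
    ],
  },
  { maxSecondsPerSegment: 12, minWordsPerSegment: 3 },
);
// verdict === 'too-short': two words is below the minimum of three
```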

GenerateHintsOptions

Options for hint generation functions.

type GenerateHintsOptions = {
  minN?: number;                      // Min n-gram length (default: 2)
  maxN?: number;                      // Max n-gram length (default: 6)
  minCount?: number;                  // Min occurrences (default: 2)
  topK?: number;                      // Max hints to return (default: Infinity)
  dedupe?: 'closed' | 'none';         // Deduplication strategy (default: 'closed')
  stopwords?: string[];               // Words to ignore (default: [])
  normalization?: ArabicNormalizationOptions;  // Normalization options
  boundaryStrategy?: 'segment' | 'none';  // Only for generateHintsFromSegments
};

Example:

const options: GenerateHintsOptions = {
  minN: 2,
  maxN: 4,
  minCount: 3,
  topK: 50,
  dedupe: 'closed',
  normalization: { normalizeAlef: true }
};
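
To illustrate how minN, maxN, minCount, and topK interact, the sketch below counts word n-grams in a plain word list and filters them with those four options, using the defaults listed above. It omits normalization, stopwords, deduplication, and boundary handling, so it is a simplified stand-in rather than the library's algorithm. `countNgrams` and `NgramCount` are hypothetical names.

```typescript
type NgramCount = { phrase: string; count: number; length: number };
type NgramOptions = { minN?: number; maxN?: number; minCount?: number; topK?: number };

// Hypothetical helper: count n-grams and keep the most frequent ones.
function countNgrams(words: string[], options: NgramOptions = {}): NgramCount[] {
  const { minN = 2, maxN = 6, minCount = 2, topK = Infinity } = options;
  const counts = new Map<string, number>();
  for (let n = minN; n <= maxN; n++) {
    // Slide a window of n words across the list.
    for (let i = 0; i + n <= words.length; i++) {
      const phrase = words.slice(i, i + n).join(' ');
      counts.set(phrase, (counts.get(phrase) ?? 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    .map(([phrase, count]) => ({ phrase, count, length: phrase.split(' ').length }))
    .sort((a, b) => b.count - a.count || b.length - a.length)
    .slice(0, topK);
}

const ngrams = countNgrams('a b c a b d a b'.split(' '), { minN: 2, maxN: 3 });
// ngrams is [{ phrase: 'a b', count: 3, length: 2 }]
```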

Import All Types

All types are exported from the main package:

import type {
  Token,
  Segment,
  MarkedToken,
  MarkedSegment,
  GroundedToken,
  GroundedSegment,
  Hints,
  HintMap,
  GeneratedHint,
  ArabicNormalizationOptions,
  MarkTokensWithDividersOptions,
  MarkAndCombineSegmentsOptions,
  GenerateHintsOptions
} from 'paragrafs';
