Quickstart

Get up and running with Paragrafs in just a few minutes. This guide walks you through processing your first transcription tokens into formatted paragraphs.

Installation

Install Paragrafs from npm (use the equivalent command if you prefer another package manager):
npm install paragrafs

Basic Example

This example takes a single timed token, estimates a segment with word-level tokens from it, and formats the result:
import { 
  estimateSegmentFromToken, 
  mapSegmentsIntoFormattedSegments 
} from 'paragrafs';

// Example token from transcription
const token = {
  start: 0,
  end: 5,
  text: 'This is a sample text. It should be properly segmented.',
};

// Estimate segment with word-level tokens
const segment = estimateSegmentFromToken(token);

// Format the segment
const formattedSegments = mapSegmentsIntoFormattedSegments([segment]);

console.log(formattedSegments[0].text);
// Output: "This is a sample text. It should be properly segmented."
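To build intuition for what "estimating" word-level tokens from one timed token can mean, here is a minimal sketch that splits the text on whitespace and gives each word an equal share of the duration. The helper name `splitTokenEvenly` and the even-split strategy are assumptions made for illustration; this is not the Paragrafs implementation, and `estimateSegmentFromToken` may distribute time differently.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

// Hypothetical helper (not part of the Paragrafs API): splits a token's text
// on whitespace and assigns each word an equal slice of the token's duration.
function splitTokenEvenly(token: Token): Segment {
  const words = token.text.split(/\s+/).filter((word) => word.length > 0);
  const step = words.length > 0 ? (token.end - token.start) / words.length : 0;
  const tokens: Token[] = words.map((text, i) => ({
    start: token.start + i * step,
    end: token.start + (i + 1) * step,
    text,
  }));
  // Keep the original timing and text, and attach the estimated word tokens.
  return { ...token, tokens };
}

const estimated = splitTokenEvenly({
  start: 0,
  end: 5,
  text: 'This is a sample text. It should be properly segmented.',
});

console.log(estimated.tokens.length); // 10 words, 0.5 seconds each
```

An even split is only a rough approximation, since spoken words rarely have equal length; it is shown here because it is the simplest strategy to reason about.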

Working with Transcriptions

Process multiple transcription segments with automatic paragraph detection:
1. Import the functions

import {
  markAndCombineSegments,
  mapSegmentsIntoFormattedSegments,
  formatSegmentsToTimestampedTranscript,
} from 'paragrafs';
2. Define your segments

const segments = [
  {
    start: 0,
    end: 6.5,
    text: 'The quick brown fox!',
    tokens: [
      { start: 0, end: 1, text: 'The' },
      { start: 1, end: 2, text: 'quick' },
      { start: 2, end: 3, text: 'brown' },
      { start: 3, end: 6.5, text: 'fox!' },
    ],
  },
  {
    start: 8,
    end: 13,
    text: 'Jumps right over the',
    tokens: [
      { start: 8, end: 9, text: 'Jumps' },
      { start: 9, end: 10, text: 'right' },
      { start: 10, end: 11, text: 'over' },
      { start: 12, end: 13, text: 'the' },
    ],
  },
];
3. Configure options

const options = {
  fillers: ['uh', 'umm', 'hmmm'],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3,
};
4. Process and format

// Process the segments
const combinedSegments = markAndCombineSegments(segments, options);
const formattedSegments = mapSegmentsIntoFormattedSegments(combinedSegments);

// Get timestamped transcript
const transcript = formatSegmentsToTimestampedTranscript(combinedSegments, 10);

console.log(transcript);
// Output:
// 0:00: The quick brown fox!
// 0:08: Jumps right over the
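The transcript above prefixes each line with its start time in `m:ss` form. The sketch below shows one way such lines could be produced from start times and text. The helpers `formatTimestamp` and `toTranscriptLines` are hypothetical names introduced here, not Paragrafs functions, and the sketch does not model the second argument passed to `formatSegmentsToTimestampedTranscript`.

```typescript
type Token = { start: number; end: number; text: string };

// Zero-pads a number to two digits, e.g. 8 -> "08".
function pad2(value: number): string {
  return value < 10 ? `0${value}` : String(value);
}

// Hypothetical helper: formats seconds as m:ss, or h:mm:ss from one hour on.
function formatTimestamp(totalSeconds: number): string {
  const whole = Math.floor(totalSeconds);
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = whole % 60;
  return hours > 0
    ? `${hours}:${pad2(minutes)}:${pad2(seconds)}`
    : `${minutes}:${pad2(seconds)}`;
}

// Hypothetical helper: one "timestamp: text" line per item.
function toTranscriptLines(items: Token[]): string {
  return items
    .map((item) => `${formatTimestamp(item.start)}: ${item.text}`)
    .join('\n');
}

const lines = toTranscriptLines([
  { start: 0, end: 6.5, text: 'The quick brown fox!' },
  { start: 8, end: 13, text: 'Jumps right over the' },
]);

console.log(lines);
```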

Core Types

Paragrafs uses two main types:

Token

Represents a single word or phrase with timing information:
type Token = {
  start: number;  // Start time in seconds
  end: number;    // End time in seconds
  text: string;   // The transcribed text
};

Segment

A higher-level structure containing a sequence of related tokens:
type Segment = Token & {
  tokens: Token[];  // Word-by-word breakdown
};
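As a quick illustration of how the two types relate, the sketch below redeclares them locally, derives a `Segment` from a list of tokens, and checks that the tokens are ordered and fall inside the segment's time range. Both helpers (`segmentFromTokens`, `isConsistent`) are hypothetical and exist only for this example; joining words with a single space is also an assumption.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

// Hypothetical helper: derives the segment's bounds and text from its tokens.
function segmentFromTokens(tokens: Token[]): Segment {
  if (tokens.length === 0) {
    throw new Error('A segment needs at least one token');
  }
  return {
    start: tokens[0].start,
    end: tokens[tokens.length - 1].end,
    text: tokens.map((token) => token.text).join(' '),
    tokens,
  };
}

// Hypothetical check: every token lies within the segment's time range and
// does not start before the previous token has ended.
function isConsistent(segment: Segment): boolean {
  return segment.tokens.every((token, i) => {
    const prev = i > 0 ? segment.tokens[i - 1] : undefined;
    return (
      token.start <= token.end &&
      token.start >= segment.start &&
      token.end <= segment.end &&
      (prev === undefined || prev.end <= token.start)
    );
  });
}

const fox = segmentFromTokens([
  { start: 0, end: 1, text: 'The' },
  { start: 1, end: 2, text: 'quick' },
  { start: 2, end: 3, text: 'brown' },
  { start: 3, end: 6.5, text: 'fox!' },
]);

console.log(fox.text, isConsistent(fox));
```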

Next Steps

Now that you’ve got the basics, explore more advanced features:

Ground Truth Alignment: Sync AI tokens with human-edited text
Arabic Support: Learn about Arabic text normalization
Auto-Hint Generation: Auto-generate hints from repeated phrases
API Reference: Explore the complete API
