Basic Usage

Overview

Paragrafs provides a simple API for converting raw AI transcription tokens into properly formatted paragraphs. This guide covers the essential functions you’ll need to get started.

Installation

First, install Paragrafs in your project:

npm install paragrafs

Core Workflow

The basic workflow for processing transcriptions involves three main steps:

1. Estimate segments from tokens: convert multi-word tokens into segments with word-level timing information.

2. Mark and combine segments: process segments to identify natural paragraph breaks based on fillers, gaps, and punctuation.

3. Format into readable output: transform marked segments into clean, formatted text.

Quick Start Example

Here’s a complete example showing how to process a simple transcription:

import {
  estimateSegmentFromToken,
  mapSegmentsIntoFormattedSegments,
} from 'paragrafs';

// Example token from transcription
const token = {
  start: 0,
  end: 5,
  text: 'This is a sample text. It should be properly segmented.',
};

// Estimate segment with word-level tokens
const segment = estimateSegmentFromToken(token);

// Format the estimated segment into readable text
const formattedSegments = mapSegmentsIntoFormattedSegments([segment]);

console.log(formattedSegments[0].text);
// Output: "This is a sample text. It should be properly segmented."
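
To make the estimation step more concrete, here is a minimal sketch of how word-level timings could be derived from a multi-word token. It assumes the token's duration is shared equally between whitespace-separated words; the library's actual heuristic may differ. `estimateEvenly` is a hypothetical stand-in written for illustration, not a Paragrafs export.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

// Hypothetical stand-in for the estimation step (assumption: every word
// receives an equal share of the token's duration).
function estimateEvenly(token: Token): Segment {
  const words = token.text.split(/\s+/).filter((word) => word.length > 0);
  const step = words.length > 0 ? (token.end - token.start) / words.length : 0;

  // Each word starts where the previous one ended.
  const tokens = words.map((text, i) => ({
    start: token.start + i * step,
    end: token.start + (i + 1) * step,
    text,
  }));

  return { ...token, tokens };
}

const estimated = estimateEvenly({ start: 0, end: 5, text: 'This is a sample text.' });
console.log(estimated.tokens.map((t) => `${t.text}@${t.start}-${t.end}`).join(' '));
// This@0-1 is@1-2 a@2-3 sample@3-4 text.@4-5
```

An even split is the simplest possible model. Real speech is not evenly paced, so estimated timings should be treated as approximate.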

Working with Multiple Segments

For more complex transcriptions with multiple segments, use the complete processing pipeline:

import {
  markAndCombineSegments,
  mapSegmentsIntoFormattedSegments,
} from 'paragrafs';

// Example transcription segments
const segments = [
  {
    start: 0,
    end: 6.5,
    text: 'The quick brown fox!',
    tokens: [
      { start: 0, end: 1, text: 'The' },
      { start: 1, end: 2, text: 'quick' },
      { start: 2, end: 3, text: 'brown' },
      { start: 3, end: 6.5, text: 'fox!' },
    ],
  },
  {
    start: 8,
    end: 13,
    text: 'Jumps right over the',
    tokens: [
      { start: 8, end: 9, text: 'Jumps' },
      { start: 9, end: 10, text: 'right' },
      { start: 10, end: 11, text: 'over' },
      { start: 12, end: 13, text: 'the' },
    ],
  },
];

// Options for segment formatting
const options = {
  fillers: ['uh', 'umm', 'hmmm'],
  gapThreshold: 3,
  maxSecondsPerSegment: 12,
  minWordsPerSegment: 3,
};

// Process the segments
const combinedSegments = markAndCombineSegments(segments, options);
const formattedSegments = mapSegmentsIntoFormattedSegments(combinedSegments);

console.log(formattedSegments);

Configuration Options

The markAndCombineSegments function accepts several options to customize paragraph reconstruction:

Option	Type	Description
`fillers`	`string[]`	Words to treat as filler (e.g., “uh”, “umm”) that trigger segment breaks
`gapThreshold`	`number`	Minimum time gap in seconds to trigger a segment break
`maxSecondsPerSegment`	`number`	Maximum duration in seconds for a single segment
`minWordsPerSegment`	`number`	Minimum words required for a segment to stand alone
`hints`	`Hints`	Optional multi-word phrase hints for custom break points
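
As a rough illustration of how `fillers` and `gapThreshold` interact, the sketch below scans a list of word tokens and reports the positions where a break would be triggered. It assumes a break fires on a filler word or when the silence before a word is at least `gapThreshold` seconds. `findBreakIndices` is a hypothetical helper and does not reproduce the internals of `markAndCombineSegments`, which also considers punctuation, `maxSecondsPerSegment`, `minWordsPerSegment`, and `hints`.

```typescript
type Token = { start: number; end: number; text: string };

// Hypothetical helper, not part of the Paragrafs API. Returns the index of
// every token that is a filler or follows a pause of at least gapThreshold.
function findBreakIndices(tokens: Token[], fillers: string[], gapThreshold: number): number[] {
  const fillerSet = new Set(fillers.map((filler) => filler.toLowerCase()));
  const breaks: number[] = [];
  let previous: Token | undefined;

  tokens.forEach((token, i) => {
    const isFiller = fillerSet.has(token.text.toLowerCase());
    // Silence between the end of the previous word and the start of this one.
    const gap = previous === undefined ? 0 : token.start - previous.end;
    if (isFiller || gap >= gapThreshold) {
      breaks.push(i);
    }
    previous = token;
  });

  return breaks;
}

const sampleTokens: Token[] = [
  { start: 0, end: 1, text: 'So' },
  { start: 1, end: 2, text: 'umm' },
  { start: 2, end: 3, text: 'we' },
  { start: 3, end: 4, text: 'begin.' },
  { start: 8, end: 9, text: 'Next' },
  { start: 9, end: 10, text: 'topic' },
];

const breakIndices = findBreakIndices(sampleTokens, ['uh', 'umm', 'hmmm'], 3);
console.log(breakIndices); // [ 1, 4 ]
```

Index 1 is reported because 'umm' is a filler, and index 4 because four seconds of silence precede 'Next'. Lowering `gapThreshold` makes breaks more frequent; raising it keeps longer stretches together.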

Core Data Types

Understanding the basic types will help you work effectively with Paragrafs:

type Token = {
  start: number;  // Start time in seconds
  end: number;    // End time in seconds
  text: string;   // The transcribed text
};

type Segment = Token & {
  tokens: Token[];  // Word-by-word breakdown with timings
};

type MarkedSegment = {
  start: number;
  end: number;
  tokens: MarkedToken[];  // Tokens with break markers
};
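
Because a `Segment` repeats information that its `tokens` already carry, it can help to derive the segment-level fields from the word tokens when building test data by hand. The sketch below redeclares `Token` and `Segment` as shown above so it runs on its own. `segmentFromTokens` is a hypothetical convenience helper, not something Paragrafs exports.

```typescript
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

// Hypothetical helper: computes start, end, and text from the word tokens
// so the segment-level fields always agree with the token list.
function segmentFromTokens(tokens: Token[]): Segment {
  if (tokens.length === 0) {
    throw new Error('A segment needs at least one token');
  }
  return {
    start: Math.min(...tokens.map((t) => t.start)),
    end: Math.max(...tokens.map((t) => t.end)),
    text: tokens.map((t) => t.text).join(' '),
    tokens,
  };
}

const fox = segmentFromTokens([
  { start: 0, end: 1, text: 'The' },
  { start: 1, end: 2, text: 'quick' },
  { start: 2, end: 3, text: 'brown' },
  { start: 3, end: 6.5, text: 'fox!' },
]);

console.log(fox.start, fox.end, fox.text);
// 0 6.5 The quick brown fox!
```

The result matches the first hand-written segment in the multi-segment example, which is a quick way to check that manually authored fixtures are internally consistent.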

Next Steps

Timestamped Transcripts

Learn how to create human-readable transcripts with timestamps

Ground Truth Alignment

Align AI tokens with human-edited text

Auto-Hint Generation

Automatically discover repeated phrases

Arabic Support

Work with Arabic text normalization
