Quickstart

CLI

Install the CLI

Install the lit command globally:

npm install -g @llamaindex/liteparse

Verify the install:

lit --version

Parse your first document

Pass a PDF (or any other supported document format) to lit parse:

lit parse document.pdf

Output is printed to stdout as plain text with the spatial layout preserved. To save it to a file:

lit parse document.pdf -o output.txt

Pipe a remote PDF straight into lit parse without saving it to disk first; the - argument tells lit parse to read from stdin:

curl -sL https://example.com/report.pdf | lit parse -

Try JSON output

Use --format json to get structured output with per-page text items and bounding boxes:

lit parse document.pdf --format json -o output.json
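Once output.json exists, a few lines of TypeScript can summarize it. The sketch below is only illustrative: it assumes the saved file has the same pages / textItems shape that the Library section of this page shows for result.json, which the CLI description above does not spell out. SavedPage and countItemsPerPage are hypothetical names, not part of LiteParse.

```typescript
// Assumed shape of the saved JSON: mirrors the library's result.json
// (a top-level `pages` array whose entries carry `page` and `textItems`).
interface SavedPage {
  page: number;
  textItems: { text: string }[];
}

// Hypothetical helper: count text items per page from the raw JSON string.
function countItemsPerPage(rawJson: string): Map<number, number> {
  const parsed = JSON.parse(rawJson) as { pages?: SavedPage[] };
  const counts = new Map<number, number>();
  // A missing `pages` key yields an empty map rather than an exception.
  for (const page of parsed.pages ?? []) {
    counts.set(page.page, page.textItems.length);
  }
  return counts;
}
```

In practice the input string would come from something like readFileSync('output.json', 'utf8'). If the real file is shaped differently, only the interface and the loop need to change.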

Library

Install the package

Add LiteParse as a dependency:

npm install @llamaindex/liteparse

Parse a file

Import LiteParse and call parse() with a file path:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);

result.text contains the full document text with spatial layout preserved across all pages. Per-page data is available in result.pages.

Parse from a Buffer or Uint8Array

You can pass raw bytes instead of a file path. PDF bytes go straight to the parser with zero disk I/O; non-PDF bytes are written to a temp file for format conversion.

From disk:

import { LiteParse } from '@llamaindex/liteparse';
import { readFile } from 'fs/promises';

const parser = new LiteParse();

const pdfBytes = await readFile('document.pdf');
const result = await parser.parse(pdfBytes);
console.log(result.text);

From HTTP:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse();

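const response = await fetch('https://example.com/document.pdf');
// Fail early on HTTP errors instead of handing an error page to the parser.
if (!response.ok) throw new Error(`Download failed: HTTP ${response.status}`);
const buffer = Buffer.from(await response.arrayBuffer());
const result = await parser.parse(buffer);
console.log(result.text);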
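Because PDF and non-PDF bytes take different paths (in memory versus a temp file), it can be useful to know in advance which path a given buffer will hit. The sketch below checks for the standard %PDF- header at the start of the data. looksLikePdf is a hypothetical helper and is not how LiteParse itself is documented to detect formats; it is only a rough pre-check.

```typescript
// Hypothetical helper, not part of LiteParse: returns true when the bytes
// start with the PDF header "%PDF-". Works for Buffer as well, since
// Buffer is a Uint8Array subclass.
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < magic.length) return false;
  return magic.every((byte, i) => bytes[i] === byte);
}
```

Some PDF readers tolerate a few stray bytes before the header; this sketch is strict and checks offset 0 only, so treat a false result as "probably not a PDF" rather than a definitive answer.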

Get structured JSON output

Set outputFormat: 'json' to receive per-page text items with coordinates and font metadata:

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ outputFormat: 'json' });
const result = await parser.parse('document.pdf');

for (const page of result.json.pages) {
  console.log(`Page ${page.page}: ${page.textItems.length} text items`);
  for (const item of page.textItems) {
    console.log(`  "${item.text}" at (${item.x}, ${item.y})`);
  }
}

Each textItem includes text, x, y, width, height, fontName, and fontSize.
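With those fields you can rebuild simple structure yourself. As a minimal sketch, the code below groups items into lines by their y value and orders each line by x. The TextItem interface is written out from the field list above rather than imported from the library, the numeric and string types are assumptions, and groupIntoLines and its tolerance parameter are hypothetical.

```typescript
// Shape assumed from the documented field list; not imported from LiteParse.
interface TextItem {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fontName: string;
  fontSize: number;
}

// Hypothetical helper: items whose y differs from the first item of the
// current line by at most `tolerance` are treated as the same line.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  // Sort by y, then x, without mutating the caller's array.
  const sorted = [...items].sort((a, b) => a.y - b.y || a.x - b.x);
  const lines: { y: number; items: TextItem[] }[] = [];
  for (const item of sorted) {
    const last = lines[lines.length - 1];
    if (last !== undefined && Math.abs(item.y - last.y) <= tolerance) {
      last.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }
  // Within each line, read left to right.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.x - b.x)
      .map((item) => item.text)
      .join(' ')
  );
}
```

Lines come back in ascending y order. Whether that is top-to-bottom depends on the coordinate origin, which is not stated here, so reverse the result if your output reads bottom-up.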

Note: OCR is enabled by default and uses the built-in Tesseract.js engine, so no setup is required. On the first run, Tesseract.js downloads language data from the internet. For offline use, set TESSDATA_PREFIX to a directory containing pre-downloaded .traineddata files.
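For offline setups, a quick pre-flight check can confirm that the directory you plan to point TESSDATA_PREFIX at actually contains language data. This is a sketch only: listTrainedData is a hypothetical helper, and it checks file names, not whether the files are valid or which languages LiteParse will request.

```typescript
import { readdirSync } from 'node:fs';

// Hypothetical helper, not part of LiteParse: list the .traineddata files
// in a directory. Returns an empty array if the path is unset, missing,
// or not a readable directory.
function listTrainedData(dir: string | undefined): string[] {
  if (!dir) return [];
  try {
    return readdirSync(dir)
      .filter((name) => name.endsWith('.traineddata'))
      .sort();
  } catch {
    return [];
  }
}
```

A typical call would be listTrainedData(process.env.TESSDATA_PREFIX); an empty result suggests the first OCR run would still need network access.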

Next steps

Library usage

Explore the full LiteParse API: configuration options, OCR setup, screenshot generation, and more.

CLI reference

Full reference for lit parse, lit batch-parse, and lit screenshot commands and all their flags.
