LiteParse extracts text from PDFs, Office documents, and images with precise spatial layout and bounding boxes. It includes built-in Tesseract.js OCR, supports pluggable HTTP OCR servers, and generates high-quality page screenshots for LLM agents — all without sending data to the cloud.

Quick Start

Install LiteParse and parse your first document in under 2 minutes.

Library Usage

Use LiteParse as a Node.js library in your application.

CLI Reference

Explore all CLI commands: parse, batch-parse, and screenshot.

API Reference

Full TypeScript API — the LiteParse class, config options, and types.

Key features

Spatial text extraction

Preserves text layout with precise bounding boxes using PDF.js — ideal for structured documents.

Built-in OCR

Tesseract.js is included out of the box. No setup required for scanned documents.

Pluggable OCR servers

Connect EasyOCR, PaddleOCR, or any custom OCR server via a simple HTTP API.

Multi-format input

Automatically converts DOCX, XLSX, PPTX, and images to PDF before parsing.

Screenshot generation

Generate high-quality page screenshots for LLM visual agents.

Runs locally

No cloud dependencies. Everything runs on your machine — Linux, macOS, or Windows.
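
To illustrate why bounding boxes matter for structured documents, here is a minimal sketch that groups positioned text items into reading-order lines. The `TextItem` shape, the `toLines` helper, and the top-left coordinate origin are assumptions made for this example; they are not taken from the LiteParse API, so check the API Reference for the real output types.

```typescript
// Hypothetical shape for a positioned text item (NOT the LiteParse type).
// Assumes a top-left origin, so y grows downward on the page.
interface TextItem {
  text: string;
  x: number;
  y: number;
  width: number;  // part of the box; unused by this sketch
  height: number; // part of the box; unused by this sketch
}

// Groups items whose y coordinates fall within `tolerance` of the first
// item on a line, then orders each line left to right.
function toLines(items: TextItem[], tolerance = 2): string[] {
  const sorted = [...items].sort((a, b) => a.y - b.y);
  const lines: { y: number; items: TextItem[] }[] = [];

  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current.y - item.y) <= tolerance) {
      current.items.push(item);
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }

  return lines.map((line) =>
    [...line.items]
      .sort((a, b) => a.x - b.x)
      .map((i) => i.text)
      .join(" ")
  );
}
```

A fixed tolerance is the simplest choice. Documents with mixed font sizes would likely need a tolerance derived from each item's height.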

Get started

1. Install LiteParse

Install globally via npm to use the lit CLI, or add it to your project as a library dependency.
npm install -g @llamaindex/liteparse
2. Parse a document

Run the lit parse command on any PDF, Office document, or image.
lit parse document.pdf
3. Use as a library

Import LiteParse in your Node.js project for programmatic access.
import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);
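
When parsing several files, it can help to collect per-file results rather than stop at the first failure. The sketch below is one way to do that. `parseAll`, `TextParser`, and `ParseOutcome` are hypothetical names introduced for this example and are not part of LiteParse. The sketch relies only on the `parse(path)` call and the `text` field shown above, and assumes a `LiteParse` instance is structurally compatible with the `TextParser` interface.

```typescript
// Minimal structural type covering only what the snippet above uses:
// parse(path) resolving to an object with a `text` string.
interface TextParser {
  parse(path: string): Promise<{ text: string }>;
}

// Per-file outcome: either the extracted text or an error message.
type ParseOutcome =
  | { path: string; ok: true; text: string }
  | { path: string; ok: false; error: string };

// Hypothetical helper (not a LiteParse API): parses files one at a time
// and records failures instead of throwing.
async function parseAll(
  parser: TextParser,
  paths: string[]
): Promise<ParseOutcome[]> {
  const outcomes: ParseOutcome[] = [];
  for (const path of paths) {
    try {
      const result = await parser.parse(path);
      outcomes.push({ path, ok: true, text: result.text });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ path, ok: false, error: message });
    }
  }
  return outcomes;
}
```

Under that assumption, usage would look like `await parseAll(new LiteParse({ ocrEnabled: true }), ['a.pdf', 'b.docx'])`. Files are processed sequentially to keep memory use predictable when OCR is enabled.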
Need higher accuracy on complex documents — dense tables, multi-column layouts, or handwritten text? Try LlamaParse, the cloud-based document parser built for production pipelines.
