LiteParse

LiteParse extracts text from PDFs, Office documents, and images with precise spatial layout and bounding boxes. It includes built-in Tesseract.js OCR, supports pluggable HTTP OCR servers, and generates high-quality page screenshots for LLM agents — all without sending data to the cloud.

Quick Start

Install LiteParse and parse your first document in under 2 minutes.

Library Usage

Use LiteParse as a Node.js library in your application.

CLI Reference

Explore all CLI commands: parse, batch-parse, and screenshot.

API Reference

Full TypeScript API — the LiteParse class, config options, and types.

Key features

Spatial text extraction

Preserves text layout with precise bounding boxes using PDF.js — ideal for structured documents.

Built-in OCR

Tesseract.js is included out of the box. No setup required for scanned documents.

Pluggable OCR servers

Connect EasyOCR, PaddleOCR, or any custom OCR server via a simple HTTP API.

Multi-format input

Automatically converts DOCX, XLSX, PPTX, and images to PDF before parsing.

Screenshot generation

Generate high-quality page screenshots for LLM visual agents.

Runs locally

No cloud dependencies. Everything runs on your machine — Linux, macOS, or Windows.

Get started

Install LiteParse

Install LiteParse globally via npm to use the lit CLI, or add it to your project as a library dependency.

npm install -g @llamaindex/liteparse

Parse a document

Run the lit parse command on any PDF, Office document, or image.

lit parse document.pdf

Use as a library

Import LiteParse in your Node.js project for programmatic access.

import { LiteParse } from '@llamaindex/liteparse';

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse('document.pdf');
console.log(result.text);
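
The snippet above parses a single file. As a rough sketch of how the same call might be applied to several files, the helper below loops over a list of paths and records either the extracted text or the error for each one. It relies only on what the snippet above shows: a `parse(path)` method that resolves to an object with a `text` field. The names `TextParser`, `ParseOutcome`, and `parseAll` are hypothetical and are not part of the LiteParse API; the interface exists so the helper does not depend on anything beyond that one call.

```typescript
// Hypothetical minimal shape, based only on the usage shown above:
// parse(path) resolves to an object that has a `text` field.
interface TextParser {
  parse(path: string): Promise<{ text: string }>;
}

// Hypothetical result record: one entry per input path.
interface ParseOutcome {
  path: string;
  text?: string;   // set when parsing succeeded
  error?: string;  // set when parsing threw
}

// Parse files one at a time so that a single failure
// does not abort the rest of the batch.
async function parseAll(
  parser: TextParser,
  paths: string[],
): Promise<ParseOutcome[]> {
  const outcomes: ParseOutcome[] = [];
  for (const path of paths) {
    try {
      const result = await parser.parse(path);
      outcomes.push({ path, text: result.text });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      outcomes.push({ path, error: message });
    }
  }
  return outcomes;
}
```

Assuming a LiteParse instance matches this shape, it could be passed in directly, for example `await parseAll(new LiteParse({ ocrEnabled: true }), ['a.pdf', 'b.pdf'])`. Files are processed sequentially here to keep the sketch simple; whether parallel parsing is safe or faster is not something this page states.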

Need higher accuracy on complex documents — dense tables, multi-column layouts, or handwritten text? Try LlamaParse, the cloud-based document parser built for production pipelines.
