Convert from PDF

The Convert from PDF tools extract content from PDFs in the format you need, from page-level images to structured tables to machine-readable JSON for LLM pipelines. All tools run client-side.

PDF to images

PDF to JPG: Render each PDF page as a JPG image. Configure DPI and JPEG quality. Multi-page PDFs download as a ZIP.

PDF to PNG: Render each PDF page as a PNG image. PNG is lossless, so it is the best choice when you need pixel-perfect fidelity or transparency.

PDF to WebP: Render each PDF page as a WebP image. Good balance of quality and file size for web use.

PDF to BMP: Render each PDF page as a BMP image.

PDF to TIFF: Render each PDF page as a TIFF image. TIFF supports lossless compression and is common in document archiving workflows.

PDF to CBZ: Convert a PDF into a CBZ (Comic Book Archive) file. CBZ files are ZIP archives of images and are natively supported by comic reader apps and Calibre.

PDF to SVG: Convert each PDF page into a scalable vector graphic (SVG). Useful when you need to edit or re-render PDF content at any resolution. Uses PyMuPDF WASM.

PDF to Greyscale: Convert a full-color PDF into a greyscale version. Useful before printing or when reducing file size. The PDF structure is preserved, so text remains selectable.

Extract Images: Extract the images embedded in the PDF file (as opposed to rendering pages as images). Downloads as a ZIP. Uses PyMuPDF WASM.
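The DPI setting on the raster tools determines how many pixels each page produces. The sketch below shows the arithmetic, relying only on the fact that PDF page sizes are measured in points (72 per inch). The helper name `pixelSizeAtDpi` is hypothetical and is not part of BentoPDF.

```typescript
// PDF page sizes are expressed in points; one point is 1/72 inch.
const POINTS_PER_INCH = 72;

interface PixelSize {
  width: number;
  height: number;
  scale: number;
}

// Hypothetical helper: converts a page size in points to the pixel size
// of a raster rendered at the requested DPI.
function pixelSizeAtDpi(widthPt: number, heightPt: number, dpi: number): PixelSize {
  if (dpi <= 0) throw new RangeError("dpi must be positive");
  const scale = dpi / POINTS_PER_INCH;
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
    scale,
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letterAt144 = pixelSizeAtDpi(612, 792, 144);
console.log(letterAt144); // { width: 1224, height: 1584, scale: 2 }
```

Doubling the DPI quadruples the pixel count, which is why high-DPI exports of long documents produce much larger ZIP files.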

PDF to documents

PDF to Word: Convert a PDF into an editable Word document (.docx). BentoPDF attempts to reconstruct text flow, headings, and tables. Uses PyMuPDF WASM.

PDF to Text: Extract all text from a PDF and save it as a plain .txt file. Preserves line breaks but not visual formatting.

PDF to Markdown: Convert PDF text and tables into Markdown format. Useful for feeding document content into text editors, static site generators, or LLMs. Uses PyMuPDF WASM.

PDF to JSON: Convert PDF content into a structured JSON format. Captures page structure, text blocks, and metadata.

PDF to data

PDF to CSV: Detect and extract tables from a PDF and export them as CSV. Each table is saved as a separate file. Uses PyMuPDF WASM.

PDF to Excel: Detect and extract tables from a PDF and export them as an Excel workbook (.xlsx). Each table becomes a sheet. Uses PyMuPDF WASM.

Extract Tables: Extract tables from a PDF and export them in your choice of format: CSV, JSON, or Markdown. A single run can produce all three formats at once. Uses PyMuPDF WASM.
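To illustrate what the three export formats look like for the same table, here is a minimal sketch that serializes one extracted table (rows of cell strings, first row as header) to CSV, JSON, and Markdown. It is not BentoPDF's implementation; the `Table` type and the function names are assumptions made for the example, and the CSV quoting follows RFC 4180.

```typescript
type Table = string[][]; // assumption: first row is the header

// Quote a CSV field when it contains a comma, quote, or line break (RFC 4180).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(table: Table): string {
  return table.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// One JSON object per data row, keyed by the header cells.
function toJson(table: Table): string {
  const [header = [], ...rows] = table;
  const records = rows.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((key, i) => {
      record[key] = row[i] ?? "";
    });
    return record;
  });
  return JSON.stringify(records);
}

function toMarkdown(table: Table): string {
  const [header = [], ...rows] = table;
  // Pipes would break the row; line breaks are not allowed inside a cell.
  const escapeCell = (cell: string) => cell.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
  const line = (cells: string[]) => `| ${cells.map(escapeCell).join(" | ")} |`;
  const divider = `| ${header.map(() => "---").join(" | ")} |`;
  return [line(header), divider, ...rows.map(line)].join("\n");
}

const sample: Table = [
  ["Item", "Price"],
  ["Widget, large", "4.50"],
  ["Bolt", "0.10"],
];
console.log(toCsv(sample));
console.log(toJson(sample));
console.log(toMarkdown(sample));
```

Note that the comma inside "Widget, large" forces quoting in CSV but needs no special handling in JSON or Markdown, which is one reason to pick the format that matches the downstream consumer.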

OCR and AI tools

OCR PDF: Turn scanned or image-based PDFs into searchable, copyable PDFs using Tesseract.js, a WebAssembly port of Tesseract OCR that runs entirely in the browser.

How it works: Tesseract processes each page image and produces an invisible text layer that is overlaid on the original page. The visual appearance of the document is unchanged, but you can now search and copy the text.

Key options:

Setting	Values	Purpose
Language(s)	100+ languages via searchable selector	Select all languages present in the document
Resolution	Standard (192 DPI), High (288 DPI), Ultra (384 DPI)	Higher = better accuracy, slower processing
Binarize image	On / Off	Improves accuracy for low-contrast or faded scans
Character whitelist	None, Alphanumeric, Numbers + Currency, Letters Only, Numbers Only, Invoice, Forms, Custom	Restricts the character set for specific document types

Select multiple languages if your document contains mixed-language content. Use the Invoice or Forms whitelist preset on structured documents to reduce false positives.
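As a rough illustration of how the language and whitelist settings could translate into Tesseract inputs, the sketch below joins the selected languages with `+` (Tesseract's multi-language convention) and maps a preset to the `tessedit_char_whitelist` parameter. The preset keys, their character sets, and the `planOcr` helper are hypothetical; the characters behind BentoPDF's actual presets are not documented here.

```typescript
// Hypothetical preset table, for illustration only.
const WHITELIST_PRESETS: Record<string, string> = {
  "none": "",
  "numbers-only": "0123456789",
  "letters-only": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
};

interface OcrSettings {
  languages: string[]; // Tesseract language codes, e.g. "eng", "deu"
  preset: string;      // a key of WHITELIST_PRESETS, or "custom"
  customWhitelist?: string;
}

interface OcrPlan {
  langs: string;
  parameters: { tessedit_char_whitelist?: string };
}

function planOcr(settings: OcrSettings): OcrPlan {
  // Drop blanks and duplicates while keeping the selection order.
  const unique = Array.from(
    new Set(settings.languages.map((code) => code.trim()).filter(Boolean)),
  );
  if (unique.length === 0) throw new Error("select at least one language");

  const whitelist =
    settings.preset === "custom"
      ? settings.customWhitelist ?? ""
      : WHITELIST_PRESETS[settings.preset];
  if (whitelist === undefined) throw new Error(`unknown preset: ${settings.preset}`);

  return {
    langs: unique.join("+"),
    // An empty whitelist means "no restriction", so the parameter is omitted.
    parameters: whitelist === "" ? {} : { tessedit_char_whitelist: whitelist },
  };
}

const plan = planOcr({ languages: ["eng", "deu", "eng"], preset: "numbers-only" });
console.log(plan.langs); // eng+deu
```

With Tesseract.js, `plan.langs` would be the language argument to `createWorker` and `plan.parameters` would be passed to `worker.setParameters`.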

For self-hosted or air-gapped deployments, you can bundle specific Tesseract language data and configure the VITE_TESSERACT_LANG_URL, VITE_TESSERACT_WORKER_URL, and VITE_TESSERACT_CORE_URL environment variables at build time. See WASM Configuration for details.
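One plausible way these three variables reach Tesseract.js is through its `langPath`, `workerPath`, and `corePath` worker options. The sketch below shows that mapping as a pure function; the mapping itself and the helper names are assumptions, not BentoPDF's confirmed wiring, and the example URLs are placeholders.

```typescript
type Env = Record<string, string | undefined>;

interface TesseractPaths {
  langPath?: string;
  workerPath?: string;
  corePath?: string;
}

// Treat missing or blank values as "not configured".
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Hypothetical mapping from build-time variables to Tesseract.js options.
// Unset variables are left out so the library defaults still apply.
function tesseractPathsFromEnv(env: Env): TesseractPaths {
  const paths: TesseractPaths = {};
  const lang = nonEmpty(env["VITE_TESSERACT_LANG_URL"]);
  const worker = nonEmpty(env["VITE_TESSERACT_WORKER_URL"]);
  const core = nonEmpty(env["VITE_TESSERACT_CORE_URL"]);
  if (lang) paths.langPath = lang;
  if (worker) paths.workerPath = worker;
  if (core) paths.corePath = core;
  return paths;
}

// Placeholder values for an air-gapped deployment.
console.log(
  tesseractPathsFromEnv({
    VITE_TESSERACT_LANG_URL: "/ocr/lang",
    VITE_TESSERACT_WORKER_URL: "/ocr/worker.min.js",
  }),
);
```

In a Vite application the `env` argument would typically be `import.meta.env`, which Vite populates at build time with variables prefixed `VITE_`.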

Prepare PDF for AI: Extract PDF content as LlamaIndex JSON, a structured format designed for Retrieval-Augmented Generation (RAG) and LLM pipelines. The output captures page-level text blocks, headings, tables, and metadata in a schema that LlamaIndex loaders can ingest directly.

Useful for:

Building document Q&A applications
Indexing enterprise documents in a vector database
Pre-processing PDFs for fine-tuning or prompt engineering
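To give a feel for page-level, document-shaped output, the sketch below turns per-page text into records with `text` and `metadata` fields. The exact schema BentoPDF emits is not shown in this page, so the field names (`id_`, `file_name`, `page_label`) and the `pagesToDocuments` helper are assumptions modeled loosely on LlamaIndex's document conventions.

```typescript
// Assumed record shape; the real export may carry more fields.
interface PageDocument {
  id_: string;
  text: string;
  metadata: { file_name: string; page_label: string };
}

// Hypothetical helper: one record per non-empty page, with 1-based page labels.
function pagesToDocuments(fileName: string, pageTexts: string[]): PageDocument[] {
  const docs: PageDocument[] = [];
  pageTexts.forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // skip blank pages so they are not indexed
    const pageLabel = String(index + 1);
    docs.push({
      id_: `${fileName}#page-${pageLabel}`,
      text,
      metadata: { file_name: fileName, page_label: pageLabel },
    });
  });
  return docs;
}

const docs = pagesToDocuments("report.pdf", ["Intro text", "   ", "Closing text"]);
console.log(docs.map((d) => d.id_)); // [ 'report.pdf#page-1', 'report.pdf#page-3' ]
```

Keeping the page label in the metadata lets a Q&A application cite the page a retrieved passage came from.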

PDF/A conversion

PDF to PDF/A: Convert a standard PDF into PDF/A format for long-term archival. PDF/A is an ISO standard that prohibits features like encryption and external dependencies, ensuring the file can be rendered identically in the future. The conversion uses Ghostscript WASM. Supports PDF/A-1b, PDF/A-2b, and PDF/A-3b output profiles.

PDF to PDF/A is also listed under Optimize & Repair because it is primarily an archival and optimization operation.
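For readers curious how the three output profiles relate to Ghostscript, the sketch below builds a typical argument list for a PDF/A conversion. The flags are standard Ghostscript switches, but the exact arguments BentoPDF passes to its WASM build are not documented here, so treat the selection and the `ghostscriptPdfaArgs` helper as assumptions.

```typescript
type PdfaProfile = "PDF/A-1b" | "PDF/A-2b" | "PDF/A-3b";

// Ghostscript selects the PDF/A part with -dPDFA=1, 2, or 3.
const PDFA_LEVEL: Record<PdfaProfile, number> = {
  "PDF/A-1b": 1,
  "PDF/A-2b": 2,
  "PDF/A-3b": 3,
};

// Hypothetical helper: builds a Ghostscript argument list for the profile.
function ghostscriptPdfaArgs(profile: PdfaProfile, input: string, output: string): string[] {
  return [
    "-dBATCH",
    "-dNOPAUSE",
    "-sDEVICE=pdfwrite",
    `-dPDFA=${PDFA_LEVEL[profile]}`,
    // Policy 1: drop features that violate PDF/A instead of aborting.
    "-dPDFACompatibilityPolicy=1",
    // PDF/A requires device-independent or consistently defined color.
    "-sColorConversionStrategy=RGB",
    `-sOutputFile=${output}`,
    input,
  ];
}

console.log(ghostscriptPdfaArgs("PDF/A-2b", "in.pdf", "out.pdf").join(" "));
```

A fully conformant conversion usually also supplies a PDF/A definition file that references an ICC output profile; that step is omitted from this sketch.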
