PDF to images
PDF to JPG
Render each PDF page as a JPG image. Configure DPI and JPEG quality. Downloads as a ZIP when the PDF has multiple pages.
PDF to PNG
Render each PDF page as a PNG image. PNG is a lossless format, best when you need pixel-perfect fidelity or transparency.
PDF to WebP
Render each PDF page as a WebP image. Good balance of quality and file size for web use.
PDF to BMP
Render each PDF page as a BMP image.
PDF to TIFF
Render each PDF page as a TIFF image. TIFF supports lossless compression and is common in document archiving workflows.
PDF to CBZ
Convert a PDF into a CBZ (Comic Book Archive) file. CBZ files are ZIP archives of images and are natively supported by comic reader apps and Calibre.
PDF to SVG
Convert each PDF page into a scalable vector graphic (SVG). Useful when you need to edit or re-render PDF content at any resolution. Uses PyMuPDF WASM.
PDF to Greyscale
Convert a full-color PDF into a black-and-white version. Useful before printing or when reducing file size. The PDF structure is preserved, so text remains selectable.
Extract Images
Extract all images that are embedded in the PDF file (as opposed to rendering pages as images). Downloads as a ZIP. Uses PyMuPDF WASM.
PDF to documents
PDF to Word
Convert a PDF into an editable Word document (.docx). BentoPDF attempts to reconstruct text flow, headings, and tables. Uses PyMuPDF WASM.
PDF to Text
Extract all text from a PDF and save it as a plain .txt file. Preserves line breaks but not visual formatting.
PDF to Markdown
Convert PDF text and tables into Markdown format. Useful for feeding document content into text editors, static site generators, or LLMs. Uses PyMuPDF WASM.
PDF to JSON
Convert PDF content into a structured JSON format. Captures page structure, text blocks, and metadata.
PDF to data
PDF to CSV
Detect and extract tables from a PDF and export them as CSV. Each table is saved as a separate file. Uses PyMuPDF WASM.
PDF to Excel
Detect and extract tables from a PDF and export them as an Excel workbook (.xlsx). Each table becomes a sheet. Uses PyMuPDF WASM.
Extract Tables
Extract tables from a PDF and export them in your choice of format: CSV, JSON, or Markdown. A single run can produce all three formats at once. Uses PyMuPDF WASM.
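As a rough illustration of how one extracted table can be written in all three formats, here is a minimal TypeScript sketch. It is not BentoPDF's implementation: the `Table` type and the function names `toCsv`, `toMarkdown`, and `toJson` are hypothetical, and the sketch assumes the first row of each table is its header.

```typescript
// Hypothetical shape: a table is a list of rows, each row a list of cell strings.
// Assumption: the first row is the header.
type Table = string[][];

// Quote a CSV field when it contains a comma, a double quote, or a line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(table: Table): string {
  return table.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// Escape pipes and flatten line breaks so each cell stays inside its column.
function mdCell(value: string): string {
  return value.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function toMarkdown(table: Table): string {
  const header = table[0];
  if (header === undefined) return "";
  const line = (row: string[]): string => `| ${row.map(mdCell).join(" | ")} |`;
  const separator = `| ${header.map(() => "---").join(" | ")} |`;
  return [line(header), separator, ...table.slice(1).map(line)].join("\n");
}

// Emit one JSON object per body row, keyed by the header cells.
function toJson(table: Table): string {
  const header = table[0];
  if (header === undefined) return "[]";
  const records = table.slice(1).map((row) => {
    const record: Record<string, string> = {};
    header.forEach((key, i) => {
      const cell = row[i];
      record[key] = cell === undefined ? "" : cell;
    });
    return record;
  });
  return JSON.stringify(records, null, 2);
}
```

Calling all three functions on the same `Table` value mirrors a single run that produces every format at once. Cells containing commas or pipes are the usual source of broken output, which is why each writer escapes its own delimiter.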
OCR and AI tools
OCR PDF
Turn scanned or image-based PDFs into searchable, copyable PDFs using Tesseract.js, a WebAssembly port of Tesseract OCR that runs entirely in the browser.
How it works: Tesseract processes each page image and produces an invisible text layer that is overlaid on the original page. The visual appearance of the document is unchanged, but you can now search and copy the text.
Key options:

| Setting | Values | Purpose |
|---|---|---|
| Language(s) | 100+ languages via searchable selector | Select all languages present in the document |
| Resolution | Standard (192 DPI), High (288 DPI), Ultra (384 DPI) | Higher = better accuracy, slower processing |
| Binarize image | On / Off | Improves accuracy for low-contrast or faded scans |
| Character whitelist | None, Alphanumeric, Numbers + Currency, Letters Only, Numbers Only, Invoice, Forms, Custom | Restricts the character set for specific document types |
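
To make the resolution trade-off in the table concrete, the sketch below estimates the pixel size of a page image at each preset. The DPI values come from the table. Everything else is an assumption for illustration: the names `OCR_RESOLUTION_DPI` and `renderedSize` are hypothetical, and the calculation assumes the PDF default of 72 points per inch with no page-level scaling. How BentoPDF actually rasterizes pages is not described here.

```typescript
// PDF page sizes are measured in points; by default one point is 1/72 inch.
const POINTS_PER_INCH = 72;

// DPI values taken from the options table; the constant name is illustrative.
const OCR_RESOLUTION_DPI = { standard: 192, high: 288, ultra: 384 } as const;
type ResolutionPreset = keyof typeof OCR_RESOLUTION_DPI;

interface PixelSize {
  width: number;
  height: number;
}

// Convert a page size in points to the pixel size of its rendered image.
function renderedSize(
  widthPt: number,
  heightPt: number,
  preset: ResolutionPreset
): PixelSize {
  const scale = OCR_RESOLUTION_DPI[preset] / POINTS_PER_INCH;
  return {
    width: Math.round(widthPt * scale),
    height: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
const letterStandard = renderedSize(612, 792, "standard"); // 1632 x 2112 pixels
const letterUltra = renderedSize(612, 792, "ultra"); // 3264 x 4224 pixels
```

Ultra doubles the DPI of Standard, so each page image has four times as many pixels to process, which is consistent with the slower processing noted in the table.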

For self-hosted or air-gapped deployments, you can bundle specific Tesseract language data and configure the VITE_TESSERACT_LANG_URL, VITE_TESSERACT_WORKER_URL, and VITE_TESSERACT_CORE_URL environment variables at build time. See WASM Configuration for details.
- Building document Q&A applications
- Indexing enterprise documents in a vector database
- Pre-processing PDFs for fine-tuning or prompt engineering
PDF/A conversion
PDF to PDF/A
Convert a standard PDF into PDF/A format for long-term archival. PDF/A is an ISO standard that prohibits features like encryption and external dependencies, ensuring the file can be rendered identically in the future. The conversion uses Ghostscript WASM. Supports PDF/A-1b, PDF/A-2b, and PDF/A-3b output profiles.
PDF to PDF/A is also listed under Optimize & Repair because it is primarily an archival and optimization operation.
