Apache PDFBox is an open source Java library for working with PDF documents. Published by the Apache Software Foundation under the Apache License, Version 2.0, it gives you the tools to create new PDFs, modify existing ones, extract text and images, and render pages to images, all in pure Java. This documentation covers the PDFBox 3.x API, module structure, setup, and common workflows.

What is PDFBox?

PDFBox is a project of the Apache Software Foundation and has been under active development since 2002. It is widely used in enterprise applications for PDF processing, carries no per-use fees, and imposes no restrictions on commercial use beyond the terms of the Apache License 2.0.

The library handles a broad range of PDF operations, from generating simple documents and extracting plain text to parsing complex forms and rendering high-fidelity page images. PDFBox also includes command-line utilities, so you can perform common operations without writing any Java code.

Key capabilities

Create PDF documents

Build PDFs from scratch using a page model API. Add text, images, vector graphics, and embedded fonts to new or existing documents.
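As a rough illustration of the page model, the sketch below builds a one-page document in memory and writes a single line of text with a standard Helvetica font. It assumes the PDFBox 3.x API, where standard fonts are created through `Standard14Fonts.FontName`; the class name `CreatePdfSketch`, the method name, and the text position are illustrative choices, not part of the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

// Hypothetical helper class for illustration only.
public class CreatePdfSketch {

    // Builds a one-page A4 PDF containing a single line of text and returns its bytes.
    static byte[] createOnePagePdf(String message) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            doc.addPage(page);

            // The content stream must be closed before the document is saved.
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
                cs.newLineAtOffset(72, 720); // 1 inch from the left edge, near the top (units are points)
                cs.showText(message);
                cs.endText();
            }

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createOnePagePdf("Hello, PDFBox");
        System.out.println("Generated " + pdf.length + " bytes");
    }
}
```

Saving to a `ByteArrayOutputStream` keeps the sketch free of file paths; `doc.save` also accepts a `File` when you want the result on disk.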

Extract text and content

Pull text from any page using PDFTextStripper, extract embedded images, and inspect document metadata and XMP properties.
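A minimal text-extraction sketch, assuming the PDFBox 3.x `Loader` entry point, might look like the following. The class name `ExtractTextSketch` and the `extractText` helper are hypothetical; page numbers passed to `PDFTextStripper` are 1-based.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class for illustration only.
public class ExtractTextSketch {

    // Returns the text of pages firstPage..lastPage (1-based, inclusive).
    static String extractText(File pdf, int firstPage, int lastPage) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(firstPage);
            stripper.setEndPage(lastPage);
            return stripper.getText(doc);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an existing PDF supplied by the caller.
        System.out.println(extractText(new File(args[0]), 1, 1));
    }
}
```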

Render pages to images

Convert PDF pages to BufferedImage at arbitrary DPI using PDFRenderer. Useful for thumbnail generation, print preview, and OCR pipelines.
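Rendering can be sketched along these lines, again assuming the PDFBox 3.x API. The class name `RenderPageSketch`, the `renderPage` helper, and the output file name are illustrative; note that `PDFRenderer` page indexes are 0-based, unlike the 1-based page numbers used by `PDFTextStripper`.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class for illustration only.
public class RenderPageSketch {

    // Renders one page (0-based index) to an RGB image at the requested resolution.
    static BufferedImage renderPage(File pdf, int pageIndex, float dpi) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            return renderer.renderImageWithDPI(pageIndex, dpi, ImageType.RGB);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an existing PDF supplied by the caller.
        BufferedImage image = renderPage(new File(args[0]), 0, 150f);
        ImageIO.write(image, "png", new File("page-1.png"));
    }
}
```

Higher DPI values produce larger images and use more memory, so thumbnails typically need far less than the resolution used for OCR input.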

Manipulate existing PDFs

Merge documents, split pages, add annotations, fill AcroForm fields, apply watermarks, and perform incremental saves.
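Merging is one of the simpler manipulations to show. The sketch below assumes the PDFBox 3.x `PDFMergerUtility`, with `IOUtils.createMemoryOnlyStreamCache()` chosen so that intermediate data stays in memory; the class name `MergeSketch` and the `merge` helper are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Hypothetical helper class for illustration only.
public class MergeSketch {

    // Concatenates the source PDFs, in list order, into the destination file.
    static void merge(List<File> sources, File destination) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (File source : sources) {
            merger.addSource(source);
        }
        merger.setDestinationFileName(destination.getAbsolutePath());
        // Memory-only cache: simple, but large inputs may call for a temp-file cache instead.
        merger.mergeDocuments(IOUtils.createMemoryOnlyStreamCache());
    }

    public static void main(String[] args) throws IOException {
        // Usage: MergeSketch out.pdf in1.pdf in2.pdf ...
        List<File> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(new File(args[i]));
        }
        merge(sources, new File(args[0]));
    }
}
```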

Digital signatures

Create and verify digital signatures, including visible signatures, time-stamped signatures, and long-term validation (LTV).
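Creating or cryptographically verifying a signature takes more code than fits here, so the following sketch covers only a first step: listing the signature dictionaries already present in a document. It does not validate anything. It assumes the PDFBox 3.x API; the class name `ListSignaturesSketch` and the `describeSignatures` helper are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;

// Hypothetical helper class for illustration only.
public class ListSignaturesSketch {

    // Returns one description per signature dictionary; an empty list means the PDF is unsigned.
    // This only reads metadata; it does NOT check that any signature is valid.
    static List<String> describeSignatures(File pdf) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            List<String> descriptions = new ArrayList<>();
            for (PDSignature signature : doc.getSignatureDictionaries()) {
                descriptions.add("name=" + signature.getName()
                        + ", subFilter=" + signature.getSubFilter());
            }
            return descriptions;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to an existing PDF supplied by the caller.
        describeSignatures(new File(args[0])).forEach(System.out::println);
    }
}
```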

Encryption and security

Encrypt PDFs with RC4 or AES (128-bit or 256-bit), set access permissions, and decrypt documents using owner or user passwords.
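Password-based encryption can be sketched as follows, assuming the PDFBox 3.x `StandardProtectionPolicy` API. The class name `EncryptSketch`, the `encrypt` helper, and the specific permissions switched off here are illustrative choices; a key length of 256 selects AES-256.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

// Hypothetical helper class for illustration only.
public class EncryptSketch {

    // Writes an encrypted copy of the input with printing and content extraction disabled.
    static void encrypt(File input, File output, String ownerPassword, String userPassword)
            throws IOException {
        try (PDDocument doc = Loader.loadPDF(input)) {
            AccessPermission permissions = new AccessPermission();
            permissions.setCanPrint(false);
            permissions.setCanExtractContent(false);

            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
            policy.setEncryptionKeyLength(256); // AES-256

            doc.protect(policy);
            doc.save(output); // output must be a different file from input
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: EncryptSketch in.pdf out.pdf ownerPassword userPassword
        encrypt(new File(args[0]), new File(args[1]), args[2], args[3]);
    }
}
```

To open the result later, pass the password to the loader, for example `Loader.loadPDF(file, userPassword)`.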

Module structure

PDFBox 3.x is split into focused Maven modules. Most applications only need the pdfbox artifact as a direct dependency; the others are pulled in transitively or used for specialized tasks.

| Module | Artifact ID | Purpose |
|---|---|---|
| Core library | pdfbox | Main PDF API: document model, text extraction, rendering, forms |
| Font handling | fontbox | TrueType, Type 1, and CFF font parsing; used internally by pdfbox |
| XMP metadata | xmpbox | Read and write XMP metadata streams |
| I/O primitives | pdfbox-io | Low-level RandomAccessRead abstractions; split from core in 3.x |
| Command-line tools | pdfbox-tools | ExtractText, PDFToImage, Encrypt, and other CLI utilities |

The debugger and app modules provide a PDF viewer/debugger desktop application and are not needed for library usage.

System requirements

  • Java: Java 8 or higher is required to compile and run PDFBox 3.0.x.
  • Build tool: Maven 3.6.3 or higher. Gradle is also supported.
  • Dependencies: PDFBox uses Bouncy Castle for cryptographic operations (encryption, digital signatures). Logging in PDFBox 3.0.x goes through Apache Commons Logging; configure the backend it delegates to (e.g. Log4j 2 or java.util.logging) for log output.

License and support

PDFBox is released under the Apache License, Version 2.0. You can use it freely in both open source and commercial products.

Users mailing list

Ask questions and get help from the PDFBox community.

Issue tracker

Report bugs and request features on the Apache JIRA.

Source code

Browse the source on GitHub, including the examples subproject.

Downloads

Download binary releases and standalone application JARs.
