Apache PDFBox is an open source Java library for working with PDF documents. Published by the Apache Software Foundation under the Apache License, Version 2.0, it gives you the tools to create new PDFs, modify existing ones, extract text and images, render pages to images, and more, all from pure Java code. This documentation covers the PDFBox 3.x API, module structure, setup, and common workflows.
What is PDFBox?
PDFBox is a project of the Apache Software Foundation. It has been under active development since 2002 and is widely used in enterprise applications for PDF processing. The library is entirely open source, carries no per-use fees, and imposes no restrictions on commercial use beyond the terms of the Apache License 2.0.
The library handles a broad range of PDF operations, from generating simple documents to parsing complex forms, and from extracting plain text to rendering high-fidelity page images. PDFBox also includes command-line utilities so you can perform common operations without writing any Java code.
Key capabilities
Create PDF documents
Build PDFs from scratch using a page model API. Add text, images, vector graphics, and embedded fonts to new or existing documents.
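As a minimal sketch of the page model API, the example below builds a one-page document, draws a single line of text, and reads the text back with PDFTextStripper to confirm the result. It assumes the PDFBox 3.x pdfbox artifact is on the classpath. The class name CreatePdfSketch, the helper method names, and the text position are illustrative choices, and it uses the built-in Helvetica font rather than an embedded one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical class name; a sketch, not an official PDFBox sample.
public class CreatePdfSketch {

    /** Builds a one-page PDF containing one line of text and returns its bytes. */
    static byte[] createPdf(String message) throws IOException {
        try (PDDocument document = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            PDPage page = new PDPage(); // default page size is US Letter
            document.addPage(page);

            // The content stream must be closed before the document is saved.
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                // Standard 14 font: referenced by name, not embedded in the file.
                content.setFont(new PDType1Font(Standard14Fonts.FontName.HELVETICA), 12);
                content.newLineAtOffset(72, 720); // assumed position: 1 inch from left, near the top
                content.showText(message);
                content.endText();
            }

            document.save(out);
            return out.toByteArray();
        }
    }

    /** Loads PDF bytes and returns all text found in the document. */
    static String readText(byte[] pdfBytes) throws IOException {
        try (PDDocument document = Loader.loadPDF(pdfBytes)) {
            return new PDFTextStripper().getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createPdf("Hello from PDFBox");
        System.out.println(readText(pdf).trim());
    }
}
```

Saving to a byte array keeps the sketch free of file paths; passing a File or another OutputStream to save works the same way.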
Extract text and content
Pull text from any page using PDFTextStripper, extract embedded images, and inspect document metadata and XMP properties.
Render pages to images
Convert PDF pages to BufferedImage at arbitrary DPI using PDFRenderer. Useful for thumbnail generation, print preview, and OCR pipelines.
Manipulate existing PDFs
Merge documents, split pages, add annotations, fill AcroForm fields, apply watermarks, and perform incremental saves.
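Splitting and merging can be sketched with two classes from the org.apache.pdfbox.multipdf package. The example below splits an in-memory document into single-page documents with Splitter, then appends each part to a fresh document with PDFMergerUtility.appendDocument. The class name SplitMergeSketch and the blank test pages are assumptions made for illustration; a real application would load an existing file instead.

```java
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical class name; a sketch, not an official PDFBox sample.
public class SplitMergeSketch {

    /**
     * Creates a document with the given number of blank pages, splits it into
     * one document per page, merges the parts again, and returns the page
     * count of the merged result.
     */
    static int splitThenMerge(int pageCount) throws IOException {
        try (PDDocument source = new PDDocument();
             PDDocument merged = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                source.addPage(new PDPage());
            }

            // By default Splitter produces one new document per source page.
            List<PDDocument> parts = new Splitter().split(source);
            try {
                PDFMergerUtility merger = new PDFMergerUtility();
                for (PDDocument part : parts) {
                    merger.appendDocument(merged, part);
                }
                return merged.getNumberOfPages();
            } finally {
                // Each split document is a separate resource and must be closed.
                for (PDDocument part : parts) {
                    part.close();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Merged pages: " + splitThenMerge(3));
    }
}
```

The source document stays open until the parts have been merged, because the split documents share objects with it.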
Digital signatures
Create and verify digital signatures, including visible signatures, time-stamped signatures, and long-term validation (LTV).
Encryption and security
Encrypt PDFs with RC4 or AES (128-bit or 256-bit), set access permissions, and decrypt documents using owner or user passwords.
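Password protection with restricted permissions can be sketched as follows. The example encrypts a one-page document with a 256-bit key, disallows printing, and then reopens the result with the user password to check the effective permission. The class name EncryptSketch, the method names, and the sample passwords are illustrative assumptions; PDFBox 3.x and its Bouncy Castle dependencies are assumed to be on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

// Hypothetical class name; a sketch, not an official PDFBox sample.
public class EncryptSketch {

    /** Returns the bytes of a one-page PDF encrypted with a 256-bit key. */
    static byte[] encrypt(String ownerPassword, String userPassword) throws IOException {
        try (PDDocument document = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            document.addPage(new PDPage());

            // Start from the default permissions and revoke printing.
            AccessPermission permissions = new AccessPermission();
            permissions.setCanPrint(false);

            StandardProtectionPolicy policy =
                    new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
            policy.setEncryptionKeyLength(256);

            // Encryption is applied when the document is saved.
            document.protect(policy);
            document.save(out);
            return out.toByteArray();
        }
    }

    /** Opens the PDF with the given password and reports whether printing is allowed. */
    static boolean mayPrint(byte[] pdfBytes, String password) throws IOException {
        try (PDDocument document = Loader.loadPDF(pdfBytes, password)) {
            return document.getCurrentAccessPermission().canPrint();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = encrypt("owner-secret", "user-secret");
        System.out.println("User may print:  " + mayPrint(pdf, "user-secret"));
        System.out.println("Owner may print: " + mayPrint(pdf, "owner-secret"));
    }
}
```

Opening with the owner password grants full access, while the user password is subject to the permissions stored in the file. A wrong password makes Loader.loadPDF throw an exception rather than return a document.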
Module structure
PDFBox 3.x is split into focused Maven modules. Most applications only need the pdfbox artifact as a direct dependency; the others are pulled in transitively or used for specialized tasks.
| Module | Artifact ID | Purpose |
|---|---|---|
| Core library | pdfbox | Main PDF API — document model, text extraction, rendering, forms |
| Font handling | fontbox | TrueType, Type 1, and CFF font parsing; used internally by pdfbox |
| XMP metadata | xmpbox | Read and write XMP metadata streams |
| I/O primitives | pdfbox-io | Low-level RandomAccessRead abstractions; split from core in 3.x |
| Command-line tools | pdfbox-tools | ExtractText, PDFToImage, Encrypt, and other CLI utilities |
The debugger and app modules provide a PDF viewer/debugger desktop application and are not needed for library usage.
System requirements
- Java: Java 8 or higher is required to compile and run PDFBox 3.x.
- Build tool: Maven 3.6.3 or higher. Gradle is also supported.
- Dependencies: PDFBox uses Bouncy Castle for cryptographic operations (encryption, digital signatures). PDFBox 3.x logs through Apache Commons Logging, so log output goes to whichever backend Commons Logging discovers at runtime (java.util.logging by default).
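An application that wants to fail fast on an unsupported runtime could check the Java requirement above at startup. The sketch below uses only the standard library; the class name JavaVersionCheck and its methods are hypothetical, and the minimum version is passed in by the caller rather than hard-coded.

```java
// Hypothetical helper; not part of PDFBox.
public class JavaVersionCheck {

    /**
     * Extracts the feature release number from a java.specification.version
     * value. Java 8 and earlier report "1.8", "1.7", and so on; Java 9 and
     * later report "9", "11", "17", and so on.
     */
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        if (parts.length > 1 && parts[0].equals("1")) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    /** Returns true if the given specification version is at least the required feature release. */
    static boolean meetsMinimum(String specVersion, int requiredFeature) {
        return featureVersion(specVersion) >= requiredFeature;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: JavaVersionCheck <minimum-feature-version>");
            return;
        }
        int required = Integer.parseInt(args[0]);
        String running = System.getProperty("java.specification.version");
        if (!meetsMinimum(running, required)) {
            throw new IllegalStateException(
                    "Java " + required + " or higher is required, found " + running);
        }
        System.out.println("Java " + running + " satisfies minimum " + required);
    }
}
```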
License and support
PDFBox is released under the Apache License, Version 2.0. You can use it freely in both open source and commercial products.
Users mailing list
Ask questions and get help from the PDFBox community.
Issue tracker
Report bugs and request features on the Apache JIRA.
Source code
Browse the source on GitHub, including the examples subproject.
Downloads
Download binary releases and standalone application JARs.