PDFs store descriptive metadata in two places: the document information dictionary (a simple key-value store in the PDF trailer) and an optional XMP metadata stream attached to the document catalog. PDFBox exposes the information dictionary through PDDocumentInformation, which provides typed getter and setter methods for all standard fields. For richer, standards-compliant metadata, the separate XMPBox module handles XMP streams.
Document information dictionary
PDDocumentInformation wraps the /Info dictionary in the PDF trailer. Each getter returns null if the corresponding entry is absent; passing null to a setter removes the entry.
| Method | Type | Description |
|---|---|---|
| getTitle() / setTitle(String) | String | Document title |
| getAuthor() / setAuthor(String) | String | Author name |
| getSubject() / setSubject(String) | String | Document subject |
| getKeywords() / setKeywords(String) | String | Keywords string |
| getCreator() / setCreator(String) | String | Creating application |
| getProducer() / setProducer(String) | String | PDF producer |
| getCreationDate() / setCreationDate(Calendar) | Calendar | Creation timestamp |
| getModificationDate() / setModificationDate(Calendar) | Calendar | Last-modified timestamp |
| getTrapped() / setTrapped(String) | String | "True", "False", or "Unknown" |
| getCustomMetadataValue(String) / setCustomMetadataValue(String, String) | String | Arbitrary custom fields |
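The null behavior described above can be tried without any PDF file, since a PDDocumentInformation object can be created on its own. The following is a minimal sketch; the class name InfoDictionaryBasics, the "Department" key, and all sample values are made up for illustration.

```java
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class InfoDictionaryBasics {

    /** Builds an in-memory information dictionary; no PDF file is involved. */
    public static PDDocumentInformation build() {
        PDDocumentInformation info = new PDDocumentInformation();
        info.setTitle("Quarterly Report");
        info.setKeywords("finance, report");
        info.setTrapped("Unknown"); // one of the three values listed in the table
        info.setCustomMetadataValue("Department", "Finance"); // hypothetical custom key

        // Passing null to a setter removes the entry again.
        info.setKeywords(null);
        return info;
    }

    public static void main(String[] args) {
        PDDocumentInformation info = build();
        System.out.println("Title: " + info.getTitle());
        System.out.println("Keywords: " + info.getKeywords()); // entry was removed
        System.out.println("Author: " + info.getAuthor());     // entry was never set
        System.out.println("Department: " + info.getCustomMetadataValue("Department"));
    }
}
```

Because absent entries come back as null rather than as empty strings, callers that display or compare these values will usually want an explicit null check.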
Reading metadata
Load a PDF and call getDocumentInformation() on the PDDocument to retrieve the PDDocumentInformation object.
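A minimal sketch of this step might look as follows. It assumes PDFBox 3.x, where documents are opened with Loader.loadPDF (on 2.x the equivalent entry point is PDDocument.load). The class name ReadMetadata, the describe helper, and the input.pdf path are placeholders, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import java.util.Calendar;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class ReadMetadata {

    /** Formats a few standard fields; absent entries are shown as "(not set)". */
    public static String describe(PDDocumentInformation info) {
        Calendar created = info.getCreationDate();
        return "Title: " + orPlaceholder(info.getTitle()) + "\n"
             + "Author: " + orPlaceholder(info.getAuthor()) + "\n"
             + "Created: " + (created == null ? "(not set)" : created.getTime().toString());
    }

    private static String orPlaceholder(String value) {
        return value == null ? "(not set)" : value;
    }

    public static void main(String[] args) throws IOException {
        // "input.pdf" is a placeholder path used when no argument is given.
        String path = args.length > 0 ? args[0] : "input.pdf";
        // try-with-resources closes the document and releases its resources.
        try (PDDocument document = Loader.loadPDF(new File(path))) {
            System.out.println(describe(document.getDocumentInformation()));
        }
    }
}
```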
Writing metadata
The AddMetadataFromDocInfo example shows how to populate information dictionary fields and save the result:
AddMetadataFromDocInfo.java
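The listing from that example is not reproduced here. As a stand-in, the sketch below (not the source of AddMetadataFromDocInfo) sets a few fields on a new one-page document, saves it to memory, and reloads it to check that the values were written. It assumes PDFBox 3.x for Loader.loadPDF(byte[]); the class name WriteMetadata and the sample values are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Calendar;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;

public class WriteMetadata {

    /** Creates a one-page PDF in memory with a populated information dictionary. */
    public static byte[] createWithMetadata(String title, String author) throws IOException {
        try (PDDocument document = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            document.addPage(new PDPage());

            // getDocumentInformation() returns the dictionary attached to this document,
            // so setter calls take effect on the next save.
            PDDocumentInformation info = document.getDocumentInformation();
            info.setTitle(title);
            info.setAuthor(author);
            info.setCreationDate(Calendar.getInstance());

            document.save(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createWithMetadata("Sample title", "Sample author");
        // Reload the bytes to confirm the values survived the save.
        try (PDDocument reloaded = Loader.loadPDF(pdf)) {
            System.out.println(reloaded.getDocumentInformation().getTitle());
        }
    }
}
```

To update an existing file instead, the same setter calls can be applied to a document opened with the loader, followed by a save to a new path.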
XMP metadata
For ISO-standard metadata interoperability, PDFs can carry an XMP stream on the document catalog. The AddMetadataFromDocInfo example demonstrates how to build an XMP packet from the information dictionary values using the XMPBox module:
AddMetadataFromDocInfo.java
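The listing from that example is not reproduced here. The sketch below follows the same general idea but is not the example's source: it copies information dictionary values into the Dublin Core, XMP Basic, and Adobe PDF schemas, serializes the packet, and attaches it to the catalog. It assumes the xmpbox artifact is on the classpath. The class name XmpFromDocInfo, the field-to-schema mapping, and the null guards are choices made for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.xml.transform.TransformerException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.AdobePDFSchema;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.XMPBasicSchema;
import org.apache.xmpbox.xml.XmpSerializer;

public class XmpFromDocInfo {

    /** Builds a serialized XMP packet from whichever info entries are present. */
    public static byte[] buildXmpPacket(PDDocumentInformation info) throws TransformerException {
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();

        // Dublin Core: title, creator, description. Null entries are skipped.
        DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
        if (info.getTitle() != null) dc.setTitle(info.getTitle());
        if (info.getAuthor() != null) dc.addCreator(info.getAuthor());
        if (info.getSubject() != null) dc.setDescription(info.getSubject());

        // XMP Basic: creating tool and timestamps.
        XMPBasicSchema basic = xmp.createAndAddXMPBasicSchema();
        if (info.getCreator() != null) basic.setCreatorTool(info.getCreator());
        if (info.getCreationDate() != null) basic.setCreateDate(info.getCreationDate());
        if (info.getModificationDate() != null) basic.setModifyDate(info.getModificationDate());

        // Adobe PDF: keywords and producer.
        AdobePDFSchema pdf = xmp.createAndAddAdobePDFSchema();
        if (info.getKeywords() != null) pdf.setKeywords(info.getKeywords());
        if (info.getProducer() != null) pdf.setProducer(info.getProducer());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The third argument asks the serializer to add the xpacket wrapper.
        new XmpSerializer().serialize(xmp, out, true);
        return out.toByteArray();
    }

    /** Attaches the packet to the document catalog; the caller still has to save. */
    public static void attach(PDDocument document) throws IOException, TransformerException {
        PDMetadata stream = new PDMetadata(document);
        stream.importXMPMetadata(buildXmpPacket(document.getDocumentInformation()));
        document.getDocumentCatalog().setMetadata(stream);
    }
}
```

Keeping packet construction separate from attachment makes it possible to inspect or test the XML without opening a PDF.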
XMPBox is a separate module (org.apache.pdfbox:xmpbox) that provides schema classes for Dublin Core, XMP Basic, Adobe PDF, and other standard namespaces.