XMPBox implements Adobe’s XMP (Extensible Metadata Platform) specification for Java. It parses, validates, and serializes XMP metadata, and is used together with PDFBox to read or write the document-level metadata stream of a PDF. Use XMPBox when your application must produce or inspect PDF/A identification metadata, or any XMP schema beyond what the PDFBox document information dictionary provides.

Dependency

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>xmpbox</artifactId>
  <version>3.0.0</version>
</dependency>

Key classes

| Class | Package | Purpose |
| --- | --- | --- |
| XMPMetadata | org.apache.xmpbox | Root container for all XMP schemas in a document; created via XMPMetadata.createXMPMetadata() |
| DublinCoreSchema | org.apache.xmpbox.schema | Dublin Core properties: title, creator, description, subject, date |
| AdobePDFSchema | org.apache.xmpbox.schema | Adobe PDF-specific properties: PDFVersion, Producer, Keywords |
| PDFAIdentificationSchema | org.apache.xmpbox.schema | PDF/A part and conformance level, required for PDF/A compliance |
| XMPBasicSchema | org.apache.xmpbox.schema | Core XMP properties: CreateDate, ModifyDate, CreatorTool |
| XmpSerializer | org.apache.xmpbox.xml | Serializes an XMPMetadata object to an XML byte stream |
| DomXmpParser | org.apache.xmpbox.xml | Parses raw XMP XML bytes into an XMPMetadata object |
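
The classes above can be exercised without a PDF file. The following is a minimal sketch, assuming the xmpbox artifact from the Dependency section is on the classpath: it builds a packet in memory, serializes it with XmpSerializer, and parses it back with DomXmpParser. The class name XmpRoundTrip and its two helper methods are hypothetical names chosen for illustration, not part of XMPBox.

```java
import java.io.ByteArrayOutputStream;

import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.XMPBasicSchema;
import org.apache.xmpbox.xml.DomXmpParser;
import org.apache.xmpbox.xml.XmpSerializer;

// Hypothetical demo class; not part of XMPBox.
class XmpRoundTrip
{
    // Build a small packet in memory and serialize it to XML bytes.
    static byte[] buildPacket(String title, String creator, String tool) throws Exception
    {
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();

        DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
        dc.setTitle(title);
        dc.addCreator(creator);

        XMPBasicSchema basic = xmp.createAndAddXMPBasicSchema();
        basic.setCreatorTool(tool);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The third argument wraps the XML in <?xpacket ...?> processing instructions.
        new XmpSerializer().serialize(xmp, out, true);
        return out.toByteArray();
    }

    // Parse serialized bytes back into an XMPMetadata object.
    static XMPMetadata parsePacket(byte[] packet) throws Exception
    {
        return new DomXmpParser().parse(packet);
    }

    public static void main(String[] args) throws Exception
    {
        XMPMetadata parsed = parsePacket(buildPacket("My Document", "Example Corp", "Demo Tool"));
        System.out.println("Title: " + parsed.getDublinCoreSchema().getTitle());
        System.out.println("Creators: " + parsed.getDublinCoreSchema().getCreators());
        System.out.println("Creator tool: " + parsed.getXMPBasicSchema().getCreatorTool());
    }
}
```

A round trip like this is a convenient way to unit-test metadata code, because it involves neither PDDocument nor the file system.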

Reading XMP from a PDF

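Retrieve the PDMetadata stream from the document catalog of a PDDocument, then parse its bytes with DomXmpParser. getMetadata() returns null when the document carries no XMP stream, and getDublinCoreSchema() returns null when the packet has no Dublin Core schema, so check both.

import java.io.File;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.xml.DomXmpParser;

// Loading may throw IOException; parsing may throw XmpParsingException.
// try-with-resources closes the document when the block exits.
try (PDDocument doc = Loader.loadPDF(new File("document.pdf")))
{
    PDMetadata rawMetadata = doc.getDocumentCatalog().getMetadata();

    if (rawMetadata != null)
    {
        DomXmpParser parser = new DomXmpParser();
        XMPMetadata xmp = parser.parse(rawMetadata.toByteArray());

        DublinCoreSchema dc = xmp.getDublinCoreSchema();
        if (dc != null)
        {
            System.out.println("Title: " + dc.getTitle());
        }
    }
}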

Writing XMP metadata

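Create an XMPMetadata instance, populate the required schemas, then serialize and attach the result to the document catalog.

import java.io.ByteArrayOutputStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.xml.XmpSerializer;

// "doc" is an open PDDocument, for example the one loaded in the previous section.
// This code may throw BadFieldValueException, TransformerException, and IOException.
XMPMetadata xmp = XMPMetadata.createXMPMetadata();

// Add Dublin Core schema
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle("My Document");
dc.addCreator("Example Corp");

// Add PDF/A identification: part 1 with conformance B declares PDF/A-1b
PDFAIdentificationSchema pdfaId = xmp.createAndAddPDFAIdentificationSchema();
pdfaId.setPart(1);
pdfaId.setConformance("B");

// Serialize to bytes (true = include the xpacket wrapper) and attach to the document
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
serializer.serialize(xmp, buffer, true);

PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(buffer.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);

Note: XMPBox is required if you need PDF/A compliance metadata. The PDFAIdentificationSchema must be present and correctly populated for a document to validate as PDF/A-1b, PDF/A-2b, or a similar conformance level.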
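
To sanity-check that the identification values actually ended up in a serialized packet, the bytes can be inspected independently of XMPBox. The sketch below deliberately uses only the JDK's DOM parser rather than the XMPBox API, and it is not a PDF/A validator: it only looks up pdfaid:part and pdfaid:conformance. The class PdfaIdCheck and its methods are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of XMPBox or PDFBox.
class PdfaIdCheck
{
    // Namespace URIs of the PDF/A identification schema and of RDF.
    static final String PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/";
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns part followed by conformance (for example "1B"),
    // or null when either value is missing from the packet.
    static String readPdfaId(byte[] packet) throws Exception
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(packet));

        String part = find(dom, "part");
        String conformance = find(dom, "conformance");
        return (part == null || conformance == null) ? null : part + conformance;
    }

    // XMP allows a simple property to be written either as a child element
    // or as an attribute of rdf:Description, so both forms are checked.
    static String find(Document dom, String localName)
    {
        NodeList elements = dom.getElementsByTagNameNS(PDFAID_NS, localName);
        if (elements.getLength() > 0)
        {
            return elements.item(0).getTextContent().trim();
        }
        NodeList descriptions = dom.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++)
        {
            Element description = (Element) descriptions.item(i);
            if (description.hasAttributeNS(PDFAID_NS, localName))
            {
                return description.getAttributeNS(PDFAID_NS, localName);
            }
        }
        return null;
    }
}
```

Passing the byte array produced by the serializer to PdfaIdCheck.readPdfaId(...) gives a quick check before the metadata is attached to the document; full conformance still has to be confirmed with a dedicated PDF/A validator.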
