Built-in codecs
Lucene104Codec
The current default. Used automatically by every new IndexWriterConfig unless you override it. Combines Lucene104PostingsFormat, Lucene90DocValuesFormat, Lucene90StoredFieldsFormat, and more.
SimpleTextCodec
Writes every index structure as human-readable plain text. Extremely slow and produces very large indexes; intended only for debugging and for understanding the index format.
Codec sub-formats
Codec is an abstract class that delegates each responsibility to a dedicated format object. You can extend FilterCodec to override only the sub-formats you care about.
| Method | Responsibility |
|---|---|
| postingsFormat() | Term dictionary and postings lists (doc IDs, positions, offsets, payloads) |
| docValuesFormat() | Per-document numeric, binary, sorted, and sorted-set doc values |
| storedFieldsFormat() | Stored field values retrieved at search time |
| termVectorsFormat() | Term vectors (per-document term/position/offset data) |
| normsFormat() | Per-field length normalization factors |
| liveDocsFormat() | Bitset of non-deleted documents within a segment |
| compoundFormat() | Optional bundling of segment files into a single .cfs compound file |
| pointsFormat() | BKD-tree encoded numeric and geo points |
| knnVectorsFormat() | HNSW-indexed dense float vectors for k-NN search |
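As a rough illustration of the FilterCodec approach described above, the sketch below replaces only the stored fields format and delegates everything else to the default codec. The class name CompressedStoredFieldsCodec is hypothetical, and the package org.apache.lucene.codecs.lucene104 is an assumption based on the naming of earlier default codecs.

```java
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.lucene104.Lucene104Codec; // assumed package name
import org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat;

/** Hypothetical codec: default behavior everywhere except stored fields. */
public final class CompressedStoredFieldsCodec extends FilterCodec {

  // Only this sub-format is overridden; it favors compression ratio over speed.
  private final StoredFieldsFormat storedFields =
      new Lucene90StoredFieldsFormat(Lucene90StoredFieldsFormat.Mode.BEST_COMPRESSION);

  /** Public no-arg constructor so the codec can be instantiated via SPI. */
  public CompressedStoredFieldsCodec() {
    // The name is written into each segment and used to look the codec up at read time.
    super("CompressedStoredFieldsCodec", new Lucene104Codec());
  }

  @Override
  public StoredFieldsFormat storedFieldsFormat() {
    return storedFields;
  }
}
```

Because the codec name is recorded in the index, a codec like this would also need an entry in META-INF/services/org.apache.lucene.codecs.Codec so that readers can resolve it.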
Setting a codec on IndexWriterConfig
IndexWriterConfig.setCodec() accepts any Codec instance. The codec is applied to all new segments flushed or merged by that writer.
To trade some indexing and retrieval speed for a smaller index, construct the codec with Mode.BEST_COMPRESSION.
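A minimal sketch of setting the codec explicitly might look like the following. It assumes that Lucene104Codec exposes a constructor taking a Mode value, as earlier default codecs do; the class name SetCodecExample, the field name "body", and the in-memory directory are illustrative choices only.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.lucene104.Lucene104Codec; // assumed package name
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SetCodecExample {

  /** Builds a writer config whose codec favors compression (assumed Mode constructor). */
  public static IndexWriterConfig compressedConfig() {
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    config.setCodec(new Lucene104Codec(Lucene104Codec.Mode.BEST_COMPRESSION));
    return config;
  }

  public static void main(String[] args) throws IOException {
    // Every segment flushed or merged by this writer uses the configured codec.
    try (Directory dir = new ByteBuffersDirectory();
        IndexWriter writer = new IndexWriter(dir, compressedConfig())) {
      Document doc = new Document();
      doc.add(new TextField("body", "hello codec", Field.Store.YES));
      writer.addDocument(doc);
      writer.commit();
    }
  }
}
```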
PerFieldPostingsFormat
The default Lucene104Codec uses PerFieldPostingsFormat internally to route each field to a specific PostingsFormat. This lets you use, for example, a memory-resident format for a high-traffic field while keeping others on the default disk-based format.
Override getPostingsFormatForField in a Lucene104Codec subclass to apply per-field routing.
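The routing could be sketched roughly as follows. The field name "title" and the class name PerFieldRoutingExample are hypothetical, and the "Direct" format is assumed to be available on the classpath through the separate lucene-codecs module.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene104.Lucene104Codec; // assumed package name

public class PerFieldRoutingExample {

  /** Returns a codec that routes one hypothetical field to a different postings format. */
  public static Codec buildCodec() {
    // Looked up by its SPI name; assumes lucene-codecs is on the classpath.
    final PostingsFormat inMemory = PostingsFormat.forName("Direct");

    return new Lucene104Codec() {
      @Override
      public PostingsFormat getPostingsFormatForField(String field) {
        if ("title".equals(field)) {
          return inMemory; // hypothetical high-traffic field
        }
        return super.getPostingsFormatForField(field); // default for everything else
      }
    };
  }
}
```

An anonymous subclass keeps the codec name unchanged, so only the per-field format names need to be resolvable when the index is read.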
The format name written into the index must match a registered PostingsFormat implementation at read time. If you use a custom format, register it via Java SPI (META-INF/services/org.apache.lucene.codecs.PostingsFormat) in your JAR.
Similarly, override getDocValuesFormatForField to apply per-field doc values formats, and getKnnVectorsFormatForField to tune HNSW vector parameters per field.
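For vectors, a per-field override might look like the sketch below. It assumes Lucene99HnswVectorsFormat is the HNSW implementation shipped alongside this codec; the field name "embedding" and the parameter values 32 and 200 are illustrative, not recommendations.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene104.Lucene104Codec; // assumed package name
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat; // assumed to be present

public class PerFieldVectorsExample {

  /** Returns a codec that builds a denser HNSW graph for one hypothetical field. */
  public static Codec buildCodec() {
    // Arguments are maxConn (neighbors per node) and beamWidth (build-time candidates).
    final KnnVectorsFormat denseGraph = new Lucene99HnswVectorsFormat(32, 200);

    return new Lucene104Codec() {
      @Override
      public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
        return "embedding".equals(field)
            ? denseGraph
            : super.getKnnVectorsFormatForField(field);
      }
    };
  }
}
```

Larger values generally improve recall at the cost of slower indexing and a bigger graph, so they are worth measuring per field.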
Writing a custom PostingsFormat
Extend PostingsFormat
Subclass org.apache.lucene.codecs.PostingsFormat and supply a unique name that will be written into the index. The class needs a public no-argument constructor so that the SPI loader can instantiate it.
Register via SPI
Create META-INF/services/org.apache.lucene.codecs.PostingsFormat in your JAR and add the fully-qualified name of your class on its own line.
Checking the default codec
You can inspect or change the process-wide default codec that new IndexWriterConfig instances receive by calling Codec.getDefault() and Codec.setDefault().
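A small sketch of inspecting the default is shown below; the class name DefaultCodecExample is hypothetical. It restores the original default at the end, since the setting is static and affects every writer configuration created afterwards in the same JVM.

```java
import org.apache.lucene.codecs.Codec;

public class DefaultCodecExample {
  public static void main(String[] args) {
    // The codec that new IndexWriterConfig instances pick up by default.
    Codec original = Codec.getDefault();
    System.out.println("Default codec: " + original.getName());

    // Names of all codecs currently discoverable through SPI.
    System.out.println("Available codecs: " + Codec.availableCodecs());

    // Changing the default is process-wide; here it is set back to the same value.
    Codec.setDefault(original);
  }
}
```

Passing a codec to IndexWriterConfig.setCodec() is usually the safer choice, because it limits the change to a single writer.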