A Document is a flat collection of fields: there is no nesting, and the Document itself does not enforce a schema. Each field has a name, a type, and a value. When you call IndexWriter.addDocument(), Lucene stores and indexes those fields according to their type configuration.
Field types
Lucene ships several concrete Field subclasses. Most applications should use these rather than constructing a raw Field with a custom FieldType.
| Class | Value type | Analyzed | Indexed | Stored | Use for |
|---|---|---|---|---|---|
| TextField | String / Reader | Yes | Yes (DOCS_AND_FREQS_AND_POSITIONS) | Optional (not for Reader values) | Full-text search on prose content |
| StringField | String / BytesRef | No | Yes (DOCS only) | Optional | Exact-match keywords, IDs, enum values |
| IntField | int | No | Yes (points + doc values) | Optional | Numeric range queries and sorting |
| LongField | long | No | Yes (points + doc values) | Optional | Timestamps, counts, range queries |
| FloatField | float | No | Yes (points + doc values) | Optional | Floating-point range queries |
| DoubleField | double | No | Yes (points + doc values) | Optional | High-precision floating-point range queries |
| KeywordField | String / BytesRef | No | Yes (inverted index + sorted-set doc values) | Optional | Exact match with efficient faceting and sorting |
| StoredField | String / byte[] / numeric | No | No | Always | Retrieving original values with hits |
| KnnFloatVectorField | float[] | No | Yes (HNSW graph) | No | Nearest-neighbor / semantic vector search |
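Most of these classes are a Field paired with a preconfigured, frozen FieldType. As a minimal sketch, assuming lucene-core 9.x or later on the classpath, the code below builds by hand a FieldType with the same settings StringField uses (DOCS, norms omitted, untokenized) and compares it with the built-in constants. The class name FieldTypeSketch, the constant EXACT_MATCH, and the field name "sku" are illustrative only; in real code, prefer StringField itself.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexOptions;

public class FieldTypeSketch {
  // Hypothetical hand-built equivalent of StringField.TYPE_NOT_STORED.
  static final FieldType EXACT_MATCH = new FieldType();

  static {
    EXACT_MATCH.setIndexOptions(IndexOptions.DOCS); // postings record doc IDs only
    EXACT_MATCH.setOmitNorms(true);                 // no length normalization data
    EXACT_MATCH.setTokenized(false);                // whole value becomes one term
    EXACT_MATCH.setStored(false);                   // not retrievable with hits
    EXACT_MATCH.freeze();                           // make immutable before sharing
  }

  public static void main(String[] args) {
    Field custom = new Field("sku", "ABC-123", EXACT_MATCH);
    Field builtin = new StringField("sku", "ABC-123", Field.Store.NO);

    System.out.println(custom.fieldType().indexOptions());      // DOCS
    System.out.println(builtin.fieldType().indexOptions());     // DOCS
    System.out.println(builtin.fieldType().omitNorms());        // true
    System.out.println(TextField.TYPE_NOT_STORED.indexOptions()); // DOCS_AND_FREQS_AND_POSITIONS
  }
}
```

Freezing the FieldType is what the built-in constants do as well: it prevents a shared type from being changed after fields start using it.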
TextField uses IndexOptions.DOCS_AND_FREQS_AND_POSITIONS by default. StringField uses IndexOptions.DOCS and omits norms, because an exact-match field needs neither term frequencies, positional data, nor length normalization.
IndexOptions
IndexOptions controls how much information Lucene stores in the inverted index for a field.
- DOCS: Records only which documents contain a term. Sufficient for boolean queries and exact-match lookups. Used by StringField.
- DOCS_AND_FREQS: Records documents and the term's frequency within each document. Enables TF-based relevance scoring, but not phrase queries.
- DOCS_AND_FREQS_AND_POSITIONS: Records documents, frequencies, and the position of each token. Required for phrase queries (PhraseQuery) and proximity queries. Default for TextField.
IndexOptions.NONE on a FieldType means the field is not indexed at all; it can still be stored.
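The three levels nest, each adding detail to the previous one. The sketch below is plain Java and a toy model, not Lucene's postings format or API: it keeps term → document → positions (the DOCS_AND_FREQS_AND_POSITIONS level of detail), derives the DOCS and DOCS_AND_FREQS views from it, and implements a two-term phrase match. PostingsToy, its methods, and its lowercase whitespace tokenizer are hypothetical.

```java
import java.util.*;

public class PostingsToy {
  // term -> (docId -> positions of that term within the doc)
  private final Map<String, SortedMap<Integer, List<Integer>>> postings = new HashMap<>();

  public void add(int docId, String text) {
    String[] tokens = text.toLowerCase().split("\\s+"); // naive tokenizer (assumption)
    for (int pos = 0; pos < tokens.length; pos++) {
      postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
              .computeIfAbsent(docId, d -> new ArrayList<>())
              .add(pos);
    }
  }

  // DOCS view: which documents contain the term.
  public Set<Integer> docs(String term) {
    return new TreeSet<>(postings.getOrDefault(term, new TreeMap<>()).keySet());
  }

  // DOCS_AND_FREQS view: how often the term occurs in one document.
  public int freq(String term, int docId) {
    return postings.getOrDefault(term, new TreeMap<>()).getOrDefault(docId, List.of()).size();
  }

  // Needs positions: documents where `second` directly follows `first`.
  public Set<Integer> phrase(String first, String second) {
    Set<Integer> result = new TreeSet<>();
    SortedMap<Integer, List<Integer>> a = postings.getOrDefault(first, new TreeMap<>());
    SortedMap<Integer, List<Integer>> b = postings.getOrDefault(second, new TreeMap<>());
    for (Map.Entry<Integer, List<Integer>> e : a.entrySet()) {
      List<Integer> next = b.get(e.getKey());
      if (next == null) continue;
      for (int p : e.getValue()) {
        if (next.contains(p + 1)) { result.add(e.getKey()); break; }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    PostingsToy idx = new PostingsToy();
    idx.add(0, "quick brown fox");
    idx.add(1, "brown quick fox quick");
    System.out.println(idx.docs("quick"));            // [0, 1]
    System.out.println(idx.freq("quick", 1));         // 2
    System.out.println(idx.phrase("quick", "brown")); // [0]
  }
}
```

Both sample documents contain "quick" and "brown", so the DOCS and DOCS_AND_FREQS views cannot tell them apart; only the positions show that the two terms are adjacent in document 0. This is why phrase queries need DOCS_AND_FREQS_AND_POSITIONS.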
DocValues vs stored fields
Lucene offers two ways to retrieve field values at query time:
- Stored fields: Values are saved in a row-oriented store alongside the document and retrieved per document after a hit is found. Good for displaying original content in search results. Use Field.Store.YES or StoredField.
- DocValues: Values are stored in a column-oriented structure, one column per field across all documents. Efficient for sorting, faceting, and aggregations over large result sets. Use NumericDocValuesField, SortedDocValuesField, KeywordField, or the numeric field classes.
To both search and display the same text, a common pattern is to add it twice: once as a TextField for full-text search, and once as a StoredField to retrieve the original, unanalyzed text.
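The layout difference explains the trade-off. The plain-Java toy model below is not Lucene's on-disk format: it keeps stored fields as one map per document (a row) and a doc-values column as a long[] indexed by document ID. Sorting hits reads only the column array, while displaying a hit reads one document's row. RowVsColumnToy and its members are hypothetical.

```java
import java.util.*;

public class RowVsColumnToy {
  // Row-oriented ("stored fields"): all stored values of one document together.
  private final List<Map<String, String>> rows = new ArrayList<>();
  // Column-oriented ("doc values"): one field's value for every doc, by doc ID.
  private long[] modified = new long[0];

  public int addDoc(String path, long lastModified) {
    int docId = rows.size();
    rows.add(Map.of("path", path));
    modified = Arrays.copyOf(modified, docId + 1);
    modified[docId] = lastModified;
    return docId;
  }

  // Sorting touches only the column array, never the rows.
  public List<Integer> sortNewestFirst(List<Integer> hits) {
    List<Integer> sorted = new ArrayList<>(hits);
    sorted.sort(Comparator.comparingLong((Integer doc) -> modified[doc]).reversed());
    return sorted;
  }

  // Displaying a hit reads the whole row of a single document.
  public String path(int docId) {
    return rows.get(docId).get("path");
  }

  public static void main(String[] args) {
    RowVsColumnToy idx = new RowVsColumnToy();
    idx.addDoc("a.txt", 100L);
    idx.addDoc("b.txt", 300L);
    idx.addDoc("c.txt", 200L);
    List<Integer> top = idx.sortNewestFirst(List.of(0, 1, 2));
    System.out.println(top);                  // [1, 2, 0]
    System.out.println(idx.path(top.get(0))); // b.txt
  }
}
```

With many hits, the column is scanned sequentially for one field, whereas sorting through the row store would load every candidate document just to read one value.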
Building a document
The Lucene demo (IndexFiles.java) shows how to combine several field types in a single document.
Fields that are not stored are not available in documents retrieved from the index via StoredFields.document(int). Index at least one stored field per document that uniquely identifies it (for example, a KeywordField with Field.Store.YES).
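A minimal sketch of the pattern, assuming a recent lucene-core release (9.x or later) that provides KeywordField, LongField with a Field.Store argument, and IndexReader.storedFields(). It is not the demo's code: it indexes into an in-memory ByteBuffersDirectory, and the class name, field names, and sample values are illustrative. The path is a stored KeywordField, so it comes back with the hit; the modification time (LongField) and the body (TextField) are not stored.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.KeywordField;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BuildDocumentSketch {
  // Indexes one document, searches its body, and returns the top hit's stored fields.
  public static Document indexAndFetch() {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      Document doc = new Document();
      // Stored, exact-match identifier: the only field returned on retrieval.
      doc.add(new KeywordField("path", "docs/intro.txt", Field.Store.YES));
      // Points + doc values for range queries and sorting; not stored.
      doc.add(new LongField("modified", 1_700_000_000_000L, Field.Store.NO));
      // Analyzed full text; not stored.
      doc.add(new TextField("contents", "Lucene stores and indexes fields", Field.Store.NO));
      writer.addDocument(doc);
      writer.commit();

      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // StandardAnalyzer lowercases tokens, so query the lowercased term.
        TopDocs hits = searcher.search(new TermQuery(new Term("contents", "lucene")), 10);
        StoredFields stored = reader.storedFields();
        return stored.document(hits.scoreDocs[0].doc);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Document hit = indexAndFetch();
    System.out.println(hit.get("path"));     // docs/intro.txt
    System.out.println(hit.get("contents")); // null: not stored
    System.out.println(hit.get("modified")); // null: not stored
  }
}
```

To display the body as well, store it too (Field.Store.YES on a String-valued TextField, or a separate StoredField), at the cost of a larger index.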