Control result order with Sort and SortField, paginate efficiently with searchAfter, boost queries, and customize the similarity function.
By default Lucene ranks results by BM25 relevance score. This page explains how to sort by field values, paginate without offset-based scanning, adjust scores with BoostQuery, and replace the default similarity.
Common SortField.Type values:

| Type | Description |
| --- | --- |
| STRING | Lexicographic order using SortedDocValues |
| INT | Numeric integer doc values |
| LONG | Numeric long doc values |
| FLOAT | Numeric float doc values |
| DOUBLE | Numeric double doc values |
```java
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

// Sort by a string field ascending
Sort byAuthor = new Sort(new SortField("author", SortField.Type.STRING));

// Sort by an integer field descending (reverse=true)
Sort byPriceDesc = new Sort(new SortField("price", SortField.Type.INT, true));

// Sort by score (explicit)
Sort byScore = new Sort(SortField.FIELD_SCORE);

// Sort by index document order
Sort byDoc = new Sort(SortField.FIELD_DOC);
```
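The following minimal sketch shows one of these Sort objects applied end to end. It indexes three documents into an in-memory ByteBuffersDirectory and runs byAuthor and byPriceDesc through IndexSearcher.search(query, n, sort). The class name, author names, and prices are illustrative. The sketch assumes "author" is indexed as a SortedDocValuesField and "price" as a NumericDocValuesField, because sorting reads doc values rather than indexed terms.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class SortByFieldDemo {

  // Builds a tiny in-memory index; each sortable field is indexed as doc values.
  static Directory buildIndex() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      addBook(writer, "melville", 15);
      addBook(writer, "austen", 42);
      addBook(writer, "tolstoy", 8);
    } // closing the writer commits, so a reader can be opened afterwards
    return dir;
  }

  static void addBook(IndexWriter writer, String author, int price) throws IOException {
    Document doc = new Document();
    doc.add(new SortedDocValuesField("author", new BytesRef(author))); // read by Type.STRING
    doc.add(new NumericDocValuesField("price", price));                // read by Type.INT
    writer.addDocument(doc);
  }

  // Returns the first sort value of every hit, in the order the Sort produced.
  static List<Object> sortValues(Directory dir, Sort sort) throws IOException {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopFieldDocs hits = searcher.search(new MatchAllDocsQuery(), 10, sort);
      List<Object> values = new ArrayList<>();
      for (ScoreDoc hit : hits.scoreDocs) {
        values.add(((FieldDoc) hit).fields[0]); // hits of a sorted search are FieldDocs
      }
      return values;
    }
  }

  public static void main(String[] args) throws IOException {
    try (Directory dir = buildIndex()) {
      Sort byAuthor = new Sort(new SortField("author", SortField.Type.STRING));
      Sort byPriceDesc = new Sort(new SortField("price", SortField.Type.INT, true));
      List<String> authors = new ArrayList<>();
      for (Object value : sortValues(dir, byAuthor)) {
        authors.add(((BytesRef) value).utf8ToString()); // STRING sort values are BytesRefs
      }
      System.out.println(authors);                      // [austen, melville, tolstoy]
      System.out.println(sortValues(dir, byPriceDesc)); // [42, 15, 8]
    }
  }
}
```

Each hit from a sorted search is a FieldDoc, and its fields array holds the values the hit was sorted on. The sketch reads fields[0] for that reason instead of loading stored fields.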
To control where documents that have no value for the sort field appear, pass a missing value as the fourth argument of the SortField constructor. SortField.setMissingValue() was removed in Lucene 11.
```java
// Documents missing "price" sort last (Integer.MAX_VALUE = highest value)
SortField priceSort = new SortField("price", SortField.Type.INT, false, Integer.MAX_VALUE);

// Documents missing "price" sort first (Integer.MIN_VALUE = lowest value)
SortField priceSortFirst = new SortField("price", SortField.Type.INT, false, Integer.MIN_VALUE);
```
For string fields, use the constants SortField.STRING_FIRST and SortField.STRING_LAST as the missing value.
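Here is a minimal sketch of STRING_LAST. The class below indexes one document without an author and sorts ascending by author. It assumes the four-argument SortField constructor described above. On releases that still provide SortField.setMissingValue, calling authorSort.setMissingValue(SortField.STRING_LAST) is the equivalent. The ids and author names are illustrative, and IndexSearcher.storedFields() assumes Lucene 9.5 or later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class MissingStringSortDemo {

  // Returns document ids ordered by author ascending, with author-less documents last.
  static List<String> idsByAuthorMissingLast() throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        writer.addDocument(book("b1", "tolstoy"));
        writer.addDocument(book("b2", null)); // no author value at all
        writer.addDocument(book("b3", "austen"));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // Assumption: the four-argument (field, type, reverse, missingValue) constructor.
        SortField authorSort =
            new SortField("author", SortField.Type.STRING, false, SortField.STRING_LAST);
        List<String> ids = new ArrayList<>();
        for (ScoreDoc hit :
            searcher.search(new MatchAllDocsQuery(), 10, new Sort(authorSort)).scoreDocs) {
          ids.add(searcher.storedFields().document(hit.doc).get("id"));
        }
        return ids;
      }
    }
  }

  static Document book(String id, String author) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    if (author != null) {
      doc.add(new SortedDocValuesField("author", new BytesRef(author)));
    }
    return doc;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(idsByAuthorMissingLast()); // [b3, b1, b2]
  }
}
```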
When a field can hold multiple numeric values per document (indexed with SortedNumericDocValuesField), use SortedNumericSortField and choose a selector:
```java
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.search.SortedNumericSelector;

// Sort by the minimum value in a multi-valued numeric field
Sort byMinPrice = new Sort(new SortedNumericSortField(
    "prices",
    SortField.Type.LONG,
    false, // reverse
    SortedNumericSelector.Type.MIN));
```
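The following sketch shows the MIN selector in action. Each price is indexed as a separate SortedNumericDocValuesField instance, which is what makes the field multi-valued. The sketch then prints the value each document was sorted on. The class name and prices are illustrative assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSelector;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MultiValuedSortDemo {

  // Returns, in result order, the MIN value the sort selected for each document.
  static List<Long> minPricesAscending() throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        writer.addDocument(prices(30, 5));
        writer.addDocument(prices(10, 12));
        writer.addDocument(prices(7));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Sort byMinPrice = new Sort(new SortedNumericSortField(
            "prices", SortField.Type.LONG, false, SortedNumericSelector.Type.MIN));
        TopFieldDocs hits = searcher.search(new MatchAllDocsQuery(), 10, byMinPrice);
        List<Long> mins = new ArrayList<>();
        for (ScoreDoc hit : hits.scoreDocs) {
          mins.add((Long) ((FieldDoc) hit).fields[0]); // LONG sort values are Longs
        }
        return mins;
      }
    }
  }

  // One SortedNumericDocValuesField per value makes "prices" multi-valued.
  static Document prices(long... values) {
    Document doc = new Document();
    for (long value : values) {
      doc.add(new SortedNumericDocValuesField("prices", value));
    }
    return doc;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(minPricesAscending()); // [5, 7, 10]
  }
}
```

Switching the selector to SortedNumericSelector.Type.MAX would order the same documents by their largest price instead.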
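Offset-based pagination (requesting from + size hits and discarding the first from) forces Lucene to collect and rank every preceding hit again for each page, so cost and memory grow with page depth. searchAfter is more efficient: pass the last ScoreDoc from the previous page as an anchor, and Lucene collects only hits that sort after that document.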
1. Execute the first page:

   ```java
   TopDocs page1 = searcher.search(query, 10);
   ```

2. Retrieve subsequent pages by passing the last hit from the previous page as after.
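Step 2 is naturally written as a loop that keeps the last hit of each page as the anchor for the next one. The sketch below uses an illustrative in-memory index of 25 documents and a hypothetical helper named allPages. It calls search for the first page and searchAfter for every later page, stopping when a page comes back empty.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SearchAfterDemo {

  // Hypothetical helper: returns the doc IDs of every page, pageSize hits at a time.
  static List<List<Integer>> allPages(IndexSearcher searcher, Query query, int pageSize)
      throws IOException {
    List<List<Integer>> pages = new ArrayList<>();
    ScoreDoc last = null; // anchor: last hit of the previous page
    while (true) {
      TopDocs page = (last == null)
          ? searcher.search(query, pageSize)             // step 1: first page
          : searcher.searchAfter(last, query, pageSize); // step 2: resume after the anchor
      if (page.scoreDocs.length == 0) {
        break; // no hits remain after the anchor
      }
      List<Integer> docIds = new ArrayList<>();
      for (ScoreDoc hit : page.scoreDocs) {
        docIds.add(hit.doc);
      }
      pages.add(docIds);
      last = page.scoreDocs[page.scoreDocs.length - 1];
    }
    return pages;
  }

  // Indexes numDocs small documents in memory, then pages through all of them.
  static List<List<Integer>> demo(int numDocs, int pageSize) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (int i = 0; i < numDocs; i++) {
          Document doc = new Document();
          doc.add(new StringField("id", "doc" + i, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return allPages(new IndexSearcher(reader), new MatchAllDocsQuery(), pageSize);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    for (List<Integer> page : demo(25, 10)) {
      System.out.println(page); // three pages holding 10, 10, then 5 doc IDs
    }
  }
}
```

MatchAllDocsQuery gives every document the same score, so ties are broken by ascending doc ID. Each document therefore appears on exactly one page.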
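searchAfter is safe to use for arbitrarily deep pages because its priority queue only ever holds one page of hits. Matching documents that sort before the anchor are generally still visited (and scored, when sorting by relevance), but they are discarded immediately instead of being collected. It is the recommended pagination strategy for large result sets.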
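When results are sorted by a field rather than by score, the anchor must be the FieldDoc returned by a search that used the same Sort. The comparison against the anchor uses the sort values that FieldDoc carries. The following is a minimal sketch using the searchAfter(after, query, n, sort) overload. The prices and class name are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SortedSearchAfterDemo {

  // Pages through documents sorted by "price" ascending and returns each page's prices.
  static List<List<Object>> pricePages(int pageSize) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (int price : new int[] {40, 10, 30, 20, 50}) {
          Document doc = new Document();
          doc.add(new NumericDocValuesField("price", price));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new MatchAllDocsQuery();
        // The same Sort is used for the first page and for every searchAfter call.
        Sort byPrice = new Sort(new SortField("price", SortField.Type.INT));
        List<List<Object>> pages = new ArrayList<>();
        FieldDoc last = null;
        while (true) {
          TopFieldDocs page = (last == null)
              ? searcher.search(query, pageSize, byPrice)
              : searcher.searchAfter(last, query, pageSize, byPrice);
          if (page.scoreDocs.length == 0) {
            break;
          }
          List<Object> prices = new ArrayList<>();
          for (ScoreDoc hit : page.scoreDocs) {
            prices.add(((FieldDoc) hit).fields[0]); // sort value carried by the hit
          }
          pages.add(prices);
          // Hits of a sorted search are FieldDocs, so the anchor keeps its sort values.
          last = (FieldDoc) page.scoreDocs[page.scoreDocs.length - 1];
        }
        return pages;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(pricePages(2)); // [[10, 20], [30, 40], [50]]
  }
}
```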
BoostQuery multiplies the scores returned by a wrapped query by a constant factor. Values greater than 1.0 increase importance; values between 0 and 1 decrease it.
```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

Query titleMatch = new TermQuery(new Term("title", "lucene"));
Query bodyMatch = new TermQuery(new Term("body", "lucene"));

// Title matches are 3× more important than body matches:
// the title clause's score is multiplied by 3 before the clauses are summed
BooleanQuery query = new BooleanQuery.Builder()
    .add(new BoostQuery(titleMatch, 3.0f), BooleanClause.Occur.SHOULD)
    .add(bodyMatch, BooleanClause.Occur.SHOULD)
    .build();
```
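The sketch below checks the effect of the boost. It indexes two illustrative documents whose title and body fields have identical lengths, so that field statistics do not favor either one, and reports which document ranks first. The ids, text, and helper topHitId are hypothetical. IndexSearcher.storedFields() assumes Lucene 9.5 or later.

```java
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BoostDemo {

  // Returns the id of the top hit when title matches are boosted by titleBoost.
  static String topHitId(float titleBoost) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        // Titles have 3 tokens and bodies 5 tokens in both documents, so the
        // per-field BM25 scores are symmetric and only the boost breaks the tie.
        writer.addDocument(doc("body-match", "search engine basics", "lucene is a java library"));
        writer.addDocument(doc("title-match", "lucene in action", "a book about search engines"));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query titleMatch = new TermQuery(new Term("title", "lucene"));
        Query bodyMatch = new TermQuery(new Term("body", "lucene"));
        BooleanQuery query = new BooleanQuery.Builder()
            .add(new BoostQuery(titleMatch, titleBoost), BooleanClause.Occur.SHOULD)
            .add(bodyMatch, BooleanClause.Occur.SHOULD)
            .build();
        TopDocs hits = searcher.search(query, 1);
        return searcher.storedFields().document(hits.scoreDocs[0].doc).get("id");
      }
    }
  }

  static Document doc(String id, String title, String body) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("title", title, Field.Store.NO));
    doc.add(new TextField("body", body, Field.Store.NO));
    return doc;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(topHitId(3.0f)); // title-match
    System.out.println(topHitId(0.5f)); // body-match
  }
}
```

With a boost below 1.0, such as 0.5f, the title clause's score shrinks and the body match ranks first instead.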
BoostQuery requires a finite, non-negative boost value; the constructor throws IllegalArgumentException for NaN, infinite, or negative values. A boost of 1.0f is a no-op and is automatically unwrapped during rewriting.
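Both rules can be observed without indexing anything. This sketch, with a hypothetical class name, rewrites BoostQuery instances against an empty MultiReader and probes the constructor with invalid boosts. IndexSearcher.rewrite applies the same rewriting that runs before a search.

```java
import java.io.IOException;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class BoostRewriteDemo {

  // Rewrites a BoostQuery against an empty index; no documents are needed.
  static Query rewriteWithBoost(Query inner, float boost) throws IOException {
    IndexSearcher searcher = new IndexSearcher(new MultiReader());
    return searcher.rewrite(new BoostQuery(inner, boost));
  }

  // True if the BoostQuery constructor rejects the given boost.
  static boolean rejects(float boost) {
    try {
      new BoostQuery(new TermQuery(new Term("title", "lucene")), boost);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    Query inner = new TermQuery(new Term("title", "lucene"));
    System.out.println(rewriteWithBoost(inner, 1.0f).equals(inner));         // unwrapped
    System.out.println(rewriteWithBoost(inner, 3.0f) instanceof BoostQuery); // kept
    System.out.println(rejects(Float.NaN)); // not finite
    System.out.println(rejects(-1.0f));     // negative
  }
}
```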
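Replace the default BM25Similarity by calling IndexSearcher.setSimilarity before the first search. Norms are computed at index time by Similarity.computeNorm, so a similarity that encodes norms differently should also be set with IndexWriterConfig.setSimilarity when indexing. Besides BM25Similarity, Lucene ships several built-in implementations in org.apache.lucene.search.similarities, including ClassicSimilarity (TF-IDF), BooleanSimilarity, LMDirichletSimilarity, and LMJelinekMercerSimilarity. BM25Similarity itself can be tuned: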
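```java
import org.apache.lucene.search.similarities.BM25Similarity;

// Use the defaults (k1 = 1.2, b = 0.75)
searcher.setSimilarity(new BM25Similarity());

// Tune: a higher k1 increases the effect of term frequency before it saturates.
// Arguments: k1, b (strength of length normalization), discountOverlaps
searcher.setSimilarity(new BM25Similarity(2.0f, 0.75f, true));
```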
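To see what k1 changes, the sketch below scores two documents of equal length, one containing "lucene" five times and one containing it once. It returns the ratio of their scores under a given k1. The texts and class name are illustrative. Norms are written at index time by the default BM25Similarity, which is fine here because BM25's norm encoding does not depend on k1 or b.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class Bm25TuningDemo {

  // Score of a document with freq("lucene") = 5 divided by one with freq = 1.
  static float tfRatio(float k1) throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        // Both bodies have 5 tokens, so length normalization treats them alike.
        writer.addDocument(doc("tf5", "lucene lucene lucene lucene lucene"));
        writer.addDocument(doc("tf1", "lucene search index query engine"));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // Only k1 varies; b stays at 0.75 and overlap tokens are discounted.
        searcher.setSimilarity(new BM25Similarity(k1, 0.75f, true));
        TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 2);
        Map<String, Float> scores = new HashMap<>();
        for (ScoreDoc hit : hits.scoreDocs) {
          scores.put(searcher.storedFields().document(hit.doc).get("id"), hit.score);
        }
        return scores.get("tf5") / scores.get("tf1");
      }
    }
  }

  static Document doc(String id, String body) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", body, Field.Store.NO));
    return doc;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("k1=1.2: " + tfRatio(1.2f));
    System.out.println("k1=2.0: " + tfRatio(2.0f));
  }
}
```

Lucene's BM25 term-frequency component is freq / (freq + k1 × length factor), and the length factor is 1 when a document has average length. For these two documents the ratio therefore works out to 5(1 + k1) / (5 + k1): about 1.77 for k1 = 1.2 and about 2.14 for k1 = 2.0. A larger k1 lets repeated terms keep contributing before the score saturates.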
To implement a fully custom similarity, extend org.apache.lucene.search.similarities.Similarity and override computeNorm and scorer:
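```java
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.Similarity;

public class MyRawTFSimilarity extends Similarity {

  @Override
  public long computeNorm(FieldInvertState state) {
    // Disable length normalization: every field gets the same norm.
    // computeNorm runs at index time, so it only takes effect if this
    // similarity is also set on IndexWriterConfig.
    return 1L;
  }

  @Override
  public SimScorer scorer(
      float boost, CollectionStatistics collectionStats, TermStatistics... termStats) {
    return new SimScorer() {
      @Override
      public float score(float freq, long norm) {
        return boost * freq; // raw term frequency; norm and collection statistics are ignored
      }

      @Override
      public Explanation explain(Explanation freq, long norm) {
        return Explanation.match(score(freq.getValue().floatValue(), norm), "raw TF score");
      }
    };
  }
}

// Install the custom similarity on the searcher
searcher.setSimilarity(new MyRawTFSimilarity());
```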