Search Service: parallel full-text query execution

The search service exposes the query interface of the GuancheData platform. It joins the Hazelcast cluster as a full member (not a lightweight client), so it hosts a share of the index partitions itself and can read those entries without a network hop. Incoming queries are tokenized, fanned out across all query terms in parallel using CompletableFuture, and then intersected to enforce AND semantics. Results are enriched with book metadata, filtered by optional author, language, and year criteria, and finally sorted before being returned as JSON.

Query execution flow

Parse the HTTP request

SearchRequestMapper reads the q query parameter (required) and the optional author, language, and year parameters from the GET /search request, producing a SearchCriteria object. A missing or blank q returns HTTP 400.
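The mapping step can be sketched as follows. This is a minimal illustration, not the service's actual code: a plain `Map<String, String>` stands in for the web framework's request object, and the fields of `SearchCriteria` as well as the names `SearchRequestMapperSketch` and `BadRequestException` are hypothetical. The source does not say how a non-numeric `year` is handled, so here it simply surfaces as a `NumberFormatException`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical shape for SearchCriteria; the real field names and types may differ.
record SearchCriteria(String query, Optional<String> author,
                      Optional<String> language, Optional<Integer> year) {}

// Hypothetical stand-in for SearchRequestMapper; not the service's actual code.
final class SearchRequestMapperSketch {
    private SearchRequestMapperSketch() {}

    // Signals a missing or blank q; the HTTP layer would turn this into a 400 response.
    static final class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    static SearchCriteria map(Map<String, String> params) {
        String q = params.get("q");
        if (q == null || q.isBlank()) {
            throw new BadRequestException("Query parameter 'q' is required");
        }
        return new SearchCriteria(
                q,
                optional(params.get("author")),
                optional(params.get("language")),
                // A non-numeric year throws NumberFormatException (handling unspecified in the source).
                optional(params.get("year")).map(Integer::parseInt));
    }

    // Treats an absent or blank parameter as "no filter"; trimming is an assumption.
    private static Optional<String> optional(String value) {
        return (value == null || value.isBlank()) ? Optional.empty() : Optional.of(value.trim());
    }
}
```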

Tokenize the query

ContentSearchEngine lowercases the query string, strips every character that is not a letter, digit, or whitespace, and splits on whitespace to produce an array of terms.
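A minimal sketch of the tokenization rule, assuming that "alphanumeric" means any Unicode letter or digit (`\p{L}` and `\p{N}`) and that stripped characters are deleted rather than replaced with spaces. Under that reading, a hyphenated word such as "Moby-Dick" collapses into the single term `mobydick`. The class name `QueryTokenizer` is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch of the tokenization step; not ContentSearchEngine's actual code.
final class QueryTokenizer {
    private QueryTokenizer() {}

    static String[] tokenize(String query) {
        return Arrays.stream(query
                        .toLowerCase(Locale.ROOT)
                        // Assumption: keep Unicode letters, digits, and whitespace; delete everything else.
                        // Whitespace must survive so the split below still has separators.
                        .replaceAll("[^\\p{L}\\p{N}\\s]", "")
                        .split("\\s+"))
                .filter(term -> !term.isEmpty()) // drops the empty token left by leading whitespace
                .toArray(String[]::new);
    }
}
```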

Fan out parallel index lookups

One CompletableFuture is submitted to a fixed thread pool (sized to availableProcessors - 3) for each query term. Each future calls IndexStore.getDocuments(term) against the "inverted-index" IMap and parses the docId:frequency entries it receives.
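The fan-out can be sketched with `CompletableFuture.supplyAsync` and a fixed pool. This sketch assumes that `IndexStore.getDocuments(term)` yields a collection of raw `docId:frequency` strings; the `PostingsSource` interface and `ParallelLookup` class are hypothetical stand-ins. The `Math.max(1, ...)` guard is an addition: `Executors.newFixedThreadPool` rejects a size below one, which `availableProcessors - 3` would produce on a machine with three or fewer cores.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for IndexStore: assumes getDocuments returns raw "docId:frequency" strings.
interface PostingsSource {
    Collection<String> getDocuments(String term);
}

// Hypothetical helper class; not the service's actual code.
final class ParallelLookup {
    private ParallelLookup() {}

    // Fixed pool sized to availableProcessors - 3; Math.max is an added guard for small hosts.
    static ExecutorService newPool() {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors() - 3);
        return Executors.newFixedThreadPool(threads);
    }

    // Parses one "docId:frequency" entry, e.g. "2489:17", into {2489, 17}.
    static int[] parseEntry(String entry) {
        int sep = entry.indexOf(':');
        return new int[] {
            Integer.parseInt(entry.substring(0, sep)),
            Integer.parseInt(entry.substring(sep + 1))
        };
    }

    // Submits one future per term; each yields that term's docId -> frequency map.
    static List<CompletableFuture<Map<Integer, Integer>>> lookUp(
            String[] terms, PostingsSource index, ExecutorService pool) {
        List<CompletableFuture<Map<Integer, Integer>>> futures = new ArrayList<>();
        for (String term : terms) {
            futures.add(CompletableFuture.supplyAsync(() -> {
                Map<Integer, Integer> postings = new HashMap<>();
                for (String entry : index.getDocuments(term)) {
                    int[] parsed = parseEntry(entry);
                    // Defensive assumption: sum duplicates if a docId appears twice for one term.
                    postings.merge(parsed[0], parsed[1], Integer::sum);
                }
                return postings;
            }, pool));
        }
        return futures;
    }
}
```

The pool should be shut down when the service stops: fixed-pool threads are non-daemon and would otherwise keep the JVM alive.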

Aggregate and intersect

As each future completes, its per-document frequencies are summed into a ConcurrentHashMap, while a second map counts how many query terms matched each document. Once the combined future returned by CompletableFuture.allOf() completes, documents that were not matched by every query term are removed, enforcing AND semantics.
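The aggregation and AND intersection can be sketched as follows. To keep the example self-contained, each term's postings arrive as a precomputed `Map<Integer, Integer>` (a hypothetical input shape) instead of coming from the index, and `AndAggregator` is likewise a hypothetical name. `ConcurrentHashMap.merge` performs each per-key update atomically, which is what lets several term futures add to the same document's totals at the same time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch of the aggregate-and-intersect step; not the service's actual code.
final class AndAggregator {
    private AndAggregator() {}

    // perTermPostings.get(i) is a hypothetical precomputed docId -> frequency map for query term i.
    static Map<Integer, Integer> aggregate(List<Map<Integer, Integer>> perTermPostings,
                                           ExecutorService pool) {
        Map<Integer, Integer> frequencies = new ConcurrentHashMap<>();
        Map<Integer, Integer> matchedTerms = new ConcurrentHashMap<>();

        CompletableFuture<?>[] futures = perTermPostings.stream()
                .map(postings -> CompletableFuture.runAsync(() ->
                        postings.forEach((docId, freq) -> {
                            // merge() is atomic per key, so concurrent terms can update the same document.
                            frequencies.merge(docId, freq, Integer::sum);
                            matchedTerms.merge(docId, 1, Integer::sum);
                        }), pool))
                .toArray(CompletableFuture<?>[]::new);

        // allOf() only builds a combined future; join() blocks until every term has finished.
        CompletableFuture.allOf(futures).join();

        // AND semantics: keep only documents matched by every query term.
        int termCount = perTermPostings.size();
        frequencies.keySet().removeIf(docId -> matchedTerms.get(docId) < termCount);
        return frequencies;
    }
}
```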

Apply metadata filters

FindBooks fetches BookMetadata for all surviving document IDs from the "bookMetadata" IMap, then discards any document where author, language, or year does not match the filter values supplied in the request.
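A sketch of the filtering step. The field layout of `BookMetadata`, the class name `MetadataFilter`, and the matching rules are assumptions: the author filter is treated as a case-insensitive substring match (the example response pairs the filter `Melville` with the author `Melville, Herman`), language as case-insensitive equality, and year as exact equality. A plain `Map` stands in for the "bookMetadata" IMap, which is possible because Hazelcast's `IMap` implements `java.util.concurrent.ConcurrentMap`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical field layout for BookMetadata; the real class may differ.
record BookMetadata(int id, String title, String author, String language, int year) {}

// Hypothetical sketch of the metadata filter; not FindBooks' actual code.
final class MetadataFilter {
    private MetadataFilter() {}

    // metadataStore stands in for the "bookMetadata" IMap.
    static List<BookMetadata> filter(Set<Integer> docIds, Map<Integer, BookMetadata> metadataStore,
                                     Optional<String> author, Optional<String> language,
                                     Optional<Integer> year) {
        List<BookMetadata> kept = new ArrayList<>();
        for (Integer docId : docIds) {
            BookMetadata book = metadataStore.get(docId);
            if (book == null) continue; // assumption: documents without metadata are dropped
            // Assumption: author matches on a case-insensitive substring ("Melville" ~ "Melville, Herman").
            boolean authorOk = author.map(a -> book.author().toLowerCase(Locale.ROOT)
                    .contains(a.toLowerCase(Locale.ROOT))).orElse(true);
            boolean languageOk = language.map(l -> l.equalsIgnoreCase(book.language())).orElse(true);
            boolean yearOk = year.map(y -> y == book.year()).orElse(true);
            // An absent filter always passes; every supplied filter must match.
            if (authorOk && languageOk && yearOk) kept.add(book);
        }
        return kept;
    }
}
```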

Sort and return

The remaining SearchResult list is sorted by the configured SortingStrategy and returned to SearchResponsePresenter, which formats the final JSON response.

Sorting strategies

The active sorting strategy is selected at startup from the SORTING_CRITERIA environment variable (default: frequency).

Value	Strategy	Behavior
`frequency`	`SortByFrequency`	Sorts results in descending order of summed term frequency across all query terms, so books in which the query terms occur more often appear first.
`id`	`SortById`	Sorts results in ascending order of Gutenberg book ID.

If SORTING_CRITERIA is set to an unrecognised value, SortByFrequency is used as the fallback.
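The strategy selection can be sketched as an interface with two implementations and a factory that falls back to frequency ordering. `SearchResult` is reduced to the two fields the comparators need, and the `SortingStrategies` factory and the `sort` method signature are hypothetical; only the names `SortingStrategy`, `SortByFrequency`, `SortById`, and `SORTING_CRITERIA` come from the text above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical minimal SearchResult; only the fields needed for sorting are modelled.
record SearchResult(int id, int frequency) {}

// Hypothetical method signature for SortingStrategy.
interface SortingStrategy {
    List<SearchResult> sort(List<SearchResult> results);
}

final class SortByFrequency implements SortingStrategy {
    @Override
    public List<SearchResult> sort(List<SearchResult> results) {
        List<SearchResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingInt(SearchResult::frequency).reversed()); // highest first
        return sorted;
    }
}

final class SortById implements SortingStrategy {
    @Override
    public List<SearchResult> sort(List<SearchResult> results) {
        List<SearchResult> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingInt(SearchResult::id)); // ascending Gutenberg ID
        return sorted;
    }
}

// Hypothetical factory mirroring the table above.
final class SortingStrategies {
    private SortingStrategies() {}

    static SortingStrategy fromValue(String value) {
        if (value == null) return new SortByFrequency(); // default when the variable is unset
        return switch (value) {
            case "id" -> new SortById();
            case "frequency" -> new SortByFrequency();
            default -> new SortByFrequency(); // unrecognised value: fall back to frequency
        };
    }

    static SortingStrategy fromEnvironment() {
        return fromValue(System.getenv("SORTING_CRITERIA"));
    }
}
```

Because `List.sort` is stable, results with equal frequency keep the order in which they arrived; the text above does not say whether the real strategy breaks ties some other way.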

Near-cache

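The "inverted-index" IMap is configured with a near-cache named "inverted-index-near-cache" with invalidateOnChange: true. Frequently accessed index entries are served from a local in-process cache, avoiding a network hop to the partition owner. When an indexer node updates a term’s entry in the distributed map, the matching near-cache entry on every search node is invalidated automatically, so search nodes do not keep serving outdated postings. Invalidation events are delivered asynchronously, however, so the near-cache is eventually consistent: a read issued immediately after an update can briefly return the previous value.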
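The near-cache settings above correspond roughly to the following programmatic Hazelcast configuration. This is a fragment under stated assumptions rather than the service's actual setup: it assumes the Hazelcast Java library is on the classpath, omits cluster name and network discovery settings, and the `SearchNodeConfig` class is hypothetical. The service may equally declare the same settings in XML or YAML.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Hypothetical bootstrap class; cluster and network settings are omitted.
final class SearchNodeConfig {
    private SearchNodeConfig() {}

    static HazelcastInstance startSearchMember() {
        Config config = new Config();

        // Near-cache for the inverted index, named as described above.
        NearCacheConfig nearCache = new NearCacheConfig("inverted-index-near-cache")
                .setInvalidateOnChange(true); // drop local copies when the owning entry changes

        config.getMapConfig("inverted-index").setNearCacheConfig(nearCache);

        // newHazelcastInstance starts a full cluster member (not a client), matching the service's role.
        return Hazelcast.newHazelcastInstance(config);
    }
}
```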

Response shape

A successful search response has the following structure:

```json
{
  "status": "success",
  "query": "white whale",
  "filters": {
    "author": "Melville",
    "language": "en",
    "year": 1851
  },
  "count": 2,
  "results": [
    {
      "id": 2489,
      "title": "Moby-Dick; or, The Whale",
      "author": "Melville, Herman",
      "language": "en",
      "year": 1851,
      "frequency": 312
    },
    {
      "id": 9147,
      "title": "White Jacket; or, the World in a Man-of-War",
      "author": "Melville, Herman",
      "language": "en",
      "year": 1851,
      "frequency": 47
    }
  ]
}
```

`frequency` is the sum of the document's per-term frequencies across all terms in the query. `filters` contains only the keys that were supplied in the request. On error, the response is `{"status": "error", "message": "..."}`.
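How `filters` ends up containing only the supplied keys can be sketched with an insertion-ordered map. The class and method names are hypothetical, and serialization of these maps to JSON is assumed to be left to whatever JSON library the service uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for assembling response fields; not SearchResponsePresenter's actual code.
final class ResponseFields {
    private ResponseFields() {}

    // Builds the "filters" object so that it contains only the parameters the client supplied.
    static Map<String, Object> filters(String author, String language, Integer year) {
        Map<String, Object> filters = new LinkedHashMap<>(); // keeps author, language, year order
        if (author != null) filters.put("author", author);
        if (language != null) filters.put("language", language);
        if (year != null) filters.put("year", year);
        return filters;
    }

    // Error body shape: {"status": "error", "message": ...}.
    static Map<String, Object> error(String message) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("status", "error");
        body.put("message", message);
        return body;
    }
}
```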

HTTP endpoints

Method	Path	Description
`GET`	`/search?q=...`	Full-text search. Optional params: `author`, `language`, `year`.
`GET`	`/health`	Returns `{"status": "healthy", "service": "execute"}`.

All responses are JSON. The service listens on port 7003.
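The two endpoints can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. The text above does not say which HTTP framework the service uses, so this server, the class name `SearchHttpSketch`, the pluggable `search` function that stands in for the query pipeline, and the wording of the 400 error message are all assumptions. Passing port 7003 reproduces the documented port; passing 0 lets the OS pick a free one, which is convenient in tests.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical HTTP front end; the real service may use a different framework.
final class SearchHttpSketch {
    private SearchHttpSketch() {}

    // search is a hypothetical stand-in for the tokenize/fan-out/filter/sort pipeline: q -> JSON body.
    static HttpServer start(int port, Function<String, String> search) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange ->
                send(exchange, 200, "{\"status\": \"healthy\", \"service\": \"execute\"}"));
        server.createContext("/search", exchange -> {
            String q = queryParam(exchange.getRequestURI().getRawQuery(), "q");
            if (q == null || q.isBlank()) {
                // Hypothetical message text; the documented shape is {"status": "error", "message": ...}.
                send(exchange, 400, "{\"status\": \"error\", \"message\": \"Missing required parameter: q\"}");
            } else {
                send(exchange, 200, search.apply(q));
            }
        });
        server.start();
        return server;
    }

    // Returns the URL-decoded value of one query-string parameter, or null if it is absent.
    static String queryParam(String rawQuery, String name) {
        if (rawQuery == null) return null;
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            if (key.equals(name)) {
                return eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    private static void send(HttpExchange exchange, int status, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json");
        exchange.sendResponseHeaders(status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }
}
```

With this sketch, `GET /search?q=white+whale` hands `"white whale"` to the search function, while a request with a missing or blank `q` is answered with HTTP 400.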
