Configuring GuancheData Search Engine services

ingestion-service

The ingestion service crawls documents and writes them to a replicated datalake. It also publishes indexing events to ActiveMQ and participates in the Hazelcast cluster.

ingestion-service:
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: xxx:5701
    HZ_MEMBERS: xxx:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://xxx:61616
    REPLICATION_FACTOR: 2
    INDEXING_BUFFER_FACTOR: 2

env.HZ_PORT

string

required

Hazelcast member port this container listens on. Must match the port in HZ_PUBLIC_ADDRESS and be exposed via Docker’s ports mapping. Typical value: 5701.

env.HZ_PUBLIC_ADDRESS

string

required

Publicly advertised Hazelcast address in host:port format. Other cluster members use this address to reach this node. Set to the host machine’s LAN IP and the same port as HZ_PORT (e.g. 192.168.1.10:5701).

env.HZ_MEMBERS

string

required

Comma-separated list of seed member addresses for Hazelcast cluster discovery (e.g. 192.168.1.10:5701). All services across all nodes should point to the same seed — typically the ingestion-service on the main node.
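
As a rough sketch of how the three Hazelcast settings fit together, the fragment below fills in the placeholders for a main node whose LAN IP is assumed to be 192.168.1.10 (the example address used above). The ports entry is an assumption about how the member port is published; adapt it to your own compose file.

```yaml
ingestion-service:
  ports:
    - "5701:5701"                            # publish HZ_PORT on the host (assumed mapping)
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: "192.168.1.10:5701"   # host LAN IP plus the same port as HZ_PORT
    HZ_MEMBERS: "192.168.1.10:5701"          # on the main node, the seed is this service itself
```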

env.HAZELCAST_CLUSTER_NAME

string

default:"SearchEngine"

Logical name of the Hazelcast cluster. All services in the cluster must share the same value. Change this only if you need to run isolated clusters on the same network.

env.BROKER_URL

string

default:"tcp://activemq:61616"

ActiveMQ connection URL. In a single-node deployment, the default tcp://activemq:61616 resolves over the Docker network. In a multi-node deployment, replace activemq with the IP of the node running the broker profile (e.g. tcp://192.168.1.10:61616).
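
The two cases can be sketched side by side. This fragment assumes the broker node's IP is 192.168.1.10, as in the example above; the single-node value is left commented out so the fragment stays valid YAML.

```yaml
ingestion-service:
  environment:
    # Single-node deployment: keep the default, resolved on the Docker network.
    # BROKER_URL: tcp://activemq:61616

    # Multi-node deployment: point at the node that runs the broker profile.
    BROKER_URL: tcp://192.168.1.10:61616
```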

env.REPLICATION_FACTOR

number

default:"1"

Number of filesystem replicas written for each ingested document in the datalake. A value of 2 means each document is stored on two nodes. Higher values improve fault tolerance at the cost of storage and write latency. Must be less than or equal to the number of ingestion-service instances in the cluster.
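
To illustrate the constraint, here is a sketch of two nodes that each run one ingestion-service instance, which is the minimum needed for a replication factor of 2. The second node's IP (192.168.1.11) is hypothetical, and setting the same value on both nodes is an assumption. Each YAML document below stands for a fragment of that node's own compose file.

```yaml
# Main node (192.168.1.10)
ingestion-service:
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: "192.168.1.10:5701"
    HZ_MEMBERS: "192.168.1.10:5701"
    REPLICATION_FACTOR: "2"    # two ingestion-service instances exist, so 2 is allowed
---
# Second node (192.168.1.11, hypothetical)
ingestion-service:
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: "192.168.1.11:5701"
    HZ_MEMBERS: "192.168.1.10:5701"   # seed on the main node
    REPLICATION_FACTOR: "2"
```

With only one ingestion-service instance in the cluster, the value would have to stay at 1.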

env.INDEXING_BUFFER_FACTOR

number

default:"10"

Maximum number of datalake entries buffered per indexer before the ingestion service pauses and waits for the indexers to catch up. Lower values reduce memory pressure on indexers under burst load; higher values allow more ingestion parallelism at the cost of increased queue depth.

indexing-service

The indexing service consumes messages from ActiveMQ, reads documents from the shared datalake volume, and builds the distributed in-memory inverted index in Hazelcast.

indexing-service:
  environment:
    HZ_PORT: "5702"
    HZ_PUBLIC_ADDRESS: xxx:5702
    HZ_MEMBERS: xxx:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://xxx:61616

env.HZ_PORT

string

required

Hazelcast member port for this service. Must be distinct from other services on the same host. Typical value: 5702.

env.HZ_PUBLIC_ADDRESS

string

required

Publicly advertised Hazelcast address in host:port format. Set to the host machine’s LAN IP combined with HZ_PORT (e.g. 192.168.1.10:5702).

env.HZ_MEMBERS

string

required

Comma-separated seed member addresses for cluster discovery. Should point to the ingestion-service seed on the main node (e.g. 192.168.1.10:5701).

env.HAZELCAST_CLUSTER_NAME

string

default:"SearchEngine"

Must match the cluster name used by all other services. See the ingestion-service entry for details.

env.BROKER_URL

string

default:"tcp://activemq:61616"

ActiveMQ connection URL. Replace the hostname with the broker node’s IP in multi-node deployments (e.g. tcp://192.168.1.10:61616).
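
A minimal sketch of the indexing service running on the same host as the ingestion service, assuming that host's LAN IP is 192.168.1.10 and that the broker runs there too. It shows the distinct member port and the shared seed; the ports entries are assumed mappings.

```yaml
ingestion-service:
  ports:
    - "5701:5701"
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: "192.168.1.10:5701"
    HZ_MEMBERS: "192.168.1.10:5701"

indexing-service:
  ports:
    - "5702:5702"                            # distinct from the ingestion service's 5701
  environment:
    HZ_PORT: "5702"
    HZ_PUBLIC_ADDRESS: "192.168.1.10:5702"   # same host IP, this service's own port
    HZ_MEMBERS: "192.168.1.10:5701"          # still the ingestion-service seed
    BROKER_URL: tcp://192.168.1.10:61616
```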

search-service

The search service exposes an HTTP API that queries the distributed Hazelcast inverted index and returns ranked results. It participates in the Hazelcast cluster as a full peer member and is fronted by the Nginx load balancer.

search-service:
  environment:
    HZ_PORT: "5703"
    SERVICE_PORT: "7003"
    HZ_PUBLIC_ADDRESS: xxx:5703
    HZ_MEMBERS: xxx:5701
    CLUSTER_NAME: SearchEngine
    SORTING_CRITERIA: "frequency"

env.HZ_PORT

string

required

Hazelcast member port for this service. Must be distinct from other services on the same host. Typical value: 5703.

env.SERVICE_PORT

string

default:"7003"

HTTP port on which the search API listens inside the container. This must be exposed via Docker’s ports mapping and match the port configured in nginx.conf for the search_backend upstream.
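
As a sketch, publishing both ports of the search service could look like the fragment below. The host-side port numbers are assumptions; what matters is that the upstream entry for search_backend in nginx.conf targets the published SERVICE_PORT.

```yaml
search-service:
  ports:
    - "5703:5703"   # Hazelcast member port (HZ_PORT)
    - "7003:7003"   # HTTP API port (SERVICE_PORT), targeted by the search_backend upstream
  environment:
    HZ_PORT: "5703"
    SERVICE_PORT: "7003"
```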

env.HZ_PUBLIC_ADDRESS

string

required

Publicly advertised Hazelcast address in host:port format. Set to the host machine’s LAN IP combined with HZ_PORT (e.g. 192.168.1.10:5703).

env.HZ_MEMBERS

string

required

Comma-separated seed member addresses for cluster discovery. Should point to the ingestion-service seed on the main node (e.g. 192.168.1.10:5701).

env.CLUSTER_NAME

string

default:"SearchEngine"

Logical Hazelcast cluster name. Note that this service uses CLUSTER_NAME rather than HAZELCAST_CLUSTER_NAME. The value must still match the cluster name used by the ingestion and indexing services.
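
The naming difference matters most when the cluster name is changed from its default. The sketch below assumes a hypothetical isolated cluster called SearchEngineStaging and shows which variable each service reads.

```yaml
ingestion-service:
  environment:
    HAZELCAST_CLUSTER_NAME: SearchEngineStaging
indexing-service:
  environment:
    HAZELCAST_CLUSTER_NAME: SearchEngineStaging
search-service:
  environment:
    CLUSTER_NAME: SearchEngineStaging   # same value, different variable name
```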

env.SORTING_CRITERIA

string

default:"frequency"

Result ranking strategy applied to search results. Accepted values:

frequency — results are sorted by term frequency in descending order (most relevant first).
id — results are sorted by document ID in ascending order.
