All runtime configuration for GuancheData is supplied through environment variables defined in docker-compose.yml. Apart from the Compose file and nginx.conf, there is no configuration file to edit. Each service reads its variables at container startup, so a change takes effect only after the container is recreated with the new values (for example, by re-running docker compose up -d). The sections below document every variable for each service, including defaults, accepted values, and the effect of tuning key parameters.
The ingestion service crawls documents and writes them to a replicated datalake. It also publishes indexing events to ActiveMQ and participates in the Hazelcast cluster.
ingestion-service:
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: xxx:5701
    HZ_MEMBERS: xxx:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://xxx:61616
    REPLICATION_FACTOR: 2
    INDEXING_BUFFER_FACTOR: 2
env.HZ_PORT
string
required
Hazelcast member port this container listens on. Must match the port in HZ_PUBLIC_ADDRESS and be exposed via Docker’s ports mapping. Typical value: 5701.
env.HZ_PUBLIC_ADDRESS
string
required
Publicly advertised Hazelcast address in host:port format. Other cluster members use this address to reach this node. Set to the host machine’s LAN IP and the same port as HZ_PORT (e.g. 192.168.1.10:5701).
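As a sketch of how these two variables line up with the port mapping, the fragment below assumes the host's LAN IP is 192.168.1.10 (the example address used on this page) and that the container port is published one-to-one. The ports entry is an assumption about the Compose file layout, not a copy of the project's file.

```yaml
ingestion-service:
  ports:
    - "5701:5701"                            # publish HZ_PORT on the host (assumed 1:1 mapping)
  environment:
    HZ_PORT: "5701"                          # port the Hazelcast member listens on
    HZ_PUBLIC_ADDRESS: "192.168.1.10:5701"   # host LAN IP plus the same port as HZ_PORT
```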
env.HZ_MEMBERS
string
required
Comma-separated list of seed member addresses for Hazelcast cluster discovery (e.g. 192.168.1.10:5701). All services across all nodes should point to the same seed, typically the ingestion-service on the main node.
env.HAZELCAST_CLUSTER_NAME
string
default:"SearchEngine"
Logical name of the Hazelcast cluster. All services in the cluster must share the same value. Change this only if you need to run isolated clusters on the same network.
env.BROKER_URL
string
default:"tcp://activemq:61616"
ActiveMQ connection URL. In a single-node deployment, the default tcp://activemq:61616 resolves via the Docker network. In a multi-node deployment, replace activemq with the IP of the node running the broker profile (e.g. tcp://192.168.1.10:61616).
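The two cases can be sketched as follows. The address 192.168.1.10 is only the illustrative broker-node IP from the example above; substitute the address of the node that actually runs the broker. Only one of the two values would be set at a time.

```yaml
ingestion-service:
  environment:
    # Single node: the broker is reachable by its service name on the Docker network.
    BROKER_URL: tcp://activemq:61616
    # Multi-node: point at the host that runs the broker instead.
    # BROKER_URL: tcp://192.168.1.10:61616
```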
env.REPLICATION_FACTOR
number
default:"1"
Number of filesystem replicas written for each ingested document in the datalake. A value of 2 means each document is stored on two nodes. Higher values improve fault tolerance at the cost of storage and write latency. Must be less than or equal to the number of ingestion-service instances in the cluster.
env.INDEXING_BUFFER_FACTOR
number
default:"10"
Maximum number of datalake entries buffered per indexer before the ingestion service pauses and waits for the indexers to catch up. Lower values reduce memory pressure on indexers under burst load; higher values allow more ingestion parallelism at the cost of increased queue depth.
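For illustration, a deployment with two ingestion-service instances might set the tuning parameters as below. The values and the two-instance assumption are hypothetical; choose values that fit your own cluster.

```yaml
ingestion-service:
  environment:
    REPLICATION_FACTOR: 2         # each document stored on two nodes; requires at least 2 ingestion-service instances
    INDEXING_BUFFER_FACTOR: 10    # the default: up to 10 buffered datalake entries per indexer before ingestion pauses
```

With a single ingestion-service instance, REPLICATION_FACTOR would have to stay at 1 to satisfy the constraint described above.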
The indexing service consumes messages from ActiveMQ, reads documents from the shared datalake volume, and builds the distributed in-memory inverted index in Hazelcast.
indexing-service:
  environment:
    HZ_PORT: "5702"
    HZ_PUBLIC_ADDRESS: xxx:5702
    HZ_MEMBERS: xxx:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://xxx:61616
env.HZ_PORT
string
required
Hazelcast member port for this service. Must be distinct from other services on the same host. Typical value: 5702.
env.HZ_PUBLIC_ADDRESS
string
required
Publicly advertised Hazelcast address in host:port format. Set to the host machine’s LAN IP combined with HZ_PORT (e.g. 192.168.1.10:5702).
env.HZ_MEMBERS
string
required
Comma-separated seed member addresses for cluster discovery. Should point to the ingestion-service seed on the main node (e.g. 192.168.1.10:5701).
env.HAZELCAST_CLUSTER_NAME
string
default:"SearchEngine"
Must match the cluster name used by all other services. See the ingestion-service entry for details.
env.BROKER_URL
string
default:"tcp://activemq:61616"
ActiveMQ connection URL. Replace the hostname with the broker node’s IP in multi-node deployments (e.g. tcp://192.168.1.10:61616).
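A sketch of an indexing service running on a second node, assuming the main node (Hazelcast seed and broker) is at 192.168.1.10 and the second node is at 192.168.1.11. Both addresses are illustrative and should be replaced with your own.

```yaml
indexing-service:
  environment:
    HZ_PORT: "5702"
    HZ_PUBLIC_ADDRESS: "192.168.1.11:5702"   # this node's LAN IP plus HZ_PORT
    HZ_MEMBERS: "192.168.1.10:5701"          # ingestion-service seed on the main node
    HAZELCAST_CLUSTER_NAME: SearchEngine     # must match every other service
    BROKER_URL: tcp://192.168.1.10:61616     # broker running on the main node
```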
The search service exposes an HTTP API that queries the distributed Hazelcast inverted index and returns ranked results. It participates in the Hazelcast cluster as a full peer member and is fronted by the Nginx load balancer.
search-service:
  environment:
    HZ_PORT: "5703"
    SERVICE_PORT: "7003"
    HZ_PUBLIC_ADDRESS: xxx:5703
    HZ_MEMBERS: xxx:5701
    CLUSTER_NAME: SearchEngine
    SORTING_CRITERIA: "frequency"
env.HZ_PORT
string
required
Hazelcast member port for this service. Must be distinct from other services on the same host. Typical value: 5703.
env.SERVICE_PORT
string
default:"7003"
HTTP port on which the search API listens inside the container. This must be exposed via Docker’s ports mapping and match the port configured in nginx.conf for the search_backend upstream.
env.HZ_PUBLIC_ADDRESS
string
required
Publicly advertised Hazelcast address in host:port format. Set to the host machine’s LAN IP combined with HZ_PORT (e.g. 192.168.1.10:5703).
env.HZ_MEMBERS
string
required
Comma-separated seed member addresses for cluster discovery. Should point to the ingestion-service seed on the main node (e.g. 192.168.1.10:5701).
env.CLUSTER_NAME
string
default:"SearchEngine"
Logical Hazelcast cluster name. Note that this service uses CLUSTER_NAME rather than HAZELCAST_CLUSTER_NAME. The value must still match the cluster name used by the ingestion and indexing services.
env.SORTING_CRITERIA
string
default:"frequency"
Result ranking strategy applied to search results. Accepted values:
  • frequency — results are sorted by term frequency in descending order (most relevant first).
  • id — results are sorted by document ID in ascending order.
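As a sketch, switching the ranking to document-ID order while publishing the API port could look like this. The ports entry assumes a one-to-one mapping and is not taken from the project's Compose file.

```yaml
search-service:
  ports:
    - "7003:7003"              # publish SERVICE_PORT; must match the search_backend upstream in nginx.conf
  environment:
    SERVICE_PORT: "7003"
    SORTING_CRITERIA: "id"     # ascending document ID instead of the default "frequency"
```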
