GuancheData is packaged as a multi-module Maven project. A single mvn clean package command produces all the executable JARs that Docker Compose then picks up when building container images. Deployment is profile-driven: the main node runs the broker, backend services, and load balancer together, while additional nodes join the Hazelcast cluster by running only the backend profile. Follow the steps below to go from a fresh clone to a running cluster.
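For orientation, this is roughly the layout the steps below assume (the top-level pom.xml and docker-compose.yml locations come from step 1; module directories are assumed to match the module names in step 2, and nginx.conf is assumed to sit at the root alongside them):
stage_3/
├── pom.xml               # top-level multi-module POM
├── docker-compose.yml    # profile-driven service definitions
├── nginx.conf            # load balancer config (location assumed)
├── ingestion-service/
├── indexing-service/
├── search-service/
└── benchmarking/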
1. Navigate to the repository root

All Maven and Docker Compose commands must be run from the root of the repository, where the top-level pom.xml and docker-compose.yml live.
cd /path/to/GuancheData/stage_3
2. Build all service JARs

Compile and package every module in one pass from the root:
mvn clean package
This builds four modules — ingestion-service, indexing-service, search-service, and benchmarking — and places the shaded (fat) JARs in each module’s target/ directory. Docker Compose will copy these artifacts into the container images at startup; you do not need to run docker build separately.
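A quick way to confirm the build succeeded is to list the packaged artifacts (a sketch, assuming each module directory matches its module name):
# Each module should have a shaded JAR in its target/ directory
ls ingestion-service/target/*.jar indexing-service/target/*.jar search-service/target/*.jar benchmarking/target/*.jar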
3. Configure IP addresses in docker-compose.yml

Every xxx placeholder in docker-compose.yml must be replaced with a real IP address before the cluster starts.
The cluster will fail to form if any xxx placeholder remains. Search for all occurrences before proceeding:
grep -n "xxx" docker-compose.yml
The substitution rules are:
  • In HZ_PUBLIC_ADDRESS — use the host IP of the machine running that service.
  • In HZ_MEMBERS — use the host IP of the main node (the seed node for Hazelcast discovery). All services on all nodes should point here.
  • In BROKER_URL — use the host IP of the node running the broker profile (i.e., the main node).
Example after substitution (main node at 192.168.1.10):
ingestion-service:
  environment:
    HZ_PORT: "5701"
    HZ_PUBLIC_ADDRESS: 192.168.1.10:5701
    HZ_MEMBERS: 192.168.1.10:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://192.168.1.10:61616
    REPLICATION_FACTOR: 2
    INDEXING_BUFFER_FACTOR: 2

indexing-service:
  environment:
    HZ_PORT: "5702"
    HZ_PUBLIC_ADDRESS: 192.168.1.10:5702
    HZ_MEMBERS: 192.168.1.10:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    BROKER_URL: tcp://192.168.1.10:61616

search-service:
  environment:
    HZ_PORT: "5703"
    SERVICE_PORT: "7003"
    HZ_PUBLIC_ADDRESS: 192.168.1.10:5703
    HZ_MEMBERS: 192.168.1.10:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
    SORTING_CRITERIA: "frequency"
On each additional node, set HZ_PUBLIC_ADDRESS to that node’s own IP and keep HZ_MEMBERS pointing at the main node’s ingestion-service (<MAIN_NODE_IP>:5701).
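Because every placeholder on the main node resolves to the main node's own IP, the main-node edit can be scripted. A sketch, assuming GNU sed and the 192.168.1.10 address from the example above:
# Replace every xxx placeholder with the main node's IP (on macOS use: sed -i '' ...)
sed -i 's/xxx/192.168.1.10/g' docker-compose.yml
# Verify nothing was missed; this should print no matches
grep -n "xxx" docker-compose.yml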
4. Configure nginx.conf with search-service backend IPs

The Nginx load balancer proxies /search and /health requests to the search-service instances in the cluster. Before starting the loadbalancer profile, replace the <NODE_IP> placeholders in nginx.conf with the real IPs of every node running a search-service.
upstream search_backend {
    least_conn;

    server 192.168.1.10:7003 max_fails=10 fail_timeout=30s;
    server 192.168.1.11:7003 max_fails=10 fail_timeout=30s;

    keepalive 64;
}
Add one server line per search-service instance. Nginx uses a least-connections strategy and, with these settings, marks a backend unavailable after 10 failed attempts within a 30-second window, then retries it after 30 seconds.
For a single-node deployment, keep only one server line pointing at 127.0.0.1:7003 or the host’s LAN IP.
5. Start the cluster

Choose the deployment mode that matches your environment.
On the main node, start the broker, all backend services, and the load balancer together:
docker compose --profile backend --profile broker --profile loadbalancer up -d
This brings up the following containers:
Container           Profile        Description
ingestion-service   backend        Document crawler and datalake writer
indexing-service    backend        Async indexer consuming ActiveMQ messages
search-service      backend        HTTP search API (port 7003)
activemq            broker         ActiveMQ message broker (port 61616)
nginx               loadbalancer   Reverse proxy and load balancer (port 8080)
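On each additional node, start only the backend profile so its services join the existing Hazelcast cluster (per the overview, additional nodes run only the backend profile):
docker compose --profile backend up -d
Once everything is up, a quick sanity check from the main node might look like this (a sketch; /health is the route nginx proxies in step 4):
# All containers should show a running status
docker compose ps
# The load balancer should forward the health check to a search-service instance
curl http://localhost:8080/health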
