
GuancheData ships a dedicated benchmark service in docker-compose.yml that measures three core performance dimensions: how fast the system ingests documents, how fast it indexes tokens, and how quickly the cluster recovers after a node failure. A fourth dimension — query latency under concurrent load — is measured separately using Apache JMeter. Each benchmark mode is controlled by a single environment variable and runs as a Docker Compose profile, making it easy to reproduce experiments at different cluster sizes.

Starting the benchmark service

Set BENCHMARK_MODE in docker-compose.yml before starting the benchmark container:
benchmark:
  environment:
    BENCHMARK_MODE: recoverytime   # ingestionrate | indexingthroughput | recoverytime
    HZ_PORT: "5704"
    HZ_PUBLIC_ADDRESS: <NODE_IP>:5704
    HZ_MEMBERS: <SEED_NODE_IP>:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
Then start the benchmark profile:
docker compose --profile benchmark up -d
View live output with:
docker logs -f benchmark
The benchmark container exits automatically when all iterations are complete (except recoverytime, which runs indefinitely and waits for a node to be removed).
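Each mode maps onto a single task selected by BENCHMARK_MODE. Below is a purely illustrative Java sketch of that dispatch pattern; the class name, the placeholder tasks, and the default mode are hypothetical stand-ins, not the project's actual entrypoint:

import java.util.Map;

public final class BenchmarkMain {
    public static void main(String[] args) {
        // One Runnable per mode; the container lives exactly as long as the task.
        Map<String, Runnable> modes = Map.of(
                "ingestionrate",      () -> System.out.println("run 15 one-second iterations, then exit"),
                "indexingthroughput", () -> System.out.println("run 15 ten-second iterations, then exit"),
                "recoverytime",       () -> System.out.println("watch cluster safety until stopped"));
        // Default mode here is an assumption for the sketch.
        Runnable task = modes.get(System.getenv().getOrDefault("BENCHMARK_MODE", "ingestionrate"));
        if (task == null) {
            System.err.println("Unknown BENCHMARK_MODE");
            System.exit(1);
        }
        task.run();
    }
}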

Benchmark modes

ingestionrate

Measures how many books per second the ingestion service downloads and stores in the datalake.

Mechanism: IngestionRate connects to the cluster as a Hazelcast client (not a full member). It accesses the "log" ISet, a BookDownloadLog that ingestion nodes append to as each book is stored. Each iteration, it reads the current size of the ISet, waits one second, reads the size again, and divides the difference by the elapsed time to compute a rate in books/second (see the sketch after this subsection).

Iterations: 15 in total, 5 warmup iterations (discarded) followed by 10 measured iterations. The warmup period allows the ingestion pipeline to reach steady-state throughput before recording results.

Output:
warmup  1: 3.821 books/s (1.0s)
...
Iter  1: 4.102 books/s (1.0s)
...
=== FINAL RESULTS ===
IngestionRate: 4.053 ± 0.087 books/s
(min=3.912, avg=4.053, max=4.201)
What to vary: Run with 1, 2, and 4 ingestion containers to observe horizontal scalability. Note the back-pressure effect of INDEXING_BUFFER_FACTOR — if indexers are slower than ingestion, the measured rate will plateau even as ingestion nodes are added.
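The sampling loop described above is simple to reproduce. Here is a minimal, self-contained sketch assuming the Hazelcast 5.x Java client API and the "log" ISet named in the description; the class name, env-var handling, and statistics code are illustrative, not the project's actual IngestionRate implementation:

import java.util.Arrays;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.collection.ISet;
import com.hazelcast.core.HazelcastInstance;

public final class IngestionRateSketch {
    public static void main(String[] args) throws InterruptedException {
        // Join as a lightweight client: we only read the shared download log
        // and hold no cluster data ourselves.
        ClientConfig config = new ClientConfig();
        config.setClusterName("SearchEngine");
        config.getNetworkConfig().addAddress(System.getenv("HZ_MEMBERS")); // e.g. <SEED_NODE_IP>:5701
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        ISet<Object> log = client.getSet("log"); // the BookDownloadLog ingestion nodes append to
        final int warmup = 5;
        double[] measured = new double[10];

        for (int i = 0; i < warmup + measured.length; i++) {
            int before = log.size();
            long t0 = System.nanoTime();
            Thread.sleep(1_000);                                // one-second sampling window
            double elapsed = (System.nanoTime() - t0) / 1e9;
            double rate = (log.size() - before) / elapsed;
            if (i < warmup) {
                System.out.printf("warmup %2d: %.3f books/s (%.1fs)%n", i + 1, rate, elapsed);
            } else {
                measured[i - warmup] = rate;
                System.out.printf("Iter %2d: %.3f books/s (%.1fs)%n", i - warmup + 1, rate, elapsed);
            }
        }

        // Report mean and sample standard deviation over the 10 measured iterations.
        double mean = Arrays.stream(measured).average().orElse(0);
        double var = Arrays.stream(measured).map(r -> (r - mean) * (r - mean)).sum() / (measured.length - 1);
        System.out.printf("IngestionRate: %.3f ± %.3f books/s%n", mean, Math.sqrt(var));
        client.shutdown();
    }
}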

Full benchmarking workflow

1. Build and deploy the cluster

From the repository root, build all service JARs and start the full stack on the main node:
mvn clean package
docker compose --profile backend --profile broker --profile loadbalancer up -d
Wait until all containers are healthy and the Hazelcast cluster reports all members joined.
2. Allow the datalake to populate

Let the ingestion service run for several minutes to download and store a representative number of books. A larger datalake produces more stable benchmark results and makes recovery time measurements more meaningful, because there is more partition data to rebalance.
3. Run ingestion rate benchmark

Set BENCHMARK_MODE: ingestionrate in docker-compose.yml and start the benchmark:
docker compose --profile benchmark up -d
docker logs -f benchmark
Record the final IngestionRate mean and standard deviation. Then scale to 2 ingestion nodes by starting the backend profile on a second machine and repeating.
4. Run indexing throughput benchmark

Update BENCHMARK_MODE: indexingthroughput and restart the benchmark container:
docker compose --profile benchmark up --force-recreate -d
docker logs -f benchmark
Each iteration takes 10 seconds; the full run completes in approximately 150 seconds.
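The documentation above does not spell out which counter this mode samples. Under the assumption that the indexers expose a cluster-wide count of indexed tokens (the PNCounter name below is purely hypothetical, and the warmup split mirrors the ingestionrate mode), the 10-second windows could be measured like this:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.crdt.pncounter.PNCounter;

public final class IndexingThroughputSketch {
    public static void main(String[] args) throws InterruptedException {
        ClientConfig config = new ClientConfig();
        config.setClusterName("SearchEngine");
        config.getNetworkConfig().addAddress(System.getenv("HZ_MEMBERS"));
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        // Hypothetical name: whatever cluster-wide structure the indexers
        // increment per indexed token would slot in here.
        PNCounter indexed = client.getPNCounter("indexed-tokens");

        for (int i = 1; i <= 15; i++) {                 // 15 iterations of 10 s each (~150 s total)
            long before = indexed.get();
            long t0 = System.nanoTime();
            Thread.sleep(10_000);                       // 10-second sampling window
            double elapsed = (System.nanoTime() - t0) / 1e9;
            double rate = (indexed.get() - before) / elapsed;
            System.out.printf("%s %2d: %.0f tokens/s%n",
                    i <= 5 ? "warmup" : "Iter", i <= 5 ? i : i - 5, rate);
        }
        client.shutdown();
    }
}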
5. Run recovery time benchmark

Update BENCHMARK_MODE: recoverytime and restart:
docker compose --profile benchmark up --force-recreate -d
Once the benchmark logs CLUSTER IS SAFE, simulate a failure:
docker stop indexing-service
Read the recovery time from the benchmark logs. Restart the stopped container and wait for the cluster to stabilize before triggering the next failure.
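Under the hood this mode presumably watches partition safety. A minimal sketch of such a watch loop, assuming the benchmark joins as a Hazelcast lite member and polls PartitionService.isClusterSafe(); the member setup and log wording echo the compose file and logs above, but the class itself is illustrative:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.partition.PartitionService;

public final class RecoveryTimeSketch {
    public static void main(String[] args) throws InterruptedException {
        // Join as a lite member: part of the cluster, but holding no partitions (assumption).
        Config config = new Config();
        config.setClusterName("SearchEngine");
        config.setLiteMember(true);
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        config.getNetworkConfig().getJoin().getTcpIpConfig()
              .setEnabled(true)
              .addMember(System.getenv("HZ_MEMBERS")); // e.g. <SEED_NODE_IP>:5701
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        PartitionService partitions = hz.getPartitionService();

        boolean safe = false;
        boolean first = true;
        long unsafeSince = System.nanoTime();
        while (true) {                                  // runs until the container is stopped
            boolean nowSafe = partitions.isClusterSafe();
            if (nowSafe && !safe) {
                if (first) {
                    System.out.println("CLUSTER IS SAFE"); // ready: a failure can now be triggered
                    first = false;
                } else {
                    System.out.printf("CLUSTER IS SAFE (recovered in %.1fs)%n",
                            (System.nanoTime() - unsafeSince) / 1e9);
                }
            } else if (!nowSafe && safe) {
                unsafeSince = System.nanoTime();
                System.out.println("Cluster unsafe: member lost, repartitioning...");
            }
            safe = nowSafe;
            Thread.sleep(100);                          // poll at 10 Hz
        }
    }
}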
6. Run query latency benchmark with JMeter

With the cluster running and the index populated, open Apache JMeter and load the test plan:
/benchmarks/load-test.jmx
Configure the thread count to simulate the desired number of concurrent users and run the test against the Nginx endpoint (http://<NGINX_IP>:8080). JMeter reports mean latency, percentile latencies (p95, p99), and throughput in requests/second. The /benchmarks directory also contains previous result datasets and logs for comparison.
