
GuancheData ships a dedicated benchmark service in docker-compose.yml that measures three core performance dimensions: how fast the system ingests documents, how fast it indexes tokens, and how quickly the cluster recovers after a node failure. A fourth dimension — query latency under concurrent load — is measured separately using Apache JMeter. Each benchmark mode is controlled by a single environment variable and runs as a Docker Compose profile, making it easy to reproduce experiments at different cluster sizes.

Starting the benchmark service

Set BENCHMARK_MODE in docker-compose.yml before starting the benchmark container:
benchmark:
  environment:
    BENCHMARK_MODE: recoverytime   # ingestionrate | indexingthroughput | recoverytime
    HZ_PORT: "5704"
    HZ_PUBLIC_ADDRESS: <NODE_IP>:5704
    HZ_MEMBERS: <SEED_NODE_IP>:5701
    HAZELCAST_CLUSTER_NAME: SearchEngine
Then start the benchmark profile:
docker compose --profile benchmark up -d
View live output with:
docker logs -f benchmark
The benchmark container exits automatically when all iterations are complete (except recoverytime, which runs indefinitely and waits for a node to be removed).
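Each mode maps onto a single task selected by BENCHMARK_MODE. Below is a purely illustrative Java sketch of that dispatch pattern; the class name, the placeholder tasks, and the default mode are hypothetical stand-ins, not the project's actual entrypoint:

import java.util.Map;

public final class BenchmarkMain {
    public static void main(String[] args) {
        // One Runnable per mode; the container lives exactly as long as the task.
        Map<String, Runnable> modes = Map.of(
                "ingestionrate",      () -> System.out.println("run 15 one-second iterations, then exit"),
                "indexingthroughput", () -> System.out.println("run 15 ten-second iterations, then exit"),
                "recoverytime",       () -> System.out.println("watch cluster safety until stopped"));
        // Default mode here is an assumption for the sketch.
        Runnable task = modes.get(System.getenv().getOrDefault("BENCHMARK_MODE", "ingestionrate"));
        if (task == null) {
            System.err.println("Unknown BENCHMARK_MODE");
            System.exit(1);
        }
        task.run();
    }
}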

Benchmark modes

ingestionrate

Measures how many books per second the ingestion service downloads and stores in the datalake.

Mechanism: IngestionRate connects to the cluster as a Hazelcast client (not a full member). It accesses the "log" ISet, a BookDownloadLog that ingestion nodes append to as each book is stored. Each iteration, it reads the current size of the ISet, waits one second, reads the size again, and divides the difference by the elapsed time to compute a rate in books/second (see the sketch after this subsection).

Iterations: 15 in total, 5 warmup iterations (discarded) followed by 10 measured iterations. The warmup period allows the ingestion pipeline to reach steady-state throughput before recording results.

Output:
warmup  1: 3.821 books/s (1.0s)
...
Iter  1: 4.102 books/s (1.0s)
...
=== FINAL RESULTS ===
IngestionRate: 4.053 ± 0.087 books/s
(min=3.912, avg=4.053, max=4.201)
What to vary: Run with 1, 2, and 4 ingestion containers to observe horizontal scalability. Note the back-pressure effect of INDEXING_BUFFER_FACTOR — if indexers are slower than ingestion, the measured rate will plateau even as ingestion nodes are added.
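The sampling loop described above is simple to reproduce. Here is a minimal, self-contained sketch assuming the Hazelcast 5.x Java client API and the "log" ISet named in the description; the class name, env-var handling, and statistics code are illustrative, not the project's actual IngestionRate implementation:

import java.util.Arrays;

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.collection.ISet;
import com.hazelcast.core.HazelcastInstance;

public final class IngestionRateSketch {
    public static void main(String[] args) throws InterruptedException {
        // Join as a lightweight client: we only read the shared download log
        // and hold no cluster data ourselves.
        ClientConfig config = new ClientConfig();
        config.setClusterName("SearchEngine");
        config.getNetworkConfig().addAddress(System.getenv("HZ_MEMBERS")); // e.g. <SEED_NODE_IP>:5701
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        ISet<Object> log = client.getSet("log"); // the BookDownloadLog ingestion nodes append to
        final int warmup = 5;
        double[] measured = new double[10];

        for (int i = 0; i < warmup + measured.length; i++) {
            int before = log.size();
            long t0 = System.nanoTime();
            Thread.sleep(1_000);                                // one-second sampling window
            double elapsed = (System.nanoTime() - t0) / 1e9;
            double rate = (log.size() - before) / elapsed;
            if (i < warmup) {
                System.out.printf("warmup %2d: %.3f books/s (%.1fs)%n", i + 1, rate, elapsed);
            } else {
                measured[i - warmup] = rate;
                System.out.printf("Iter %2d: %.3f books/s (%.1fs)%n", i - warmup + 1, rate, elapsed);
            }
        }

        // Report mean and sample standard deviation over the 10 measured iterations.
        double mean = Arrays.stream(measured).average().orElse(0);
        double var = Arrays.stream(measured).map(r -> (r - mean) * (r - mean)).sum() / (measured.length - 1);
        System.out.printf("IngestionRate: %.3f ± %.3f books/s%n", mean, Math.sqrt(var));
        client.shutdown();
    }
}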

Full benchmarking workflow

1. Build and deploy the cluster

From the repository root, build all service JARs and start the full stack on the main node:
mvn clean package
docker compose --profile backend --profile broker --profile loadbalancer up -d
Wait until all containers are healthy and the Hazelcast cluster reports all members joined.
2. Allow the datalake to populate

Let the ingestion service run for several minutes to download and store a representative number of books. A larger datalake produces more stable benchmark results and makes recovery time measurements more meaningful, because there is more partition data to rebalance.
3. Run ingestion rate benchmark

Set BENCHMARK_MODE: ingestionrate in docker-compose.yml and start the benchmark:
docker compose --profile benchmark up -d
docker logs -f benchmark
Record the final IngestionRate mean and standard deviation. Then scale to 2 ingestion nodes by starting the backend profile on a second machine and repeating.
4. Run indexing throughput benchmark

Update BENCHMARK_MODE: indexingthroughput and restart the benchmark container:
docker compose --profile benchmark up --force-recreate -d
docker logs -f benchmark
Each iteration takes 10 seconds; the full run completes in approximately 150 seconds.
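The documentation above does not spell out which counter this mode samples. Under the assumption that the indexers expose a cluster-wide count of indexed tokens (the PNCounter name below is purely hypothetical, and the warmup split mirrors the ingestionrate mode), the 10-second windows could be measured like this:

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.crdt.pncounter.PNCounter;

public final class IndexingThroughputSketch {
    public static void main(String[] args) throws InterruptedException {
        ClientConfig config = new ClientConfig();
        config.setClusterName("SearchEngine");
        config.getNetworkConfig().addAddress(System.getenv("HZ_MEMBERS"));
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);

        // Hypothetical name: whatever cluster-wide structure the indexers
        // increment per indexed token would slot in here.
        PNCounter indexed = client.getPNCounter("indexed-tokens");

        for (int i = 1; i <= 15; i++) {                 // 15 iterations of 10 s each (~150 s total)
            long before = indexed.get();
            long t0 = System.nanoTime();
            Thread.sleep(10_000);                       // 10-second sampling window
            double elapsed = (System.nanoTime() - t0) / 1e9;
            double rate = (indexed.get() - before) / elapsed;
            System.out.printf("%s %2d: %.0f tokens/s%n",
                    i <= 5 ? "warmup" : "Iter", i <= 5 ? i : i - 5, rate);
        }
        client.shutdown();
    }
}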
5. Run recovery time benchmark

Update BENCHMARK_MODE: recoverytime and restart:
docker compose --profile benchmark up --force-recreate -d
Once the benchmark logs CLUSTER IS SAFE, simulate a failure:
docker stop indexing-service
Read the recovery time from the benchmark logs. Restart the stopped container and wait for the cluster to stabilize before triggering the next failure.
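Under the hood this mode presumably watches partition safety. A minimal sketch of such a watch loop, assuming the benchmark joins as a Hazelcast lite member and polls PartitionService.isClusterSafe(); the member setup and log wording echo the compose file and logs above, but the class itself is illustrative:

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.partition.PartitionService;

public final class RecoveryTimeSketch {
    public static void main(String[] args) throws InterruptedException {
        // Join as a lite member: part of the cluster, but holding no partitions (assumption).
        Config config = new Config();
        config.setClusterName("SearchEngine");
        config.setLiteMember(true);
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        config.getNetworkConfig().getJoin().getTcpIpConfig()
              .setEnabled(true)
              .addMember(System.getenv("HZ_MEMBERS")); // e.g. <SEED_NODE_IP>:5701
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        PartitionService partitions = hz.getPartitionService();

        boolean safe = false;
        boolean first = true;
        long unsafeSince = System.nanoTime();
        while (true) {                                  // runs until the container is stopped
            boolean nowSafe = partitions.isClusterSafe();
            if (nowSafe && !safe) {
                if (first) {
                    System.out.println("CLUSTER IS SAFE"); // ready: a failure can now be triggered
                    first = false;
                } else {
                    System.out.printf("CLUSTER IS SAFE (recovered in %.1fs)%n",
                            (System.nanoTime() - unsafeSince) / 1e9);
                }
            } else if (!nowSafe && safe) {
                unsafeSince = System.nanoTime();
                System.out.println("Cluster unsafe: member lost, repartitioning...");
            }
            safe = nowSafe;
            Thread.sleep(100);                          // poll at 10 Hz
        }
    }
}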
6. Run query latency benchmark with JMeter

With the cluster running and the index populated, open Apache JMeter and load the test plan:
/benchmarks/load-test.jmx
Configure the thread count to simulate the desired number of concurrent users and run the test against the Nginx endpoint (http://<NGINX_IP>:8080). JMeter reports mean latency, percentile latencies (p95, p99), and throughput in requests/second. The /benchmarks directory also contains previous result datasets and logs for comparison.
