This guide walks you through cloning OpenKnowledgeStream, building all three Maven modules, and launching the pipeline so that Wikipedia Recent Changes are streamed through Kafka and indexed into OpenSearch in real time. The entire process takes less than five minutes once the prerequisites are in place.
1

Confirm prerequisites

Make sure the following services and tools are installed and running before you proceed.
Java 21+ must be installed:
java -version
# Expected: openjdk version "21.x.x" or later
Apache Kafka must be running with a broker accessible at localhost:9092. If you are using a local Kafka installation, start ZooKeeper and the broker:
# Start ZooKeeper (if not already running)
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start Kafka broker
bin/kafka-server-start.sh config/server.properties
Then create the topic that the pipeline uses:
bin/kafka-topics.sh --create \
  --topic recent_change_stream \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1
OpenSearch must be running with a node accessible at localhost:9200:
curl -s http://localhost:9200 | grep "cluster_name"
# Expected: "cluster_name" : "..."
OpenKnowledgeStream connects to both Kafka (localhost:9092) and OpenSearch (localhost:9200) immediately on startup. If either service is unavailable the application will fail to initialize.
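Because both connections are opened at startup, it can help to confirm that the two ports accept TCP connections before launching. The following is a minimal sketch and not part of the repository: the PreflightCheck class and its isReachable helper are hypothetical names, and a successful TCP connect only shows that something is listening on the port, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight helper; not part of OpenKnowledgeStream.
class PreflightCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from this guide: Kafka on 9092, OpenSearch on 9200.
        System.out.println("Kafka      localhost:9092 reachable: "
                + isReachable("localhost", 9092, 1000));
        System.out.println("OpenSearch localhost:9200 reachable: "
                + isReachable("localhost", 9200, 1000));
    }
}
```

Saved as PreflightCheck.java, the sketch can be run directly with java PreflightCheck.java. If either line prints false, start the corresponding service before moving on.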
2

Clone the repository

Clone the OpenKnowledgeStream source from GitHub:
git clone https://github.com/amitsaxena098/OpenKnowledgeStream.git
cd OpenKnowledgeStream
The repository root contains the parent pom.xml and the three module directories:
OpenKnowledgeStream/
├── pom.xml                    # Parent POM (Spring Boot 4.1.0, Java 21)
├── wiki-common/               # Shared Change / Query models
├── wiki-change-stream/        # Wikipedia → Kafka producer
└── opensearch-wiki-indexer/   # Kafka → OpenSearch consumer
3

Build the project

Build all modules from the repository root using the included Maven Wrapper (or your local mvn installation):
# Using the Maven Wrapper (recommended — no local Maven install required)
./mvnw clean install

# Or, if you have Maven installed locally
mvn clean install
Maven resolves the inter-module dependencies in the correct order (wiki-common → opensearch-wiki-indexer → wiki-change-stream) and produces a fat JAR for each executable module under its target/ directory.
A successful build ends with output similar to:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for OpenKnowledgeStream 0.0.1-SNAPSHOT:
[INFO]
[INFO] OpenKnowledgeStream ............................... SUCCESS
[INFO] wiki-common ...................................... SUCCESS
[INFO] opensearch-wiki-indexer .......................... SUCCESS
[INFO] wiki-change-stream ............................... SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
Run ./mvnw clean install -DskipTests to skip the test phase and speed up the build during initial setup.
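To illustrate why the modules build in the order wiki-common, opensearch-wiki-indexer, wiki-change-stream, the sketch below orders modules so that each one comes after its dependencies (a depth-first topological sort). It is an illustration only, not Maven's actual reactor implementation. The ReactorOrderSketch class is hypothetical, and the dependency edges on wiki-common are an assumption based on its description as the shared model module; only the wiki-change-stream → opensearch-wiki-indexer edge is stated in this guide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of dependency-ordered builds; not Maven code.
class ReactorOrderSketch {

    // Returns the modules ordered so that every module follows its dependencies.
    static List<String> buildOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> visiting = new LinkedHashSet<>();
        for (String module : dependsOn.keySet()) {
            visit(module, dependsOn, visiting, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> dependsOn,
                              Set<String> visiting, List<String> order) {
        if (order.contains(module)) {
            return; // already scheduled
        }
        if (!visiting.add(module)) {
            throw new IllegalStateException("Dependency cycle at " + module);
        }
        // Schedule all dependencies first, then the module itself.
        for (String dependency : dependsOn.getOrDefault(module, List.of())) {
            visit(dependency, dependsOn, visiting, order);
        }
        visiting.remove(module);
        order.add(module);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        // Edges involving wiki-common are assumed, see the note above.
        dependsOn.put("wiki-change-stream", List.of("wiki-common", "opensearch-wiki-indexer"));
        dependsOn.put("opensearch-wiki-indexer", List.of("wiki-common"));
        dependsOn.put("wiki-common", List.of());
        // Prints [wiki-common, opensearch-wiki-indexer, wiki-change-stream]
        System.out.println(buildOrder(dependsOn));
    }
}
```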
4

Start the application

Launch the wiki-change-stream fat JAR. Because wiki-change-stream declares opensearch-wiki-indexer as a compile-scope dependency, the single JAR contains both the Kafka producer and the Kafka consumer + OpenSearch indexer — you only need to start one process:
java -jar wiki-change-stream/target/wiki-change-stream-0.0.1-SNAPSHOT.jar
On startup, Spring Boot will:
  1. Wire the WikipediaClient with a WebClient pointed at the Wikipedia Recent Changes API.
  2. Register the OpenStream scheduled task (fires every 5 seconds) to poll Wikipedia and publish Change events to the recent_change_stream Kafka topic.
  3. Register the KafkaConsume scheduled task (also fires every 5 seconds) to poll the same topic and forward each Change to OpensearchIndexer for indexing into the wiki-changes index.
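The two scheduled tasks above follow a simple produce-then-consume pattern on a fixed period. The sketch below imitates that pattern with plain JDK classes so it can run without Kafka or OpenSearch. It is not the project's code: the ScheduledPipelineSketch class is hypothetical, an in-memory queue stands in for the recent_change_stream topic, a list stands in for the wiki-changes index, and ScheduledExecutorService is used here in place of whatever scheduling mechanism the application actually relies on.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the producer/consumer schedule; not project code.
class ScheduledPipelineSketch {

    // Runs a producer and a consumer at a fixed period until `ticks` events
    // have been produced, then returns everything the consumer "indexed".
    static List<String> run(long periodMillis, int ticks) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();   // stand-in for the Kafka topic
        List<String> indexed = new CopyOnWriteArrayList<>();         // stand-in for the index
        CountDownLatch remaining = new CountDownLatch(ticks);
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);

        // Producer tick: publish one event per period.
        scheduler.scheduleAtFixedRate(() -> {
            long left = remaining.getCount();
            if (left > 0) {
                topic.add("change-" + (ticks - left + 1));
                remaining.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        // Consumer tick: drain whatever has been published so far.
        scheduler.scheduleAtFixedRate(() -> topic.drainTo(indexed),
                0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            remaining.await();
            scheduler.shutdown();
            scheduler.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            scheduler.shutdownNow();
        }
        topic.drainTo(indexed); // pick up events published after the last consumer tick
        return List.copyOf(indexed);
    }

    public static void main(String[] args) {
        // 5000 ms mirrors the 5-second period in this guide; two ticks take about 5 seconds.
        System.out.println(run(5000, 2));
    }
}
```

Keeping the producer and consumer as independent periodic tasks means a slow consumer tick does not delay the next producer tick, which is one plausible reason for the two-task layout described above.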
You should see log output similar to the following within the first few seconds:
INFO  WikiChangeStream.publish.KafkaPublish  - Change published with title: Python (programming language)
INFO  WikiIndexer.index.OpensearchIndexer    - Indexed Title: Python (programming language)
INFO  WikiChangeStream.publish.KafkaPublish  - Change published with title: Eiffel Tower
INFO  WikiIndexer.index.OpensearchIndexer    - Indexed Title: Eiffel Tower
The application polls Wikipedia every 5 seconds and publishes up to 100 changes per poll. If Wikipedia returns HTTP 429 (Too Many Requests), the producer logs a warning and sleeps for 5 seconds, then resumes polling on the next scheduled tick. No manual intervention is needed.
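The rate-limit behavior described in this note can be sketched as a single per-tick decision: back off on HTTP 429, otherwise publish at most 100 changes. The code below is an assumption-laden illustration, not the producer's actual implementation. RateLimitAwarePoller and handle are hypothetical names, and the change titles are passed in as plain strings rather than fetched from Wikipedia.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch of the per-tick rate-limit handling; not project code.
class RateLimitAwarePoller {
    static final int HTTP_TOO_MANY_REQUESTS = 429;
    static final int MAX_CHANGES_PER_POLL = 100;   // limit stated in this guide
    static final Duration BACKOFF = Duration.ofSeconds(5);

    // Returns the titles that would be published for this tick.
    // On HTTP 429 it logs a warning, sleeps for `backoff`, and publishes nothing.
    static List<String> handle(int status, List<String> titles, Duration backoff) {
        if (status == HTTP_TOO_MANY_REQUESTS) {
            System.err.println("WARN rate limited by Wikipedia, sleeping "
                    + backoff.toSeconds() + "s");
            try {
                Thread.sleep(backoff.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            return List.of();
        }
        // Cap the batch at the per-poll limit.
        int limit = Math.min(titles.size(), MAX_CHANGES_PER_POLL);
        return List.copyOf(titles.subList(0, limit));
    }

    public static void main(String[] args) {
        System.out.println(handle(200, List.of("Eiffel Tower"), BACKOFF)); // publishes one title
        System.out.println(handle(429, List.of("Eiffel Tower"), BACKOFF)); // sleeps, publishes nothing
    }
}
```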
5

Verify documents in OpenSearch

Query the wiki-changes index to confirm that Wikipedia change documents are being indexed:
curl -X GET "http://localhost:9200/wiki-changes/_search?pretty"
A successful response looks like:
{
  "hits": {
    "total": {
      "value": 42,
      "relation": "eq"
    },
    "hits": [
      {
        "_index": "wiki-changes",
        "_id": "Python (programming language)",
        "_source": {
          "type": "edit",
          "title": "Python (programming language)",
          "pageid": 23862,
          "tags": ["mobile edit", "mobile web edit"]
        }
      },
      {
        "_index": "wiki-changes",
        "_id": "Eiffel Tower",
        "_source": {
          "type": "edit",
          "title": "Eiffel Tower",
          "pageid": 9232,
          "tags": []
        }
      }
    ]
  }
}
Each document’s _id is the Wikipedia page title, so repeated edits to the same article overwrite the existing document rather than creating duplicates.
To check how many unique pages have been indexed:
curl -X GET "http://localhost:9200/wiki-changes/_count?pretty"
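The reason this count reflects unique pages rather than total edits follows from using the page title as the document _id. The sketch below models that with an in-memory map keyed by title. It is only an analogy for the indexing behavior: TitleKeyedIndexSketch and its Change record are hypothetical and carry just a subset of the fields shown in the sample response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory analogy for an index keyed by page title.
class TitleKeyedIndexSketch {

    // Subset of the fields shown in the sample search response.
    record Change(String type, String title, long pageid) {}

    private final Map<String, Change> documents = new LinkedHashMap<>();

    // Indexing with the title as the key replaces any earlier document for that page.
    void index(Change change) {
        documents.put(change.title(), change);
    }

    int count() {
        return documents.size();
    }

    Change get(String title) {
        return documents.get(title);
    }

    public static void main(String[] args) {
        TitleKeyedIndexSketch index = new TitleKeyedIndexSketch();
        index.index(new Change("edit", "Eiffel Tower", 9232L));
        index.index(new Change("edit", "Python (programming language)", 23862L));
        index.index(new Change("edit", "Eiffel Tower", 9232L)); // second edit to the same page
        System.out.println(index.count()); // prints 2
    }
}
```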
You can also browse the index using the OpenSearch Dashboards UI (typically available at http://localhost:5601) by creating an index pattern for wiki-changes.
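If you prefer to run the count check from Java instead of curl, a minimal sketch using the JDK's built-in HTTP client might look like the following. It assumes, as the curl commands in this step do, an unsecured OpenSearch node at http://localhost:9200. The IndexCountCheck class and countUri helper are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical verification helper; not part of OpenKnowledgeStream.
class IndexCountCheck {

    // Builds the _count URI for an index, e.g. http://localhost:9200/wiki-changes/_count
    static URI countUri(String baseUrl, String index) {
        return URI.create(baseUrl + "/" + index + "/_count");
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(countUri("http://localhost:9200", "wiki-changes"))
                .GET()
                .build();
        // Sends the request and prints the raw JSON body returned by the node.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```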
