OpenKnowledgeStream is a multi-module Java pipeline that continuously polls the Wikipedia Recent Changes API, publishes each edit event to a Kafka topic, and indexes the records into OpenSearch. The result is a live, searchable data stream of Wikipedia page edits, ready for dashboards, analytics, and downstream integrations.
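The polling and publishing described above can be sketched roughly as follows. This covers the producing side, which corresponds to the wiki-change-stream module. It uses the 5-second interval, the 100-change limit, the recent_change_stream topic and the localhost:9092 broker given under How It Works.

The following are all assumptions for illustration, not the project's actual code:
- the class name ChangeStreamSketch
- the Change field names
- the exact rcprop values
- the use of a plain JDK scheduler

The HTTP call and JSON parsing are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Hypothetical sketch of the wiki-change-stream side; not the project's actual classes.
class ChangeStreamSketch {
    static final String API = "https://en.wikipedia.org/w/api.php";
    static final String TOPIC = "recent_change_stream";

    // Hypothetical payload shape; the project's real Change class may use other field names.
    record Change(long pageId, String title, String type, List<String> tags) {}

    /** Builds the Recent Changes query. Values are percent-encoded because URI.create rejects a raw "|". */
    static URI recentChangesUri(int limit) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("action", "query");
        params.put("list", "recentchanges");
        params.put("rcprop", "title|ids|tags"); // the change type arrives in each entry's "type" field
        params.put("rclimit", Integer.toString(limit));
        params.put("format", "json");
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(API + "?" + query);
    }

    /** Producer settings as a plain map, the form Spring Kafka's DefaultKafkaProducerFactory accepts. */
    static Map<String, Object> producerProps(String bootstrapServers) {
        return Map.of(
                "bootstrap.servers", bootstrapServers,
                "key.serializer", "org.apache.kafka.common.serialization.StringSerializer",
                "value.serializer", "org.springframework.kafka.support.serializer.JsonSerializer");
    }

    /** Runs pollOnce immediately, then again 5 seconds after each run finishes. */
    static ScheduledExecutorService startPolling(Runnable pollOnce) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // An exception escaping a scheduled task cancels all later runs, so report it and keep going.
                System.err.println("poll failed: " + e.getMessage());
            }
        }, 0, 5, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

With spring-kafka on the classpath, the producerProps map could be passed to new DefaultKafkaProducerFactory<>(props) and wrapped in a KafkaTemplate. Its send(topic, key, value) method would then publish each Change. Keying messages by page title is an assumption here.

The sketch uses scheduleWithFixedDelay rather than scheduleAtFixedRate. That way, a slow Wikipedia response pushes the next poll back instead of triggering back-to-back requests.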
Introduction
Understand what OpenKnowledgeStream does and how the components fit together.
Quickstart
Stand up Kafka and OpenSearch, build the project, and see your first change indexed in under 10 minutes.
Architecture
Explore the three-module design: change stream, indexer, and shared common library.
Configuration
Configure Kafka brokers, OpenSearch endpoints, and polling intervals.
Guides
Step-by-step guides for running locally, deploying to production, and querying indexed data.
Reference
Full reference for all components, services, and data models.
How It Works
Poll Wikipedia
The wiki-change-stream module queries https://en.wikipedia.org/w/api.php every 5 seconds, fetching up to 100 recent page changes, including each change's title, page ID, change type, and tags.
Publish to Kafka
Each Change record is serialized as JSON and published to the recent_change_stream Kafka topic on localhost:9092. The producer uses Spring Kafka’s JsonSerializer.
Consume and Index
The opensearch-wiki-indexer module polls the topic every 5 seconds, deserializes each message, and upserts it into the wiki-changes OpenSearch index using the page title as the document ID. Because the ID is the title, the index keeps one document per page, reflecting that page's most recent change rather than a full edit history.
OpenKnowledgeStream requires Java 21+, Apache Kafka, and OpenSearch running locally (or remotely, with updated connection settings). See the Quickstart for setup instructions.
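For the Consume and Index step, here is a minimal sketch of the request the indexer might issue for each message. It assumes the indexer talks to OpenSearch's REST Update API (POST /wiki-changes/_update/{id} with doc_as_upsert) rather than going through a client library.

The class and method names are hypothetical, as is the base URL http://localhost:9200, which is OpenSearch's default port. The sketch illustrates that page titles make awkward IDs: they can contain spaces, "/" and "+", so each title must be percent-encoded before it goes into the URL path.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the project may use an OpenSearch client library instead of raw HTTP.
class WikiChangeUpsert {
    static final String INDEX = "wiki-changes";

    /** Percent-encodes a page title so it can be used as a single URL path segment. */
    static String encodeId(String title) {
        // URLEncoder does form encoding and turns spaces into "+"; a path segment needs "%20".
        // A literal "+" in the title is already encoded as "%2B", so this replace only affects spaces.
        return URLEncoder.encode(title, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds an _update request that merges the change into the title's document, creating it if absent. */
    static HttpRequest upsert(String baseUrl, String title, String changeJson) {
        String body = "{\"doc\":" + changeJson + ",\"doc_as_upsert\":true}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + INDEX + "/_update/" + encodeId(title)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

For example, upsert("http://localhost:9200", "AC/DC", json) targets /wiki-changes/_update/AC%2FDC.

Because doc_as_upsert merges into an existing document, fields sent for a later change replace the earlier values, while fields it omits are kept. A PUT to /wiki-changes/_doc/{id} would instead replace the whole document.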