OpenKnowledgeStream is structured as a Maven multi-module project with the root artifact com.as:OpenKnowledgeStream (version 0.0.1-SNAPSHOT, Spring Boot 4.1.0, Java 21). It is composed of three modules, each with a distinct responsibility: polling the Wikipedia API, indexing events into OpenSearch, and providing the shared data model. The two runnable modules are packaged as independent Spring Boot applications, both annotated with @EnableScheduling to drive their respective timed polling loops.

wiki-change-stream

Spring Boot application and pipeline entry point. The OpenStream service fires every 5 seconds via @Scheduled(fixedRate = 5000), calling WikipediaClient — a reactive Spring WebFlux WebClient — to fetch the latest 100 recent changes from the Wikipedia API at https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&format=json&rclimit=100&rcprop=title|tags|ids. Each Change object is then published individually to the Kafka topic recent_change_stream via KafkaPublish.
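
The polling step described above can be sketched with the plain JDK. This is an illustration under stated assumptions, not the project's OpenStream class: `PollingSketch`, `recentChangesUri`, `pollOnce`, and `start` are hypothetical names, a `ScheduledExecutorService` stands in for Spring's @Scheduled, and the `fetch` and `publish` callbacks stand in for WikipediaClient and KafkaPublish.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for the polling loop; not the project's OpenStream class.
final class PollingSketch {

    // Builds the recentchanges query URL; '|' in rcprop is percent-encoded as %7C.
    static URI recentChangesUri(int limit) {
        String rcprop = URLEncoder.encode("title|tags|ids", StandardCharsets.UTF_8);
        return URI.create("https://en.wikipedia.org/w/api.php"
                + "?action=query&list=recentchanges&format=json"
                + "&rclimit=" + limit
                + "&rcprop=" + rcprop);
    }

    // One tick: fetch a batch, then hand each element to the publisher separately.
    static <T> int pollOnce(Supplier<List<T>> fetch, Consumer<T> publish) {
        List<T> batch = fetch.get();
        for (T change : batch) {
            publish.accept(change);
        }
        return batch.size();
    }

    // Runs pollOnce every 5 seconds at a fixed rate, mirroring fixedRate = 5000.
    // With a plain executor, an uncaught exception in the task cancels later runs,
    // so production code would catch and log inside the task.
    static <T> ScheduledExecutorService start(Supplier<List<T>> fetch, Consumer<T> publish) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> pollOnce(fetch, publish), 0, 5, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Calling `pollOnce` directly with a fixed list and a collecting consumer exercises the per-change publishing without a network or a broker.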

opensearch-wiki-indexer

Spring Boot application and pipeline sink. KafkaConsume polls the recent_change_stream topic every 5 seconds using a plain KafkaConsumer in the wiki-indexer consumer group. For each consumed Change, OpensearchIndexer.index() upserts the document into the wiki-changes OpenSearch index, using the page title as the document ID.
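
Using the page title as the document ID means that a later change to the same page overwrites the earlier document instead of adding a new one. The sketch below illustrates that consequence with an in-memory map standing in for the wiki-changes index. It does not use the OpenSearch client API, and `ChangeDoc` and `UpsertSketch` are hypothetical names with an assumed minimal set of fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal document shape; the real Change model lives in wiki-common.
record ChangeDoc(String title, long revid) {}

// In-memory stand-in for the wiki-changes index, keyed by document ID.
final class UpsertSketch {
    private final Map<String, ChangeDoc> index = new LinkedHashMap<>();

    // Upsert: insert when the ID is new, overwrite when it already exists.
    void index(ChangeDoc change) {
        index.put(change.title(), change);
    }

    int size() {
        return index.size();
    }

    ChangeDoc get(String title) {
        return index.get(title);
    }
}
```

Indexing two changes titled "Java" and one titled "Kafka" leaves two documents, and the "Java" entry holds the most recently indexed revision.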

wiki-common

Shared library with no runnable main class. Provides the Change, Query, and RecentChanges Lombok @Data models used by both wiki-change-stream and opensearch-wiki-indexer. Packaged as a plain JAR (Spring Boot Maven plugin is skipped) and referenced as a local dependency by the other two modules.
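
One plausible shape for the three models, mirroring the nesting of a recentchanges response (`query` containing `recentchanges`), is sketched below. The field names are assumptions derived from rcprop=title|tags|ids and may not match the project's classes. Java records are used here in place of Lombok @Data so that the example compiles without extra dependencies, and `ModelSketch.changesOf` is a hypothetical helper.

```java
import java.util.List;

// Assumed fields: title, tags, and the four IDs that rcprop=ids adds.
record Change(String title, List<String> tags, long pageid, long revid, long old_revid, long rcid) {}

// Assumed wrapper for the "query" object of the response.
record Query(List<Change> recentchanges) {}

// Assumed top-level response envelope.
record RecentChanges(Query query) {}

final class ModelSketch {
    // Flattens the envelope into the list of changes, tolerating missing parts.
    static List<Change> changesOf(RecentChanges response) {
        if (response == null || response.query() == null
                || response.query().recentchanges() == null) {
            return List.of();
        }
        return response.query().recentchanges();
    }
}
```

Keeping these types in one shared module lets the producer and the indexer serialize and deserialize the same structure.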

Module dependency graph

The inter-module dependencies declared in the child pom.xml files are:
  • wiki-change-stream depends on both opensearch-wiki-indexer and wiki-common — it pulls in the indexer module so that OpensearchIndexer and KafkaConsume are available on the classpath and component-scanned at runtime.
  • opensearch-wiki-indexer depends on wiki-common — it references the shared Change model for deserialization and indexing.
  • wiki-common has no intra-project dependencies — it is the base of the dependency tree.
OpenKnowledgeStream (root pom, packaging=pom)
├── wiki-change-stream          ──depends on──► opensearch-wiki-indexer
│                               ──depends on──► wiki-common
├── opensearch-wiki-indexer     ──depends on──► wiki-common
└── wiki-common                 (no intra-project dependencies)
Because opensearch-wiki-indexer is declared as a dependency of wiki-change-stream, its classes are on that module's classpath, and OpenKnowledgeStreamApplication registers the indexer beans (KafkaConsume, OpensearchIndexer) via @ComponentScan({"com.as", "WikiIndexer", "WikiIndexer.models", "Wikicommon", "WikiChangeStream"}). Both modules therefore run inside a single JVM when wiki-change-stream is launched.
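
The dependency edges listed above imply a build order in which each module is built after the modules it depends on. As a rough illustration, the sketch below derives that order with a depth-first traversal over the same edges. `BuildOrderSketch` is a hypothetical helper, not part of the project, and it assumes the graph has no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BuildOrderSketch {
    // Module -> intra-project dependencies, as listed in the dependency graph.
    static final Map<String, List<String>> DEPS = new LinkedHashMap<>();
    static {
        DEPS.put("wiki-change-stream", List.of("opensearch-wiki-indexer", "wiki-common"));
        DEPS.put("opensearch-wiki-indexer", List.of("wiki-common"));
        DEPS.put("wiki-common", List.of());
    }

    // Depth-first post-order: a module is emitted only after all of its dependencies.
    static List<String> buildOrder() {
        Set<String> done = new LinkedHashSet<>();
        for (String module : DEPS.keySet()) {
            visit(module, done);
        }
        return new ArrayList<>(done);
    }

    private static void visit(String module, Set<String> done) {
        if (done.contains(module)) {
            return;
        }
        for (String dep : DEPS.get(module)) {
            visit(dep, done);
        }
        done.add(module);
    }
}
```

For these edges, `buildOrder()` returns wiki-common, then opensearch-wiki-indexer, then wiki-change-stream.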

Shared infrastructure

Both runnable modules connect to the same local Kafka and OpenSearch instances on their default ports, and WikipediaClient calls the public Wikipedia API:
Service                               Address                    Used by
Apache Kafka                          localhost:9092             KafkaPublish, KafkaConsume
OpenSearch                            localhost:9200             OpensearchIndexer
Wikipedia API (MediaWiki Action API)  https://en.wikipedia.org   WikipediaClient
