OpenKnowledgeStream is a Maven multi-module project. The root POM (com.as:OpenKnowledgeStream:0.0.1-SNAPSHOT) declares shared dependencies and build configuration that all three child modules inherit. Each child module defines only its module-specific additions. The project targets Java 21 at the root level; the child modules each set maven.compiler.source and maven.compiler.target to 26.
Maven module hierarchy
The root POM inherits from org.springframework.boot:spring-boot-starter-parent:4.1.0 and declares shared dependencies that are available to all child modules.
wiki-change-stream
Artifact ID: wiki-change-stream
Group ID: com.as
Main class: WikiChangeStream.OpenKnowledgeStreamApplication
This is the pipeline entry point. It hosts the Wikipedia polling loop and the Kafka producer. Its main class enables scheduling and component-scans all modules so that indexer beans are available in the same application context.
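The following minimal, stdlib-only Java sketch illustrates the poll-and-publish cycle. It is not the module's code. The real module uses Spring's @Scheduled, a WebClient-backed WikipediaClient, and a KafkaProducer. This sketch substitutes a ScheduledExecutorService and two hypothetical interfaces (ChangeSource, ChangeSink). The following are assumptions: the trimmed Change record, the broker address localhost:9092, and the Spring Kafka JsonSerializer class name in producerProps().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PollingLoopSketch {
    // Hypothetical, trimmed-down stand-in for the real Change model.
    record Change(String title) {}

    // Plays the role of the TooManyRequests exception raised on HTTP 429.
    static class TooManyRequests extends RuntimeException {
        TooManyRequests() { super("HTTP 429 Too Many Requests"); }
    }

    interface ChangeSource { List<Change> getRecentChanges(); } // role of WikipediaClient
    interface ChangeSink { void publish(Change change); }       // role of KafkaPublish

    static final String TOPIC = "recent_change_stream";

    // Producer settings in the spirit of KafkaPublish; the broker address and the
    // Spring Kafka JsonSerializer class name are assumptions.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.setProperty("value.serializer", "org.springframework.kafka.support.serializer.JsonSerializer");
        return p;
    }

    static final class PollingLoop {
        private final ChangeSource source;
        private final ChangeSink sink;
        private final long backoffMillis;

        PollingLoop(ChangeSource source, ChangeSink sink, long backoffMillis) {
            this.source = source;
            this.sink = sink;
            this.backoffMillis = backoffMillis;
        }

        // One scheduled tick: fetch recent changes and publish each one.
        // Returns how many changes were published (0 when rate-limited).
        int tick() {
            try {
                List<Change> changes = source.getRecentChanges();
                changes.forEach(sink::publish);
                return changes.size();
            } catch (TooManyRequests e) {
                try {
                    Thread.sleep(backoffMillis); // back off before the next tick
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
                return 0;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Change> published = Collections.synchronizedList(new ArrayList<>());
        ChangeSource fake = () -> List.of(new Change("Page A"), new Change("Page B"));
        PollingLoop loop = new PollingLoop(fake, published::add, 5_000);

        // Rough stdlib analogue of @Scheduled(fixedRate = 5000).
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(loop::tick, 0, 5, TimeUnit.SECONDS);
        Thread.sleep(200);
        scheduler.shutdown(); // periodic tasks are cancelled on shutdown by default
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println(published.size() + " changes published to " + TOPIC);
    }
}
```

The backoff is a constructor parameter only so that the loop can be exercised without a 5-second wait. The module itself sleeps for a fixed 5 seconds.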
Key classes
AppConfig
Spring @Configuration class in package WikiChangeStream.config. Creates the WebClient bean with the Wikipedia API base URL and a default accept: application/json header.
WikipediaClient
Spring @Component in package WikiChangeStream.clients. Wraps the injected WebClient and exposes a single getRecentChanges() method that blocks until the response is fully received. Maps HTTP 429 to a TooManyRequests exception.
OpenStream
Spring @Service in package WikiChangeStream.service. Drives the polling loop via @Scheduled(fixedRate = 5000). Iterates over each Change returned by WikipediaClient and delegates to KafkaPublish. Catches TooManyRequests and sleeps the thread for 5 seconds.
KafkaPublish
Spring @Component in package WikiChangeStream.publish. Constructs a KafkaProducer<String, Change> in its constructor and exposes publish(Change). Sends each change to the recent_change_stream topic using JsonSerializer for the value. A Callback logs the page title on success or the exception message on failure.
Module-specific Maven dependencies
spring-boot-starter-webflux is declared here rather than in the root POM because only wiki-change-stream uses the reactive WebClient; the other two modules do not need the Netty-based reactive runtime that it pulls in.
opensearch-wiki-indexer
Artifact ID: opensearch-wiki-indexer
Group ID: com.as
Main class: WikiIndexer.WikiIndexerStreamApplication
This module is the pipeline sink. It subscribes to the Kafka topic and writes each record into OpenSearch. It can be used either as a standalone Spring Boot application or as a library module component-scanned by wiki-change-stream.
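The following stdlib-only Java sketch illustrates the consume-and-index path. It is not the module's implementation. consumerProps() lists the Kafka consumer settings described for KafkaConsume. The following are assumptions: the broker address, the group.id value, and the use of Spring Kafka's JsonDeserializer with its spring.json.trusted.packages property. FakeIndex is a hypothetical in-memory map that stands in for the wiki-changes OpenSearch index, and the user field on Change is likewise hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class IndexerSketch {
    // Hypothetical, trimmed-down Change carrying only what this sketch needs.
    record Change(long pageId, String title, String user) {}

    // Consumer settings matching the behavior described for KafkaConsume. The broker
    // address, group.id and the Spring Kafka JsonDeserializer setup are assumptions.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092");
        p.setProperty("group.id", "wiki-indexer");
        p.setProperty("auto.offset.reset", "earliest"); // no committed offset: start from the oldest record
        p.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer", "org.springframework.kafka.support.serializer.JsonDeserializer");
        p.setProperty("spring.json.trusted.packages", "*"); // trust all packages
        return p;
    }

    // In-memory stand-in for the wiki-changes index. The page title is the document ID,
    // so a later change to the same page overwrites the earlier document (an upsert).
    static final class FakeIndex {
        private final Map<String, Change> docs = new LinkedHashMap<>();

        void upsert(Change change) { docs.put(change.title(), change); }
        int size() { return docs.size(); }
        Change get(String id) { return docs.get(id); }
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        List<Change> polled = List.of(
                new Change(1, "Page A", "alice"),
                new Change(2, "Page B", "bob"),
                new Change(1, "Page A", "carol"));
        polled.forEach(index::upsert);
        System.out.println(index.size());               // 2
        System.out.println(index.get("Page A").user()); // carol
    }
}
```

Because the document ID is the page title, the index keeps only the most recent change for each page rather than a history of edits.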
Key classes
OpensearchConfig
Spring @Configuration class in package WikiIndexer.config. Constructs the OpenSearchClient bean using a plain Apache RestClient pointed at localhost:9200 and a JacksonJsonpMapper for serialization.
KafkaConsume
Spring @Component in package WikiIndexer.consumer. Constructs a KafkaConsumer<String, Change> in its constructor, subscribes to recent_change_stream, and polls every 5 seconds via @Scheduled(fixedRate = 5000). Uses auto.offset.reset=earliest and trusts all packages for JSON deserialization.
OpensearchIndexer
Spring @Service in package WikiIndexer.index. Accepts a Change and upserts it into the wiki-changes OpenSearch index using the page title as the document ID.
Module-specific Maven dependencies
wiki-common
Artifact ID: wiki-common
Group ID: com.as
Main class: none (packaged as a plain JAR)
This is a pure library module. The Spring Boot Maven plugin is configured with <skip>true</skip>, so it is not repackaged as an executable fat JAR. It contains three Lombok @Data model classes that mirror the Wikipedia Recent Changes API JSON structure.
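Lombok's @Data bundles @Getter, @Setter, @RequiredArgsConstructor, @ToString and @EqualsAndHashCode. The hand-written class below approximates what @Data generates for a hypothetical two-field Change with pageId and title. The real class has more fields and carries the @JsonProperty mapping. Lombok's actual equals/hashCode output also differs in detail; for example, it also emits a canEqual method.

```java
import java.util.Objects;

// Approximately what Lombok's @Data generates for a class with two non-final fields.
// The field set (pageId, title) and the long type are a hypothetical subset of the real model.
class Change {
    private long pageId;
    private String title;

    // @RequiredArgsConstructor: with no final fields this is a no-arg constructor.
    public Change() {}

    // @Getter / @Setter
    public long getPageId() { return pageId; }
    public void setPageId(long pageId) { this.pageId = pageId; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    // @EqualsAndHashCode: value equality over all fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Change other)) return false;
        return pageId == other.pageId && Objects.equals(title, other.title);
    }

    @Override
    public int hashCode() { return Objects.hash(pageId, title); }

    // @ToString: Lombok's ClassName(field=value, ...) format.
    @Override
    public String toString() { return "Change(pageId=" + pageId + ", title=" + title + ")"; }

    public static void main(String[] args) {
        Change a = new Change();
        a.setPageId(42);
        a.setTitle("Example page");
        Change b = new Change();
        b.setPageId(42);
        b.setTitle("Example page");
        System.out.println(a);           // Change(pageId=42, title=Example page)
        System.out.println(a.equals(b)); // true
    }
}
```

The no-arg constructor plus setters gives the models the plain mutable-bean shape that Jackson's default deserialization works with.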
Key classes
Change
Root document model. Represents a single Wikipedia page change event. The pageId field is mapped from the JSON key pageid via @JsonProperty.
RecentChanges
Wrapper around the list of changes. Maps the JSON key recentchanges to a List<Change> via @JsonProperty.
Query
Top-level deserialization target for the Wikipedia API response. Contains a single RecentChanges field named query, matching the outer query key in the API JSON envelope.
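The three classes nest in the same way as the JSON envelope, {"query": {"recentchanges": [...]}}. The sketch below mirrors that nesting with Java records instead of Lombok classes and shows how to unwrap it. The following are hypothetical: the record component names, the reduced field set, and the changesOf helper. Jackson is left out so the example runs on the JDK alone.

```java
import java.util.List;

class EnvelopeSketch {
    // Record mirrors of the three wiki-common models. Component names and the reduced
    // field set are hypothetical; the real classes use Lombok @Data and @JsonProperty.
    record Change(long pageId, String title) {}    // JSON object: {"pageid": ..., "title": ...}
    record RecentChanges(List<Change> changes) {} // JSON key "recentchanges" -> changes
    record Query(RecentChanges query) {}          // JSON key "query" -> query

    // Unwraps Query -> RecentChanges -> List<Change>, treating missing levels as empty.
    static List<Change> changesOf(Query response) {
        if (response == null || response.query() == null || response.query().changes() == null) {
            return List.of();
        }
        return response.query().changes();
    }

    public static void main(String[] args) {
        // Equivalent of {"query": {"recentchanges": [{"pageid": 1, "title": "Page A"}, ...]}}
        Query response = new Query(new RecentChanges(List.of(
                new Change(1, "Page A"),
                new Change(2, "Page B"))));
        for (Change c : changesOf(response)) {
            System.out.println(c.pageId() + " " + c.title());
        }
    }
}
```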