OpenKnowledgeStream’s runtime behaviour is controlled by three configuration surfaces: a minimal application.properties file that names the Spring application, an AppConfig bean that constructs the WebClient used to call the Wikipedia API, and @Scheduled annotations in OpenStream and KafkaConsume that govern how often events are polled and forwarded.
application.properties
The wiki-change-stream module ships a single-line properties file.
application.properties
spring.application.name: The logical name of the Spring Boot application. Used in log output and Spring Boot banner output at startup.
Wikipedia API configuration
The AppConfig class in wiki-change-stream declares a WebClient Spring bean pre-configured with the Wikipedia Recent Changes API endpoint and all required query parameters.
AppConfig.java
Query parameters
action: The MediaWiki API action to invoke. The value query retrieves data from the wiki and is the entry point for list-based requests such as recentchanges.
list: The specific list generator to use within a query action. The value recentchanges returns the most recent edits, moves, protections, and other change events recorded in the wiki’s recentchanges table.
format: The response format requested from the API. The value json instructs MediaWiki to return a JSON body, which is what the WebClient and downstream Jackson deserialization expect.
rclimit: The maximum number of recent-change entries to return per request. The valid range for unprivileged callers is 1–500. The current value of 100 means each poll fetches up to 100 change events.
rcprop: A pipe-separated list of properties to include in each change entry. The three values used are:
| Value | Description |
|---|---|
| title | The title of the affected page, used as the document ID in OpenSearch. |
| tags | Any change tags applied to the edit (e.g., mobile edit, possible vandalism). |
| ids | The rcid, revid, and old_revid identifiers for the change. |
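To make the parameter set concrete, the following is a minimal sketch of how these five query parameters combine into a single request URL. It uses only the Java standard library rather than the project's WebClient bean, and the base URL (English Wikipedia's api.php endpoint) as well as the class and method names are assumptions for illustration, not taken from AppConfig.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class RecentChangesUrlSketch {
    // Assumption: English Wikipedia's API endpoint; the project may target a different wiki.
    static final String BASE_URL = "https://en.wikipedia.org/w/api.php";

    // The query parameters described above, kept in insertion order.
    static Map<String, String> defaultParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("list", "recentchanges");
        params.put("format", "json");
        params.put("rclimit", "100");
        params.put("rcprop", "title|tags|ids");
        return params;
    }

    // Joins the parameters into a query string, percent-encoding keys and values.
    static URI buildUri(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(buildUri(BASE_URL, defaultParams()));
    }
}
```

Note that the pipe separators in rcprop are percent-encoded as %7C in the resulting URL; MediaWiki decodes them back into the pipe-separated list.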
Default request header
Sent with every request via defaultHeader("accept", "application/json"), this header signals to the server that the client expects a JSON response body.
Polling interval
Both the producer and consumer use Spring’s @Scheduled(fixedRate = ...) to run on a fixed cadence. The rate is expressed in milliseconds.
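For readers unfamiliar with fixed-rate scheduling, the sketch below shows a plain-Java analogue built on ScheduledExecutorService. It is not the project's code and does not use Spring; the class name FixedRateSketch and its methods are hypothetical. The point it illustrates is that the period is measured between the start times of consecutive runs, not from the end of one run to the start of the next.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

class FixedRateSketch {
    // A single worker thread, so runs of the task never overlap.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task immediately, then again at a fixed period between start times.
    ScheduledFuture<?> start(Runnable task, long periodMillis) {
        return scheduler.scheduleAtFixedRate(task, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops scheduling further runs and interrupts any run in progress.
    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        long periodMillis = 200; // demo value only; the pipeline's annotations use 5000
        FixedRateSketch sketch = new FixedRateSketch();
        sketch.start(() -> System.out.println("poll at " + System.currentTimeMillis()), periodMillis);
        Thread.sleep(700); // let a few runs happen before shutting down
        sketch.stop();
    }
}
```

As with this executor, a run that takes longer than the period delays the following runs rather than executing concurrently with them, which matters if a poll occasionally stalls on a slow network call.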
Producer: defined in wiki-change-stream/src/main/java/WikiChangeStream/service/OpenStream.java. The stream() method is invoked every 5000 ms (5 seconds). Each invocation calls WikipediaClient.getRecentChanges() and publishes every returned Change to Kafka.
Consumer: defined in opensearch-wiki-indexer/src/main/java/WikiIndexer/consumer/KafkaConsume.java. The consume() method is invoked every 5000 ms (5 seconds). Each invocation calls consumer.poll(Duration.ofMillis(1000)) to drain available records and forwards them to OpensearchIndexer.
OpenStream.java (scheduled producer)
KafkaConsume.java (scheduled consumer)
Changing the polling rate
To adjust how frequently the pipeline polls Wikipedia or drains Kafka, update the fixedRate value (in milliseconds) in the relevant @Scheduled annotation.
Open the target file
For the producer, open wiki-change-stream/src/main/java/WikiChangeStream/service/OpenStream.java. For the consumer, open opensearch-wiki-indexer/src/main/java/WikiIndexer/consumer/KafkaConsume.java.
Update fixedRate
Change the fixedRate value to your desired interval in milliseconds. For example, a value of 10000 polls every 10 seconds.
Setting fixedRate to a very low value (e.g., under 1000 ms) against the Wikipedia API may trigger rate limiting. OpenStream handles TooManyRequests exceptions by sleeping for 5 seconds before resuming.
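The back-off behaviour described above can be pictured with the following sketch. It is an illustration under stated assumptions, not OpenStream's actual code: TooManyRequestsException is a hypothetical stand-in for the HTTP 429 exception raised by the real client, pollOnce is an invented name, and the choice to return an empty batch after the pause (leaving the next scheduled tick to try again) is one plausible reading of "sleeping before resuming".

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

class RateLimitBackoffSketch {
    // Hypothetical stand-in for the HTTP 429 exception thrown by the real client.
    static class TooManyRequestsException extends RuntimeException {}

    // One polling tick: returns the fetched titles, or pauses and returns nothing if rate limited.
    static List<String> pollOnce(Supplier<List<String>> fetch, Duration backoff) {
        try {
            return fetch.get();
        } catch (TooManyRequestsException e) {
            try {
                Thread.sleep(backoff.toMillis()); // wait before the next tick calls the API again
            } catch (InterruptedException interrupted) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            return List.of(); // nothing to publish for this tick
        }
    }

    public static void main(String[] args) {
        Duration backoff = Duration.ofMillis(100); // demo value; the text describes a 5-second pause

        // A successful poll passes its result through unchanged.
        System.out.println(pollOnce(() -> List.of("Main Page"), backoff));

        // A rate-limited poll sleeps for the back-off period and yields an empty batch.
        System.out.println(pollOnce(() -> { throw new TooManyRequestsException(); }, backoff));
    }
}
```

Because the pause happens inside the scheduled method, it also delays the next fixed-rate run, so the effective gap after a rate-limit response is at least the back-off period.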