OpenKnowledgeStream’s runtime behaviour is controlled by three configuration surfaces: a minimal application.properties file that names the Spring application, an AppConfig bean that constructs the WebClient used to call the Wikipedia API, and @Scheduled annotations in OpenStream and KafkaConsume that govern how often events are polled and forwarded.

application.properties

The wiki-change-stream module ships a single-line properties file.
application.properties
spring.application.name=OpenKnowledgeStream
spring.application.name (string, default: "OpenKnowledgeStream")
The logical name of the Spring Boot application. Spring uses it to identify the application, for example in log output.

Wikipedia API configuration

The AppConfig class in wiki-change-stream declares a WebClient Spring bean pre-configured with the Wikipedia Recent Changes API endpoint and all required query parameters.
AppConfig.java
package WikiChangeStream.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class AppConfig {

    @Bean
    public WebClient webClient() {
        return WebClient.builder()
                .baseUrl("https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&format=json&rclimit=100&rcprop=title|tags|ids")
                .defaultHeader("accept", "application/json")
                .build();
    }
}

Query parameters

action (string, default: "query")
The MediaWiki API action to invoke. query retrieves data from the wiki and is the entry point for list-based requests such as recentchanges.

list (string, default: "recentchanges")
The list module to run within a query action. recentchanges returns the most recent edits, moves, protections, and other change events recorded in the wiki’s recentchanges table.

format (string, default: "json")
The response format requested from the API. json instructs MediaWiki to return a JSON body, which is what the WebClient and downstream Jackson deserialization expect.

rclimit (integer, default: 100)
The maximum number of recent-change entries to return per request. The valid range for unprivileged callers is 1–500. The current value of 100 means each poll fetches up to 100 change events.

rcprop (string, default: "title|tags|ids")
A pipe-separated list of properties to include in each change entry. The three values used are:

Value   Description
title   The title of the affected page, used as the document ID in OpenSearch.
tags    Any change tags applied to the edit (e.g., mobile edit, possible vandalism).
ids     The pageid, rcid, revid, and old_revid identifiers for the change.
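To make the relationship between the individual parameters and the final request URL concrete, here is a minimal sketch that assembles the same query string using only the JDK. The class and method names (RecentChangesUrl, build, defaults) are hypothetical and do not exist in the project; AppConfig itself hard-codes the full URL in baseUrl.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of OpenKnowledgeStream.
final class RecentChangesUrl {

    // Endpoint taken from the baseUrl in AppConfig.
    static final String ENDPOINT = "https://en.wikipedia.org/w/api.php";

    /** Joins the parameters into a percent-encoded query string appended to ENDPOINT. */
    static String build(Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return ENDPOINT + "?" + query;
    }

    /** The five documented parameters, in the order they appear in AppConfig. */
    static Map<String, String> defaults() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "query");
        params.put("list", "recentchanges");
        params.put("format", "json");
        params.put("rclimit", "100");
        params.put("rcprop", "title|tags|ids");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(build(defaults()));
    }
}
```

In this sketch the pipe separators in rcprop are percent-encoded as %7C, so the printed URL ends in rcprop=title%7Ctags%7Cids. Changing a single entry, such as rclimit, changes only that part of the query string.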

Default request header

accept (string, default: "application/json")
Sent with every request via defaultHeader("accept", "application/json"). Signals to the server that the client expects a JSON response body.
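For readers who want to see the header on an actual request object, the following sketch builds an equivalent request with the JDK's java.net.http API instead of Spring's WebClient. It is an illustration only: AcceptHeaderSketch and recentChangesRequest are hypothetical names, and the request is built but never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration using the JDK HTTP client, not the project's WebClient bean.
final class AcceptHeaderSketch {

    /** Builds a GET request for the given URL that carries the accept header. */
    static HttpRequest recentChangesRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // The pipes in rcprop are written as %7C because URI.create rejects a raw '|'.
        HttpRequest request = recentChangesRequest(
                "https://en.wikipedia.org/w/api.php?action=query&list=recentchanges"
                        + "&format=json&rclimit=100&rcprop=title%7Ctags%7Cids");
        System.out.println(request.headers().map());
    }
}
```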

Polling interval

Both the producer and the consumer use Spring’s @Scheduled(fixedRate = ...) to run on a fixed cadence. The rate is expressed in milliseconds and is measured between the start times of successive invocations.

OpenStream.stream() fixedRate (integer, default: 5000)
Defined in wiki-change-stream/src/main/java/WikiChangeStream/service/OpenStream.java. The stream() method is invoked every 5,000 ms (5 seconds). Each invocation calls WikipediaClient.getRecentChanges() and publishes every returned Change to Kafka.

KafkaConsume.consume() fixedRate (integer, default: 5000)
Defined in opensearch-wiki-indexer/src/main/java/WikiIndexer/consumer/KafkaConsume.java. The consume() method is invoked every 5,000 ms (5 seconds). Each invocation calls consumer.poll(Duration.ofMillis(1000)), which waits up to 1 second for records, and forwards every record it returns to OpensearchIndexer.
OpenStream.java — scheduled producer
@Scheduled(fixedRate = 5000)
private void stream() {
    try {
        Query query = wikipediaClient.getRecentChanges();
        for(Change change : query.getQuery().getRecentChanges()) {
            kafkaPublish.publish(change);
        }
    } catch (TooManyRequests ex) {
        log.warn("Request limit hit...sleeping for 5 seconds...");
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    } catch (Exception ex) {
        log.error("Exception occurred: {}", ex.getMessage());
    }
}
KafkaConsume.java — scheduled consumer
@Scheduled(fixedRate = 5000)
public void consume() throws Exception {
    log.info("Starting kafka consumer....");

    ConsumerRecords<String, Change> record = consumer.poll(Duration.ofMillis(1000));
    for(ConsumerRecord<String, Change> change : record) {
        opensearchIndexer.index(change.value());
    }
}
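The fixed-rate behaviour of both scheduled methods can be approximated outside Spring with the JDK's ScheduledExecutorService. The sketch below is a stand-in for illustration, not how Spring implements @Scheduled. FixedRateSketch and runAtFixedRate are hypothetical names, and the 500 ms period in main is scaled down from the project's 5000 ms so the demo finishes quickly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical JDK-only analogue of a fixed-rate scheduled method.
final class FixedRateSketch {

    /**
     * Starts the task every periodMillis on a single thread and waits until it
     * has completed the requested number of runs. Returns false on timeout or interrupt.
     */
    static boolean runAtFixedRate(Runnable task, long periodMillis, int runs, long timeoutMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                task.run();
                remaining.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            return remaining.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            scheduler.shutdownNow(); // stop further runs once the demo is done
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        runAtFixedRate(
                () -> System.out.println("poll at +" + (System.nanoTime() - start) / 1_000_000 + " ms"),
                500, 3, 5_000);
    }
}
```

Because the period is counted from the start of each run, a task that takes longer than the period delays the following runs rather than overlapping them on this single-threaded scheduler.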

Changing the polling rate

To adjust how frequently the pipeline polls Wikipedia or drains Kafka, update the fixedRate value (in milliseconds) in the relevant @Scheduled annotation.
1. Open the target file

For the producer, open wiki-change-stream/src/main/java/WikiChangeStream/service/OpenStream.java.
For the consumer, open opensearch-wiki-indexer/src/main/java/WikiIndexer/consumer/KafkaConsume.java.
2. Update fixedRate

Change the fixedRate value to your desired interval in milliseconds. For example, to poll every 10 seconds:
@Scheduled(fixedRate = 10000)
3. Rebuild and restart

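Rebuild the affected module with Maven and restart the corresponding Spring Boot application for the change to take effect. The command below runs the producer module; for the consumer, pass opensearch-wiki-indexer to -pl instead.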
mvn -pl wiki-change-stream spring-boot:run
Setting the producer's fixedRate to a very low value (for example, under 1,000 ms) may cause the Wikipedia API to rate-limit requests. OpenStream handles the resulting TooManyRequests exception by logging a warning and sleeping for 5 seconds before the next poll.
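The pause after a rate-limit response can be sketched in isolation as follows. This is one possible variant, not the project's code: BackoffSketch and pause are hypothetical names, and unlike OpenStream, which wraps InterruptedException in a RuntimeException, this version restores the thread's interrupt flag and reports whether the full pause completed.

```java
import java.time.Duration;

// Hypothetical helper showing an interrupt-aware pause; not part of OpenKnowledgeStream.
final class BackoffSketch {

    /** Sleeps for the given duration. Returns false if the thread was interrupted while waiting. */
    static boolean pause(Duration duration) {
        try {
            Thread.sleep(duration.toMillis());
            return true;
        } catch (InterruptedException e) {
            // Restore the flag so the code that owns the thread can still see the interrupt.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Short pause for the demo; the project sleeps for 5 seconds.
        System.out.println("completed: " + pause(Duration.ofMillis(50)));
    }
}
```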
