wiki-change-stream is the Spring Boot module that periodically polls the Wikipedia Recent Changes API and publishes each change event to Kafka. It runs on a fixed 5-second schedule, fetches the latest 100 page edits, and forwards each one as a serialized Change record to the recent_change_stream topic.
Classes
OpenKnowledgeStreamApplication
Package: WikiChangeStream
The application entry point. Bootstraps the Spring context, enables scheduled task execution, and registers component packages for scanning.
OpenKnowledgeStreamApplication.java
| Annotation | Purpose |
|---|---|
| @SpringBootApplication | Enables auto-configuration and component scanning |
| @EnableScheduling | Activates Spring’s @Scheduled task executor |
| @ComponentScan | Registers WikiChangeStream, WikiIndexer, and Wikicommon packages |
OpenStream
Package: WikiChangeStream.service
Annotations: @Service, @Slf4j
The core polling service. Calls WikipediaClient on a fixed 5-second interval and fans each change out to KafkaPublish.
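The poll-and-fan-out pattern described above can be sketched with the JDK alone. This is a minimal illustration, not the module's code: `ScheduledExecutorService` stands in for Spring's `@Scheduled`, the `fetch` and `publish` parameters are hypothetical stand-ins for WikipediaClient and KafkaPublish, and the choice of fixed rate (rather than fixed delay) is an assumption.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical stand-in for the polling service; none of these names come from the module.
class PollingSketch {

    // One polling pass: fetch a batch, hand every item to the publisher, return the batch size.
    static <T> int pollOnce(Supplier<List<T>> fetch, Consumer<T> publish) {
        List<T> batch = fetch.get();
        batch.forEach(publish);
        return batch.size();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> published = new CopyOnWriteArrayList<>();
        CountDownLatch twoPolls = new CountDownLatch(2);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // The module polls every 5 seconds; 100 ms keeps this demo short.
        scheduler.scheduleAtFixedRate(() -> {
            pollOnce(() -> List.of("Page A", "Page B"), published::add);
            twoPolls.countDown();
        }, 0, 100, TimeUnit.MILLISECONDS);

        // Wait for two polling passes, then stop the scheduler.
        twoPolls.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        System.out.println("published " + published.size() + " changes");
    }
}
```

Keeping the single pass in its own method (`pollOnce`) makes the fan-out testable without waiting on the scheduler.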
Constructor
stream()
OpenStream.java
When Wikipedia returns HTTP 429, stream() catches TooManyRequests and pauses the current thread for 5 seconds before the next scheduled invocation resumes normally.
WikipediaClient
Package: WikiChangeStream.clients
Annotations: @Component, @RequiredArgsConstructor
Wraps the reactive WebClient to fetch recent changes from the Wikipedia API. The base URL is configured by the AppConfig bean.
Field
| Name | Type | Description |
|---|---|---|
| webClient | WebClient | Injected Spring bean; base URL pre-configured to the Wikipedia API endpoint |
getRecentChanges()
WikipediaClient.java
Returns a Query object. Throws TooManyRequests if the API responds with HTTP 429. The .block() call waits for the reactive pipeline to complete, which makes the method synchronous.
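The two behaviours described here, mapping a 429 status to an exception and blocking on an asynchronous result, can be illustrated without Spring. In this sketch `CompletableFuture.join()` plays the role that `.block()` plays on a reactive pipeline. The `Response`, `RateLimitSketch`, and `ClientSketch` names are hypothetical, and the real client's status handling may be wired differently.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for the module's TooManyRequests exception.
class RateLimitSketch extends RuntimeException {
    RateLimitSketch(String message) {
        super(message);
    }
}

class ClientSketch {

    // Minimal response holder used only by this illustration.
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    // Returns the body for ordinary responses; turns HTTP 429 into an exception.
    static String bodyOrThrow(Response response) {
        if (response.status == 429) {
            throw new RateLimitSketch("API responded with HTTP 429");
        }
        return response.body;
    }

    // Waits on the calling thread for the asynchronous response, like block() does.
    static String fetchBlocking(CompletableFuture<Response> pending) {
        try {
            return pending.thenApply(ClientSketch::bodyOrThrow).join();
        } catch (CompletionException e) {
            // join() wraps failures from dependent stages; unwrap the rate-limit error.
            if (e.getCause() instanceof RateLimitSketch) {
                throw (RateLimitSketch) e.getCause();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        String body = fetchBlocking(CompletableFuture.completedFuture(new Response(200, "{}")));
        System.out.println("body: " + body);
        try {
            fetchBlocking(CompletableFuture.completedFuture(new Response(429, "")));
        } catch (RateLimitSketch e) {
            System.out.println("rate limited: " + e.getMessage());
        }
    }
}
```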
KafkaPublish
Package: WikiChangeStream.publish
Annotations: @Component, @Slf4j
Creates and manages a KafkaProducer that serializes Change objects to JSON and sends them to the recent_change_stream topic.
KafkaPublish.java
| Property | Value |
|---|---|
| bootstrap.servers | localhost:9092 |
| key.serializer | StringSerializer |
| value.serializer | JsonSerializer (Spring Kafka) |
| Topic | recent_change_stream |
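As a rough sketch, the settings in the table could be expressed as a `java.util.Properties` object of the kind a `KafkaProducer` constructor accepts. The fully qualified serializer class names are the standard Kafka and Spring Kafka ones. Whether the module builds its configuration this way, and the `ProducerConfigSketch` name itself, are assumptions.

```java
import java.util.Properties;

// Hypothetical helper; the module may assemble its producer configuration differently.
class ProducerConfigSketch {

    // Topic name taken from the table above.
    static final String TOPIC = "recent_change_stream";

    // Builds the producer settings listed in the table as string properties.
    static Properties producerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.springframework.kafka.support.serializer.JsonSerializer");
        return props;
    }

    public static void main(String[] args) {
        // Print each setting so the configuration can be inspected.
        producerProperties().forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("topic = " + TOPIC);
    }
}
```

Because the broker address is the literal `localhost:9092`, running against another environment would require changing this value or externalizing it.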
publish(Change recentChange)
Wraps recentChange in a ProducerRecord and sends it asynchronously. The send callback logs the page title on success, or the exception message on failure.
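The callback behaviour described above (log the page title on success, the exception message on failure) can be illustrated with a `CompletableFuture` standing in for the producer's asynchronous send. This is only an analogy using the JDK. The `SendCallbackSketch` name, the `pendingAck` future, and the log message wording are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical analogy for an asynchronous send with a completion callback.
class SendCallbackSketch {

    // Attaches a callback that receives either an acknowledgement or an error, never both.
    static CompletableFuture<Void> send(String pageTitle,
                                        CompletableFuture<Long> pendingAck,
                                        Consumer<String> log) {
        return pendingAck.handle((offset, error) -> {
            if (error == null) {
                log.accept("published: " + pageTitle);
            } else {
                log.accept("publish failed: " + error.getMessage());
            }
            return null;
        });
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        // Successful send: the future already carries an acknowledgement.
        send("Main Page", CompletableFuture.completedFuture(42L), log::add).join();

        // Failed send: the future completes with an exception instead.
        CompletableFuture<Long> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("broker down"));
        send("Main Page", failed, log::add).join();

        log.forEach(System.out::println);
    }
}
```

Handling both outcomes in one callback keeps the calling thread free, which is the point of an asynchronous send.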
AppConfig
Package: WikiChangeStream.config
Annotation: @Configuration
Declares the WebClient bean consumed by WikipediaClient.
AppConfig.java
TooManyRequests
Package: WikiChangeStream.exception
A RuntimeException subclass thrown by WikipediaClient when the Wikipedia API responds with HTTP 429 (Too Many Requests).
TooManyRequests.java
OpenStream.stream() catches this exception and backs off for 5 seconds before the next poll attempt.
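The catch-and-back-off behaviour might look roughly like the following. This is a sketch under stated assumptions: `TooManyRequestsSketch` is a hypothetical stand-in for the module's exception, and the pause length is a parameter here so the demo runs quickly, whereas the module is described as pausing for 5 seconds.

```java
// Hypothetical stand-in for the module's RuntimeException subclass.
class TooManyRequestsSketch extends RuntimeException {
    TooManyRequestsSketch(String message) {
        super(message);
    }
}

class BackoffSketch {

    // Runs one poll; on a rate-limit error, sleeps for backoffMillis and returns false.
    static boolean pollWithBackoff(Runnable poll, long backoffMillis) {
        try {
            poll.run();
            return true;
        } catch (TooManyRequestsSketch e) {
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException interrupted) {
                // Preserve the interrupt flag so callers can still observe it.
                Thread.currentThread().interrupt();
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulate a poll that is rejected with a rate-limit error.
        boolean succeeded = pollWithBackoff(() -> {
            throw new TooManyRequestsSketch("HTTP 429");
        }, 50);
        System.out.println("poll succeeded: " + succeeded);
    }
}
```

Sleeping inside the scheduled task delays that thread, so with a single-threaded scheduler the pause also postpones the next invocation.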
Maven Artifact
Group ID: com.as
Artifact ID: wiki-change-stream
Version: 0.0.1-SNAPSHOT
pom.xml
wiki-common
Shared data model classes (Change, Query, RecentChanges)
spring-boot-starter-webflux
Reactive HTTP client (WebClient) for Wikipedia API calls
opensearch-wiki-indexer
Downstream consumer of the Kafka topic produced by this module