Once OpenKnowledgeStream is running, every Wikipedia change event flowing through the recent_change_stream Kafka topic is indexed into an OpenSearch index named wiki-changes. OpenSearch exposes a powerful REST API and Query DSL that let you search, filter, aggregate, and analyze that data in real time. This guide covers the structure of the indexed documents and the most useful queries to get started.
Document structure
OpensearchIndexer indexes each Change object directly, using the page title as the document ID (id(change.getTitle())). The Change model (in wiki-common) is a flat, four-field class:
| Field | JSON key | Type | Description |
|---|---|---|---|
| type | type | string | Change type: edit, new, or log |
| title | title | string | Page title as it appears on Wikipedia |
| pageId | pageid | number | Wikipedia’s numeric page identifier |
| tags | tags | string[] | Editor-supplied tags, e.g. "mobile edit" |
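
To make the shape concrete, here is a minimal sketch of one document with the four fields from the table, plus a toy in-memory model of what it means to use the page title as the document ID. The sample values (title, page ID) and the `index_change` helper are hypothetical and exist only for illustration; the real indexing is done by OpensearchIndexer.

```python
# Hypothetical sample document; the values are made up for illustration.
sample = {
    "type": "edit",
    "title": "Apache Kafka",
    "pageid": 12345,
    "tags": ["mobile edit"],
}


def index_change(index: dict, change: dict) -> None:
    """Toy stand-in for indexing: the page title acts as the document ID."""
    index[change["title"]] = change


wiki_changes = {}
index_change(wiki_changes, sample)
# A second change for the same title replaces the first document.
index_change(wiki_changes, {**sample, "type": "log", "tags": []})

print(len(wiki_changes))                     # 1
print(wiki_changes["Apache Kafka"]["type"])  # log
```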
Because the document ID is set to the page title, indexing the same title a second time upserts (overwrites) the existing document rather than creating a duplicate. The wiki-changes index therefore holds at most one document per Wikipedia page title, always reflecting the most recently indexed change for that page.
Common queries
Check index health and document count
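
One way to check both is the `_cat/indices` endpoint (health, status, and document count in one table) and the `_count` endpoint. The sketch below only builds the requests. The base URL `http://localhost:9200` and the absence of authentication are assumptions about your deployment, and the function names are hypothetical.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local OpenSearch without auth
INDEX = "wiki-changes"


def health_request() -> Request:
    """GET _cat/indices/<index>?v: one row with health, status, docs.count."""
    return Request(f"{BASE_URL}/_cat/indices/{INDEX}?v", method="GET")


def count_request() -> Request:
    """GET <index>/_count: JSON response containing a "count" field."""
    return Request(f"{BASE_URL}/{INDEX}/_count", method="GET")


print(health_request().full_url)
print(count_request().full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` and read the response body.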
Get the most recently indexed documents
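
The four-field document described above carries no timestamp, so this sketch does not sort by time. It assumes a plain `match_all` search with a small `size` is sufficient for a quick look at what is in the index; the order of the returned hits is not guaranteed to reflect recency. The base URL and the helper names are assumptions.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local OpenSearch without auth


def sample_documents(size: int = 10) -> dict:
    """Search body that matches every document and returns `size` hits."""
    return {"size": size, "query": {"match_all": {}}}


def search_request(body: dict) -> Request:
    """Build (but do not send) a POST to the wiki-changes _search endpoint."""
    return Request(
        f"{BASE_URL}/wiki-changes/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request(sample_documents(5))
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```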
Search by title keyword
match performs full-text analysis — it tokenizes the query string and scores results by relevance. Use match_phrase to require the exact phrase in order.
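
As a rough sketch, the two query types differ only in the clause name. The bodies below would be sent to the `wiki-changes/_search` endpoint; the function names and the search text are hypothetical.

```python
import json


def title_match(text: str) -> dict:
    """Full-text match: any analyzed token of `text` may match the title."""
    return {"query": {"match": {"title": text}}}


def title_phrase(text: str) -> dict:
    """Phrase match: the tokens of `text` must appear together and in order."""
    return {"query": {"match_phrase": {"title": text}}}


print(json.dumps(title_match("world cup"), indent=2))
print(json.dumps(title_phrase("world cup"), indent=2))
```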
Filter by change type
Valid values for type are edit (an existing page was modified), new (a page was created), and log (an administrative log entry). Use term rather than match here because type values are exact keyword tokens, not analyzed text.
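
A minimal sketch of such a filter follows. It targets `type.keyword`, which assumes the index was created by dynamic mapping (strings mapped as text with a keyword sub-field); if your mapping declares `type` as a keyword directly, pass `field="type"` instead. The function name is hypothetical.

```python
import json

VALID_TYPES = {"edit", "new", "log"}


def type_filter(change_type: str, field: str = "type.keyword") -> dict:
    """Exact-match term query on the change type."""
    if change_type not in VALID_TYPES:
        raise ValueError(f"unknown change type: {change_type!r}")
    return {"query": {"term": {field: change_type}}}


print(json.dumps(type_filter("new")))
```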
Filter by tag
terms is the multi-value equivalent of term — it returns documents where the tags array contains any of the provided values.
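
For illustration, a `terms` body for tags might be built like this. The `tags.keyword` field assumes the default dynamic mapping, and the second tag value is only an example; substitute tags that actually occur in your data. The function name is hypothetical.

```python
import json


def tags_filter(tags, field: str = "tags.keyword") -> dict:
    """Match documents whose tags array contains any of the given values."""
    return {"query": {"terms": {field: list(tags)}}}


# "mobile web edit" is an illustrative value, not taken from the index.
print(json.dumps(tags_filter(["mobile edit", "mobile web edit"])))
```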
Combine filters — new pages tagged as mobile edits
Clauses inside filter are not scored, which makes them faster and cacheable. Prefer filter over must for exact-match criteria that don’t affect relevance ranking.
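
A sketch of the combined query, under the same assumption that exact matching goes through the `.keyword` sub-fields: both conditions sit in the `filter` list of a `bool` query, so a document must satisfy both and no relevance score is computed for them. The function name is hypothetical.

```python
import json


def new_mobile_pages() -> dict:
    """bool/filter query: type is "new" AND tags contains "mobile edit"."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"type.keyword": "new"}},
                    {"term": {"tags.keyword": "mobile edit"}},
                ]
            }
        }
    }


print(json.dumps(new_mobile_pages(), indent=2))
```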
Inspect the index mapping
OpenSearch infers the mapping from the first documents it receives. To see what was auto-detected, fetch the mapping with GET /wiki-changes/_mapping.
For term and terms queries on type, title, or tags, target the .keyword sub-field (for example, tags.keyword) to avoid analyzed tokenization.
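
The sketch below builds the mapping request and shows one way to pick the exact-match field name from a mapping response. `SAMPLE_MAPPING` is an assumption: it mirrors the shape dynamic mapping typically produces for string and numeric fields and is not output captured from a real wiki-changes index. The base URL and function names are likewise hypothetical.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local OpenSearch without auth


def mapping_request(index: str = "wiki-changes") -> Request:
    """Build (but do not send) GET <index>/_mapping."""
    return Request(f"{BASE_URL}/{index}/_mapping", method="GET")


# Assumed response shape for dynamically mapped fields (illustrative only).
_TEXT = {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
SAMPLE_MAPPING = {
    "wiki-changes": {
        "mappings": {
            "properties": {
                "type": _TEXT,
                "title": _TEXT,
                "tags": _TEXT,
                "pageid": {"type": "long"},
            }
        }
    }
}


def exact_field(mapping: dict, index: str, field: str) -> str:
    """Return the field name to use in term/terms queries."""
    spec = mapping[index]["mappings"]["properties"][field]
    if spec.get("type") != "text":
        return field  # non-text fields are not analyzed
    if "keyword" in spec.get("fields", {}):
        return f"{field}.keyword"
    raise LookupError(f"{field} is text with no keyword sub-field")


print(exact_field(SAMPLE_MAPPING, "wiki-changes", "tags"))    # tags.keyword
print(exact_field(SAMPLE_MAPPING, "wiki-changes", "pageid"))  # pageid
```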