Search and Query Wikipedia Change Events in OpenSearch

Once OpenKnowledgeStream is running, every Wikipedia change event flowing through the recent_change_stream Kafka topic is indexed into an OpenSearch index named wiki-changes. OpenSearch exposes a powerful REST API and Query DSL that let you search, filter, aggregate, and analyze that data in real time. This guide covers the structure of the indexed documents and the most useful queries to get started.

Document structure

OpensearchIndexer indexes each Change object directly, using the page title as the document ID (id(change.getTitle())). The Change model (in wiki-common) is a flat, four-field class:

Field	JSON key	Type	Description
`type`	`type`	`string`	Change type: `edit`, `new`, or `log`
`title`	`title`	`string`	Page title as it appears on Wikipedia
`pageId`	`pageid`	`number`	Wikipedia’s numeric page identifier
`tags`	`tags`	`string[]`	Change tags attached to the event, e.g. `"mobile edit"`

A typical document in the index looks like this:

{
  "type": "edit",
  "title": "Albert Einstein",
  "pageid": 736,
  "tags": ["mobile edit", "mobile web edit"]
}

Because the document ID is set to the page title, indexing the same title a second time upserts (overwrites) the existing document rather than creating a duplicate. The wiki-changes index therefore holds at most one document per Wikipedia page title — always reflecting the most recently indexed change for that page.
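The overwrite behavior can be illustrated without a cluster. The sketch below models the index as a Python dictionary keyed by page title. `index_change` is a hypothetical stand-in for the indexing call, not part of OpenKnowledgeStream, and the two sample events are invented for illustration.

```python
# Toy model of the wiki-changes index: one slot per document ID (the page title).
index = {}

def index_change(change):
    """Hypothetical stand-in for indexing with id = title: a repeated ID replaces the document."""
    index[change["title"]] = change

# Two events for the same page title arrive one after the other.
index_change({"type": "new", "title": "Albert Einstein", "pageid": 736, "tags": []})
index_change({"type": "edit", "title": "Albert Einstein", "pageid": 736, "tags": ["mobile edit"]})

print(len(index))                        # 1
print(index["Albert Einstein"]["type"])  # edit
```

Only the second event survives, which is why the index is a snapshot of the latest change per title rather than a full history.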

Common queries

Check index health and document count

GET /wiki-changes/_count

Sample response:

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "count": 1503
}
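As a rough sketch of how a script might interpret this response, the snippet below parses the sample above and reports the document count together with whether any shard failed. `summarize_count` is a hypothetical helper name; fetching the response over HTTP is left out.

```python
import json

# Sample _count response copied from the guide.
sample_response = json.loads("""
{
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "count": 1503
}
""")

def summarize_count(response):
    """Return (document count, True if no shard reported a failure)."""
    return response["count"], response["_shards"]["failed"] == 0

count, all_shards_ok = summarize_count(sample_response)
print(count, all_shards_ok)  # 1503 True
```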

Get the most recently indexed documents

GET /wiki-changes/_search?size=10&sort=_id:desc

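Despite the heading, this query does not order by recency: it returns the 10 documents whose page-title IDs sort last in lexicographic order. The Change model carries no timestamp, so strict recency ordering requires adding an ingestion timestamp field and sorting on that instead.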
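If documents were extended with an ingestion timestamp, the search body might look like the sketch below. The field name `indexed_at` is an assumption made for illustration: the current Change model has no such field, and the indexer would need to populate it as a date.

```python
import json

def recent_changes_body(size=10, timestamp_field="indexed_at"):
    """Build a search body sorted newest-first on a (hypothetical) ingestion timestamp."""
    return {
        "size": size,
        "sort": [{timestamp_field: {"order": "desc"}}],
    }

# Body to send with GET /wiki-changes/_search
print(json.dumps(recent_changes_body(), indent=2))
```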

Search by title keyword

GET /wiki-changes/_search

{
  "query": {
    "match": {
      "title": "Albert Einstein"
    }
  }
}

match performs full-text analysis: it tokenizes the query string and scores results by relevance. By default the tokens are combined with OR, so a title containing only "Einstein" also matches. Use match_phrase to require the exact phrase in order.
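The variants can be compared side by side by building their request bodies. This is a minimal sketch: the helper names are hypothetical, while `operator` and `match_phrase` are standard Query DSL options.

```python
import json

def match_query(field, text, operator="or"):
    """match query; operator="and" requires every token to be present."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}

def match_phrase_query(field, text):
    """match_phrase query; tokens must appear adjacent and in order."""
    return {"query": {"match_phrase": {field: text}}}

# Bodies to send with GET /wiki-changes/_search
print(json.dumps(match_query("title", "Albert Einstein", operator="and")))
print(json.dumps(match_phrase_query("title", "Albert Einstein")))
```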

Filter by change type

GET /wiki-changes/_search

{
  "query": {
    "term": {
      "type.keyword": "new"
    }
  }
}

Valid values for type are edit (an existing page was modified), new (a page was created), and log (an administrative log entry). Use term rather than match here because type holds a fixed set of exact values. With the dynamically inferred mapping, type is an analyzed text field, so the query targets its type.keyword sub-field, which stores the value unanalyzed.

Filter by tag

GET /wiki-changes/_search

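{
  "query": {
    "terms": {
      "tags.keyword": ["mobile edit", "mobile web edit"]
    }
  }
}

terms is the multi-value equivalent of term: it returns documents where the tags array contains any of the provided values. The query targets tags.keyword because, under the dynamically inferred mapping, tags itself is an analyzed text field whose indexed tokens (mobile, edit) never equal a multi-word value such as "mobile edit".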
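The any-of semantics of terms can be sketched in plain Python. This models exact, unanalyzed comparison only and uses invented sample documents. It illustrates the matching rule, not how OpenSearch evaluates the query internally.

```python
# Invented sample documents for illustration.
docs = [
    {"title": "Albert Einstein", "tags": ["mobile edit", "mobile web edit"]},
    {"title": "Marie Curie", "tags": ["visualeditor"]},
    {"title": "Alan Turing", "tags": []},
]

def terms_filter(documents, field, wanted):
    """Keep documents whose array field holds at least one of the wanted values."""
    wanted = set(wanted)
    return [d for d in documents if wanted & set(d.get(field, []))]

hits = terms_filter(docs, "tags", ["mobile edit", "visualeditor"])
print([d["title"] for d in hits])  # ['Albert Einstein', 'Marie Curie']
```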

Combine filters — new pages tagged as mobile edits

GET /wiki-changes/_search

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type.keyword": "new" } },
        { "terms": { "tags.keyword": ["mobile edit"] } }
      ]
    }
  }
}

Clauses inside filter are not scored, which makes them faster and cacheable. Prefer filter over must for exact-match criteria that don't affect relevance ranking, as is the case for both clauses here.
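When the set of criteria varies at runtime, the bool body can be assembled programmatically. The helper below is a hypothetical sketch. It assumes the default dynamic mapping, so it targets the .keyword sub-fields, and it falls back to match_all when no criteria are given.

```python
import json

def filtered_search(change_type=None, tags=None):
    """Build a bool query whose exact-match criteria all live in the unscored filter context."""
    filters = []
    if change_type is not None:
        filters.append({"term": {"type.keyword": change_type}})
    if tags:
        filters.append({"terms": {"tags.keyword": list(tags)}})
    if not filters:
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"filter": filters}}}

# Body to send with GET /wiki-changes/_search
print(json.dumps(filtered_search("new", ["mobile edit"]), indent=2))
```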

Inspect the index mapping

OpenSearch infers the mapping from the first documents it receives. To see what was auto-detected:

GET /wiki-changes/_mapping

Sample response showing the inferred types:

{
  "wiki-changes": {
    "mappings": {
      "properties": {
        "pageid": {
          "type": "long"
        },
        "tags": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

For term and terms queries on type, title, or tags, target the .keyword sub-field. It stores the original string unanalyzed, so the query value must equal the stored value exactly, including case:

{
  "query": {
    "term": {
      "type.keyword": "edit"
    }
  }
}
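Why the sub-field matters can be shown with a rough simulation. The tokenizer below only approximates the default analyzer on a text field (lowercasing and splitting into word tokens), and the function names are hypothetical.

```python
import re

def standard_tokens(value):
    """Rough approximation of the default analyzer on a text field: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())

def term_hits_text(stored, term):
    # A term query compares the raw term with each indexed token.
    return term in standard_tokens(stored)

def term_hits_keyword(stored, term):
    # The keyword sub-field indexes the whole string as a single token.
    return term == stored

print(term_hits_text("mobile edit", "mobile edit"))     # False
print(term_hits_text("mobile edit", "mobile"))          # True
print(term_hits_keyword("mobile edit", "mobile edit"))  # True
print(term_hits_text("Albert Einstein", "Albert"))      # False: the indexed token is "albert"
```

Multi-word values and capitalized values are the two cases where a term query against the analyzed field silently returns nothing.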

The examples above cover the most common access patterns. For aggregations (e.g., a histogram of change types, top-edited pages, or tag frequency counts), pagination with search_after, and custom index mappings, refer to the OpenSearch Query DSL documentation.
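As a starting point for the aggregations mentioned above, a terms aggregation counts documents per distinct field value. The sketch below builds such a body. The aggregation name `by_value` and the helper name are arbitrary choices, and the .keyword targets assume the default dynamic mapping.

```python
import json

def count_by_field(field, buckets=10):
    """Build a terms aggregation body; "size": 0 suppresses the hits and returns buckets only."""
    return {
        "size": 0,
        "aggs": {"by_value": {"terms": {"field": field, "size": buckets}}},
    }

# Bodies to send with GET /wiki-changes/_search
print(json.dumps(count_by_field("type.keyword"), indent=2))      # documents per change type
print(json.dumps(count_by_field("tags.keyword", 20), indent=2))  # most frequent tags
```

Because the index keeps one document per page title, these counts describe the latest change per page rather than total event volume.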
