Search on docs.github.com is powered by Elasticsearch. When a user types a query, the server calls Elasticsearch and returns ranked results. This page explains how the search index is built, how to run the pipeline locally, and how the search API works.
Search types
The site supports two search modes:

General search
Returns docs pages matching the query, sorted by popularity. Served from the /api/search/v1 endpoint. Example: the query clone returns URLs to docs pages about cloning repositories.

AI search autocomplete
Returns human-readable, full-sentence questions that best match the query, based on previous searches and popular pages. Served from the /api/search/ai-search-autocomplete/v1 endpoint. Example: the query How do I clone returns How do I clone a repository?

Search requests take the following values:
- VERSION: a numbered GHES version (e.g. 3.12), ghec, or dotcom
- LANGUAGE: one of es, ja, pt, zh, ru, fr, ko, de
- QUERY: any alphanumeric string
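As a rough illustration of how the two endpoints and the three values fit together, the sketch below builds a request URL. The endpoint paths and language codes come from this page. The query-string parameter names (version, language, query) and the helper buildSearchUrl are assumptions made for the example, not a confirmed description of the API.

```typescript
// Sketch only: the parameter names "version", "language", and "query"
// are assumed; the endpoint paths are the ones listed on this page.
type SearchKind = "general" | "ai-autocomplete";

const ENDPOINTS: Record<SearchKind, string> = {
  general: "/api/search/v1",
  "ai-autocomplete": "/api/search/ai-search-autocomplete/v1",
};

// Language codes listed on this page.
const LANGUAGES = ["es", "ja", "pt", "zh", "ru", "fr", "ko", "de"] as const;
type Language = (typeof LANGUAGES)[number];

// Hypothetical helper: builds a full search URL for one request.
function buildSearchUrl(
  kind: SearchKind,
  version: string, // e.g. "3.12", "ghec", or "dotcom"
  language: Language,
  query: string,
  origin: string = "https://docs.github.com",
): string {
  const url = new URL(ENDPOINTS[kind], origin);
  // URLSearchParams handles escaping of spaces and special characters.
  url.searchParams.set("version", version);
  url.searchParams.set("language", language);
  url.searchParams.set("query", query);
  return url.toString();
}
```

Under these assumptions, buildSearchUrl("general", "dotcom", "ja", "clone") yields https://docs.github.com/api/search/v1?version=dotcom&language=ja&query=clone.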
Architecture
Elasticsearch stores pre-built indexes that the server queries at runtime. Indexes are populated through a two-step pipeline:
- Scrape: fetch each page's content via the Article API and write structured JSON records to disk
- Index: upload those JSON records into Elasticsearch

The scrape step calls the Article API (/api/article?pathname=<path>) on a locally running server for each indexable page. Each record includes title, intro, breadcrumbs, headings, content (plain text, not HTML), and a unique objectID (the page permalink).
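The record shape and the role of objectID can be sketched as follows. The field names come from the description above, but the field types, the helper names articleApiUrl and upsert, and the example pathname are assumptions for illustration. The Map is only an in-memory stand-in for an index, used to show why a stable objectID leads to overwrites instead of duplicates.

```typescript
// Field names are taken from this page; the types are assumptions.
interface SearchRecord {
  objectID: string; // the page permalink, unique per page
  title: string;
  intro: string;
  breadcrumbs: string;
  headings: string;
  content: string; // plain text, not HTML
}

// Hypothetical helper: the Article API URL for one page on a local server.
function articleApiUrl(origin: string, pathname: string): string {
  const url = new URL("/api/article", origin);
  url.searchParams.set("pathname", pathname);
  return url.toString();
}

// In-memory stand-in for an index, keyed by objectID. Writing a record
// whose objectID already exists replaces the old entry.
function upsert(index: Map<string, SearchRecord>, record: SearchRecord): void {
  index.set(record.objectID, record);
}
```

For example, articleApiUrl("http://localhost:4002", "/en/get-started") percent-encodes the pathname into the query string, and calling upsert twice with the same objectID leaves a single entry holding the newer record.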
The objectID is set explicitly to the page permalink. This guarantees that subsequent indexing runs overwrite existing records rather than creating duplicates.

Environment configuration
| Variable | Description |
|---|---|
| ELASTICSEARCH_URL | URL of the Elasticsearch cluster. Required for search tests and manual indexing. Example: http://localhost:9200/ |

For local development, set the variable in a .env file, for example ELASTICSEARCH_URL=http://localhost:9200/.
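A script that depends on this variable could check it up front and fail with a clear message. The following is a minimal sketch of that idea; resolveElasticsearchUrl is a hypothetical helper written for this example and is not taken from the docs codebase.

```typescript
// Hypothetical helper (not part of the docs codebase): reads
// ELASTICSEARCH_URL from an environment-like object and validates it.
function resolveElasticsearchUrl(
  env: Record<string, string | undefined>,
): URL {
  const raw = env.ELASTICSEARCH_URL;
  if (!raw) {
    throw new Error(
      "ELASTICSEARCH_URL is not set; search tests and manual indexing need it",
    );
  }
  // The URL constructor throws a TypeError if the value is malformed.
  return new URL(raw);
}
```

In a Node.js script this would typically be called as resolveElasticsearchUrl(process.env) after the .env file has been loaded.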
Running the pipeline manually
General search
Run the scrape and index steps separately, or together using the combined command.

Start the scrape server
The scrape server is a production-mode instance of the docs app running on port 4002 with minimal rendering enabled. Starting the server this way sets MINIMAL_RENDER=true and CHANGELOG_DISABLED=true to reduce memory usage during scraping.

Scrape page content
In a separate terminal, run the scrape script against the running server. The script can also be limited to a specific language and version. It writes one JSON file per page into the target directory.
AI search autocomplete
AI search autocomplete data comes from an internal data repository, not from scraping. Clone github/docs-internal-data to the root of the docs directory, then run the autocomplete indexing script.
Text analysis
To analyze how Elasticsearch processes text (useful for debugging relevance issues), run the text analysis utility, src/search/scripts/analyze-text.ts.

Running search tests
Search tests require a running Elasticsearch instance. The test script sets ELASTICSEARCH_URL=http://localhost:9200/ automatically.
Language tests that involve search also need the variable set.
Production workflow
In production, search indexes are rebuilt automatically by GitHub Actions:

| Workflow | Schedule | Scope |
|---|---|---|
| index-general-search.yml | Every 4 hours | All versions and languages |
| index-autocomplete-search.yml | Daily | AI autocomplete data |

To index a change sooner after it merges to main, trigger index-general-search.yml manually with a specific version and language to reduce run time (a single version/language takes 5–10 minutes versus ~40 minutes for all).
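One way to trigger a workflow manually is the GitHub REST API's workflow dispatch endpoint. The sketch below only builds the request. It assumes that index-general-search.yml accepts manual dispatch and that its inputs are named version and languages; both the input names and the helper buildWorkflowDispatch are assumptions and should be checked against the workflow file before use.

```typescript
interface DispatchRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds a workflow_dispatch request for the
// general search indexing workflow. Input names are assumed.
function buildWorkflowDispatch(
  owner: string,
  repo: string,
  token: string,
  version: string,
  language: string,
): DispatchRequest {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/actions/workflows/index-general-search.yml/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
    // "ref" selects the branch whose workflow definition runs.
    body: JSON.stringify({
      ref: "main",
      inputs: { version, languages: language },
    }),
  };
}
```

The returned object can be passed to fetch as fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). The same dispatch can also be started from the Actions tab in the repository.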
Key files
| Path | Description |
|---|---|
| src/search/components/Search.tsx | Browser-side search input component |
| src/search/components/SearchResults.tsx | Browser-side search results rendering |
| src/search/middleware/general-search-middleware.ts | Server-side entrypoint for /search page |
| src/search/middleware/search-routes/ | API route handlers for search endpoints |
| src/search/scripts/scrape/ | Scrape scripts and lib/build-records-from-api.ts |
| src/search/scripts/index/ | Indexing scripts for general search and autocomplete |
| src/search/scripts/analyze-text.ts | Text analysis utility |
| src/search/tests/ | Search tests (require ELASTICSEARCH_URL) |
Search features
- Typo tolerance: Elasticsearch returns results even for misspelled queries.
- Advanced query syntax: Supports exact matching with quotes ("exact phrase") and term exclusion with a minus sign (-excluded). Enabled in the browser client.
- Multilingual: Indexes exist for each supported language. Search respects the language of the current docs URL.
- Weighted attributes: Title is ranked higher than body content.
- Version-scoped: Each query targets the index for the requested GitHub product version.
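Typo tolerance and weighted attributes map naturally onto standard Elasticsearch query features. The sketch below shows one common way to express them, a multi_match query with fuzziness and a per-field boost. It is not the query the docs site actually sends: the boost value of 3, the choice of fields, and the helper names are assumptions, and the index name is left as a parameter because the naming scheme is not described here.

```typescript
// Illustrative query body: fuzzy matching plus a boosted title field.
// The boost factor and field list are assumptions for this example.
function buildSearchBody(query: string) {
  return {
    query: {
      multi_match: {
        query,
        // "title^3" weights title matches three times as much as content.
        fields: ["title^3", "content"],
        // "AUTO" picks an allowed edit distance based on term length.
        fuzziness: "AUTO",
      },
    },
  };
}

// Hypothetical helper: the search path for one version- and
// language-specific index. The caller supplies the index name.
function searchPath(indexName: string): string {
  return `/${encodeURIComponent(indexName)}/_search`;
}
```

A request would POST the JSON-serialized body to the path returned by searchPath on the cluster given by ELASTICSEARCH_URL.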
There is a lag of up to 4 hours between content changes merging to main and those changes appearing in search results, due to the indexing schedule.