Update the Unreal Engine API and Guide Knowledge Base

The knowledge base ships pre-built, but you can re-scrape it at any time: to pick up updates to the Epic documentation, to target a new UE version, or to add sections that were not included in the initial crawl. The two crawlers are fully independent. The Python API class reference crawler uses a standard HTTP client and has no special requirements, while the conceptual guides crawler requires Playwright because the Epic Developer Community portal is a Cloudflare-protected Angular SPA.

Updating the Python API Class Reference

# Scrape the full UE 5.8 Python API (all ~11,600 class pages)
python skills/dbv-unreal-python-api/scripts/scrape_ue_api.py

# Resume an interrupted run without re-downloading completed pages
python skills/dbv-unreal-python-api/scripts/scrape_ue_api.py --resume

# Check current status before running
python skills/dbv-unreal-python-api/scripts/scrape_ue_api.py --status

# Update the index for a new UE version (e.g. 5.9)
python skills/dbv-unreal-python-api/scripts/update_api.py --version 5.9

The scraper discovers all class and module URLs from the live Epic documentation index, downloads each page, converts it to clean Markdown, and stores the result in knowledge/classes/<ClassName>.md. When finished, it writes a master index at knowledge/index.json containing the class name, file path, category, URL, and up to 30 keywords per class. Progress is saved automatically every 25 pages to .progress.json, so a --resume run skips any page that was already completed.
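Once the index exists, it can be queried without opening the individual Markdown files. The following is a minimal sketch of such a lookup. It assumes index.json is a JSON list of objects with the keys "name", "path", "category", "url", and "keywords"; those key names and the `find_classes` helper are illustrative assumptions, not the scraper's confirmed schema.

```python
import json
from pathlib import Path


def find_classes(index_path, term):
    """Return index entries whose name or keywords contain `term` (case-insensitive)."""
    # Assumed layout: a JSON list of objects with "name", "path", "category",
    # "url" and "keywords" keys. The real index.json may use different names.
    entries = json.loads(Path(index_path).read_text(encoding="utf-8"))
    needle = term.lower()
    matches = []
    for entry in entries:
        haystack = [entry.get("name", "")] + list(entry.get("keywords", []))
        if any(needle in text.lower() for text in haystack):
            matches.append(entry)
    return matches
```

For example, `find_classes("knowledge/index.json", "mesh")` would return every entry whose class name or keyword list mentions "mesh", from which the "path" value points at the matching file under knowledge/classes/.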

Updating Conceptual Guides (requires Playwright)

Why Playwright?

dev.epicgames.com is an Angular single-page application protected by Cloudflare Bot Management. Standard HTTP clients (urllib, curl, requests) receive a 403 Forbidden response because they lack the TLS fingerprint and JavaScript execution profile of a real browser. The crawler works around this by driving a real headless Chromium browser through Playwright.

There is an additional constraint: Cloudflare degrades the trust score of a browser session after its first internal API call, so the crawler must open a new browser context for every page. Reusing a context across multiple pages causes 403 errors from the second request onward.

Page discovery does not rely on hardcoded URL lists. The crawler fetches the complete documentation tree from the table_of_content.json endpoint (thousands of entries in a single call) and derives all page URLs from it automatically.
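The one-context-per-page rule can be sketched with Playwright's synchronous API as follows. This is an illustration of the pattern, not the crawler's actual code: the function names `fetch_rendered_html` and `crawl`, the 60-second timeout, and the `networkidle` wait condition are all assumptions chosen for the example.

```python
def fetch_rendered_html(browser, url, timeout_ms=60_000):
    """Load `url` in a brand-new browser context and return the rendered HTML."""
    context = browser.new_context()  # fresh cookies and storage for every page
    try:
        page = context.new_page()
        page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        return page.content()
    finally:
        context.close()  # never reuse the context for a second URL


def crawl(urls):
    """Fetch each URL with its own context and return a {url: html} dict."""
    # Imported here so the helper above can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for url in urls:
                results[url] = fetch_rendered_html(browser, url)
        finally:
            browser.close()
    return results
```

The browser process itself is launched once and shared; only the context, which carries the session state that Cloudflare scores, is discarded after each page.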

Install Playwright (one time)

pip install playwright
playwright install chromium

Download commands

# Download ALL documentation (~3,600 pages, resumable)
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category full --resume

# Download a specific section using a named shortcut
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category blueprints
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category pcg
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category materials
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category editor

# Download a specific subtree using a literal slug from the documentation tree
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --category "node-reference/ControlRig"

# Check the status of the current guides index
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --status

# Merge all shards into the final index after parallel crawls
python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --merge

Available named category shortcuts:

Shortcut	Root slug in documentation tree
`pcg`	`procedural-content-generation-framework-in-unreal-engine`
`materials`	`unreal-engine-materials`
`blueprints`	`blueprints-visual-scripting-in-unreal-engine`
`editor`	`scripting-and-automating-the-unreal-editor`

Use `all` to crawl the union of all four shortcuts, or `full` to crawl the entire documentation tree (all ~24 top-level sections, excluding the Python API and Blueprint API reference stubs that redirect to separate systems).
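The way a --category value maps to root slugs can be pictured roughly as below. The shortcut table is copied from the list above; the `resolve_roots` function and its return convention (None for the whole tree) are hypothetical and only illustrate the three cases: named shortcut, literal slug, and the two aggregate values.

```python
SHORTCUTS = {
    "pcg": "procedural-content-generation-framework-in-unreal-engine",
    "materials": "unreal-engine-materials",
    "blueprints": "blueprints-visual-scripting-in-unreal-engine",
    "editor": "scripting-and-automating-the-unreal-editor",
}


def resolve_roots(category):
    """Map a --category value to the root slugs to crawl (None means the whole tree)."""
    if category == "full":
        return None
    if category == "all":
        return sorted(SHORTCUTS.values())
    # Anything else is a named shortcut or a literal slug passed through unchanged.
    return [SHORTCUTS.get(category, category)]
```

Under this sketch, `resolve_roots("node-reference/ControlRig")` returns the slug unchanged, which mirrors how a literal slug is accepted on the command line.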

Parallel Crawling

Each --category <x> run writes to its own isolated shard files so that multiple processes can run simultaneously without file conflicts:

File	Purpose
`knowledge/index_guides__<x>.json`	Shard index for category `x`
`knowledge/.progress_guides__<x>.json`	Completed slugs for category `x`
`knowledge/.errors_guides__<x>.json`	Failed slugs for category `x` (retried on `--resume`)

Once all parallel processes have finished, merge the shards into the main index_guides.json:

python skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py --merge

The merge command combines all index_guides__*.json files found in the knowledge/ directory into a single index_guides.json, updating the total guide and chunk counts.
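Conceptually, the merge step amounts to something like the sketch below. The shard layout is an assumption made for illustration: each shard is taken to be a JSON object with a "guides" list whose items carry a "slug" and a "chunks" list, and the output keys "total_guides" and "total_chunks" are likewise hypothetical names. The real files may be structured differently, so use --merge itself for actual merges.

```python
import json
from pathlib import Path


def merge_shards(knowledge_dir):
    """Combine every index_guides__*.json shard into index_guides.json."""
    knowledge = Path(knowledge_dir)
    guides = {}
    # sorted() keeps the merge deterministic when two shards contain the same slug
    for shard in sorted(knowledge.glob("index_guides__*.json")):
        data = json.loads(shard.read_text(encoding="utf-8"))
        for guide in data.get("guides", []):  # assumed shard layout
            guides[guide["slug"]] = guide  # later shards overwrite duplicates
    merged = {
        "total_guides": len(guides),
        "total_chunks": sum(len(g.get("chunks", [])) for g in guides.values()),
        "guides": list(guides.values()),
    }
    out = knowledge / "index_guides.json"
    out.write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Keying the merged guides by slug means a page that appears in two overlapping shards (for example, one crawled under both a shortcut and `full`) is counted once.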

Do not run more than ~5–6 parallel crawl processes at the same time. Cloudflare rate-limits sustained high concurrency (not just startup bursts). Exceeding this threshold causes prolonged 403 responses across all running processes, not just the one that triggered the limit.
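One way to stay under that ceiling is to launch the per-category processes from a small driver that caps concurrency. The sketch below is an assumed wrapper, not part of the shipped scripts: `build_command`, `crawl_in_parallel`, and the cap of 4 are illustrative choices, while the script path and the --category and --resume flags are the ones shown in the commands above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

SCRIPT = "skills/dbv-unreal-python-api/scripts/scrape_ue_guides.py"
MAX_PARALLEL = 4  # assumed cap, chosen to stay under the ~5-6 process ceiling


def build_command(category):
    """Command line for one resumable crawl of a single category."""
    return [sys.executable, SCRIPT, "--category", category, "--resume"]


def crawl_in_parallel(categories, max_parallel=MAX_PARALLEL, runner=subprocess.run):
    """Run one crawler process per category, at most `max_parallel` at a time."""

    def run_one(category):
        return category, runner(build_command(category)).returncode

    # Each worker thread blocks on one child process, so the pool size is the cap.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, categories))
```

Calling `crawl_in_parallel(["pcg", "materials", "blueprints", "editor"])` would return a dict of exit codes per category; after all of them finish, the shards still need to be combined with --merge.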

Failed pages are recorded in .errors_guides__<category>.json and are not marked as completed in the progress file. The next --resume run automatically retries every page listed in the errors file, so no manual intervention is needed after a network failure.
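The bookkeeping behind --resume can be pictured with the following sketch. It assumes, purely for illustration, that the progress and errors files are each a flat JSON list of slug strings; the helper names and the choice to retry failed slugs before untouched ones are also assumptions rather than documented behavior.

```python
import json
from pathlib import Path


def load_slugs(path):
    """Read a JSON list of slugs, or return an empty set if the file is missing."""
    p = Path(path)
    return set(json.loads(p.read_text(encoding="utf-8"))) if p.exists() else set()


def pending_slugs(all_slugs, progress_path, errors_path):
    """Slugs a resumed run still has to fetch: failed ones first, then untouched ones."""
    done = load_slugs(progress_path)
    failed = load_slugs(errors_path)
    retry = [s for s in all_slugs if s in failed and s not in done]
    fresh = [s for s in all_slugs if s not in failed and s not in done]
    return retry + fresh
```

Because a failed slug never enters the completed set, it stays in the pending list on every resume until it succeeds.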
