Session 9 of Going Meta, broadcast on October 4, 2022, tackles the challenge of building a knowledge graph without any manual labeling. Jesus presents an end-to-end pipeline orchestrated with Prefect that scrapes web content, automatically tags entities using Wikidata lookups, and loads structured facts into Neo4j, all without a human annotator in the loop. The session also introduces graph observability: monitoring the growth, coverage, and health of your KG over time as new content flows in.
What You Will Learn
- How to design a Prefect flow that coordinates scraping, entity recognition, and graph loading as discrete tasks
- How Wikidata’s API can serve as a zero-shot entity linker, matching surface forms to canonical URIs
- How to structure scraped content as Neo4j nodes and wire them together with typed relationships
- What graph observability means in practice: tracking KG growth, entity coverage, and schema drift across pipeline runs
- How unsupervised construction compares to supervised NLP pipelines in terms of precision, recall, and maintenance cost
Tags:
Orchestration · Prefect · Wikidata · Broadcast October 4, 2022
This session has no code files in the repository; only the session PDF is available. The code snippets below are illustrative examples based on the patterns described in the session slides. They are provided for reference and may require adaptation for your environment.
Core Concepts
Prefect Orchestration
A Prefect flow wraps the full pipeline as a Python function decorated with @flow. Individual steps (fetch URLs, extract text, tag entities, write to Neo4j) become @task functions that Prefect can schedule, retry, and monitor independently.
Wikidata Entity Linking
Instead of training a named-entity recogniser, the pipeline sends candidate strings to the Wikidata Entity Search API. Each mention that resolves to a Wikidata item receives a canonical URI (e.g., http://www.wikidata.org/entity/Q42), which is then used as the node identifier in Neo4j. This gives every entity a globally unique, dereferenceable identity from day one.
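The lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the session's code: it assumes the wbsearchentities action of the Wikidata API, takes the first hit as the match, and uses hypothetical helper names (build_search_url, pick_top_candidate, link_mention) and a placeholder User-Agent string.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(mention, language="en", limit=1):
    """Build a wbsearchentities request URL for one candidate string."""
    params = {
        "action": "wbsearchentities",
        "search": mention,
        "language": language,
        "format": "json",
        "limit": limit,
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def pick_top_candidate(payload):
    """Return the first search hit as {uri, label}, or None if nothing matched.

    Assumption: taking the top-ranked hit is good enough; a real pipeline
    would probably need disambiguation.
    """
    hits = payload.get("search", [])
    if not hits:
        return None
    top = hits[0]
    return {"uri": top["concepturi"], "label": top.get("label", "")}


def link_mention(mention):
    """Query Wikidata for a mention and return its canonical URI, if any."""
    request = urllib.request.Request(
        build_search_url(mention),
        headers={"User-Agent": "kg-pipeline-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return pick_top_candidate(json.load(response))
```

Mentions for which link_mention returns None stay unlinked, which is what a coverage metric would later count against.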
Graph Observability
After each pipeline run, a set of monitoring queries measures:

| Metric | What it tracks |
|---|---|
| Node count delta | How many new entities were added |
| Relationship density | Average degree growth per entity type |
| Coverage ratio | Fraction of article mentions successfully linked |
| Schema drift | New labels or relationship types introduced |
Graph observability is the knowledge-graph equivalent of data pipeline monitoring. Tracking coverage and density over time helps you detect when the source content changes shape before it silently degrades query quality.
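As a rough illustration of how these metrics could be derived, the sketch below compares two snapshots taken after consecutive pipeline runs. The Snapshot fields and the compare_runs function are hypothetical; in practice the counts would come from Cypher queries against the graph. For brevity it computes average degree over the whole graph rather than per entity type.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Counts captured after one pipeline run (hypothetical structure)."""
    node_count: int
    rel_count: int
    labels: set
    rel_types: set
    mentions_seen: int
    mentions_linked: int


def avg_degree(snap):
    """Average degree, counting each relationship once per endpoint."""
    return 2 * snap.rel_count / snap.node_count if snap.node_count else 0.0


def compare_runs(prev, curr):
    """Compute the observability metrics between two consecutive runs."""
    coverage = (
        curr.mentions_linked / curr.mentions_seen if curr.mentions_seen else 0.0
    )
    return {
        "node_count_delta": curr.node_count - prev.node_count,
        "avg_degree_delta": avg_degree(curr) - avg_degree(prev),
        "coverage_ratio": coverage,
        # Schema drift: anything present now that was absent in the last run.
        "new_labels": sorted(curr.labels - prev.labels),
        "new_rel_types": sorted(curr.rel_types - prev.rel_types),
    }


previous = Snapshot(100, 150, {"Article", "Entity"}, {"MENTIONS"}, 200, 150)
current = Snapshot(120, 240, {"Article", "Entity", "Topic"}, {"MENTIONS", "ABOUT"}, 250, 200)
report = compare_runs(previous, current)
```

Storing one such report per run gives a time series that can be charted or checked against thresholds, for example alerting when the coverage ratio drops sharply.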
Pipeline Design Walkthrough
Define the Prefect flow skeleton
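A skeleton along these lines might look as follows. It is a sketch under assumptions, not the code shown in the session: the task names, retry settings, and flow name are hypothetical, the scraper, linker, and loader are placeholders, and the mention extractor is a naive regular expression. The try/except at the top only exists so the example still runs where Prefect is not installed.

```python
import re

try:
    from prefect import flow, task
except ImportError:
    # Fallback for reading or running the sketch without Prefect installed:
    # both decorators return the function unchanged (no retries, no logging).
    def _passthrough(fn=None, **_options):
        return fn if fn is not None else (lambda f: f)

    flow = task = _passthrough


@task(retries=3, retry_delay_seconds=10)
def fetch_text(url: str) -> str:
    """Placeholder scraper; a real task would download and clean the page."""
    return "The author Douglas Adams lived in London."


@task
def extract_mentions(text: str) -> list[str]:
    """Naive candidate extraction: runs of two or more capitalised words."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", text)


@task(retries=2, retry_delay_seconds=5)
def link_entities(mentions: list[str]) -> list[dict]:
    """Placeholder linker; a real task would query Wikidata per mention."""
    return [{"mention": m, "uri": None} for m in mentions]


@task
def load_graph(url: str, entities: list[dict]) -> int:
    """Placeholder loader; a real task would write to Neo4j."""
    return len(entities)


@flow(name="kg-construction")
def build_kg(urls: list[str]) -> int:
    """Run scrape, extract, link, and load for each URL in turn."""
    written = 0
    for url in urls:
        text = fetch_text(url)
        mentions = extract_mentions(text)
        entities = link_entities(mentions)
        written += load_graph(url, entities)
    return written
```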
Each task is independently retried and logged by Prefect, making the pipeline robust to transient scraping or API failures.
Merge entities with canonical URIs
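One way to sketch this step is shown below. The schema is an assumption made for illustration (Article and Entity labels joined by a MENTIONS relationship), as are the function name write_mention and the RecordingTx stand-in, which lets the example run without a database.

```python
# Hypothetical schema: (:Article {url})-[:MENTIONS]->(:Entity {uri, label}).
MERGE_MENTION = """
MERGE (e:Entity {uri: $uri})
  ON CREATE SET e.label = $label
MERGE (a:Article {url: $article_url})
MERGE (a)-[:MENTIONS]->(e)
"""


def write_mention(tx, article_url, uri, label):
    """Idempotently record that an article mentions a Wikidata entity.

    tx is any object with a run(query, **params) method, such as a
    transaction from the official neo4j Python driver.
    """
    tx.run(MERGE_MENTION, article_url=article_url, uri=uri, label=label)


class RecordingTx:
    """Stand-in transaction that records calls instead of hitting a database."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))


tx = RecordingTx()
write_mention(
    tx,
    "https://example.org/a1",
    "http://www.wikidata.org/entity/Q42",
    "Douglas Adams",
)
```

With recent versions of the official neo4j Python driver, the same function could be passed to session.execute_write(write_mention, article_url, uri, label). A uniqueness constraint on the uri property is worth considering too, since MERGE on its own does not prevent duplicates under concurrent writes.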
Using MERGE on the Wikidata URI ensures that the same real-world entity appearing in multiple articles is represented by a single node in Neo4j.
Resources
Watch the Recording
Full live-stream on YouTube: Session 9, October 4, 2022
Session PDF on GitHub
Session slides in PDF format — no code files are present in the repository for this session