Session 14 of Going Meta (broadcast March 7, 2023) tackles one of the most common real-world knowledge-graph challenges: reconciling disease taxonomies that have been developed independently by different organisations. Using Wikidata, the Medical Subject Headings (MeSH), and the Disease Ontology (DO) as examples, the session shows how to load all three SKOS-style hierarchies into Neo4j with Neosemantics, align their cross-references, and use Cypher pattern matching to discover structural discrepancies and infer missing equivalence links.
Watch Recording
Full session recording on YouTube
Source Code
Cypher scripts for the full reconciliation workflow
Overview
| Broadcast | March 7, 2023 |
|---|---|
| Tags | RDF, SPARQL, Cypher |
| Taxonomies used | Wikidata · MeSH · Disease Ontology |
| Key procedures | n10s.rdf.import.fetch, n10s.rdf.stream.fetch, n10s.rdf.import.inline |
What You Will Learn
- Setting up Neosemantics for RDF import with URI mapping
- Constructing SPARQL queries to pull disease hierarchies from Wikidata and MeSH
- Loading an OWL ontology file selectively using n10s.rdf.stream.fetch
- Converting cross-reference properties into explicit SAME_AS relationships
- Detecting structural discrepancies: different granularities, generalisations, and missing links
- Generating Wikidata enrichment triples from incomplete “triangles” found in the graph
Setup
Step-by-Step Walkthrough
Import the Wikidata disease taxonomy
A SPARQL CONSTRUCT query assembles the hierarchy of infectious diseases (wd:Q18123741) along with cross-references to MeSH and the Disease Ontology. The result is fetched as N-Triples and loaded with n10s.rdf.import.fetch.
Remove shortcut relationships from Wikidata
The Wikidata hierarchy sometimes contains “shortcuts”: direct HAS_PARENT links from a disease to an ancestor that is also reachable through intermediate nodes. These create noise in path-based analysis and should be removed.
Import the MeSH taxonomy
Pull the infectious disease branch (mesh:D007239) from the MeSH SPARQL endpoint using a CONSTRUCT query that maps predicates to the same HAS_PARENT and label vocabulary used for Wikidata.
Load the Disease Ontology with selective streaming
The Disease Ontology is available as an OWL file in RDF/XML serialisation. Because it is large and contains non-disease content, n10s.rdf.stream.fetch is used first to collect only the owl:Class subjects; the triples are then filtered to the relevant predicates before importing.
The selective streaming approach collects only owl:Class members in a first pass, then imports only the predicates needed for the reconciliation exercise. This substantially reduces import time and keeps the graph free of irrelevant OWL axioms.
Convert cross-reference properties to SAME_AS relationships
The Disease Ontology stores MeSH cross-references as properties (for example, hasDbXref: "MESH:D007239"). Converting these to explicit SAME_AS relationships makes the graph consistent with the Wikidata taxonomy.
Discover reconciliation patterns
With three aligned taxonomies sharing HAS_PARENT and SAME_AS, Cypher can surface structural differences at scale:
- Pattern 1 (different granularities): the same concept is mapped at different depths.
- Pattern 2 (generalisations): multiple concepts in one taxonomy are mapped to one concept in another.
- Pattern 3 (perfect triangles): a concept is aligned across all three taxonomies.
Exploring a Single Disease Lineage
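As a hedged, plain-Python illustration (not the session's Cypher), a taxonomy can be modelled as a set of (child, parent) HAS_PARENT pairs. The sketch below drops the shortcut edges described in the walkthrough, lists every path from one disease up to a root, and uses the resulting depths to flag Pattern 1 mismatches across a SAME_AS pair. All function names and the toy identifiers are hypothetical, and the sketch assumes each hierarchy is acyclic.

```python
from collections import defaultdict

Edge = tuple[str, str]  # (child, parent): one HAS_PARENT relationship


def parent_index(edges: set[Edge]) -> dict[str, set[str]]:
    """Map each child to the set of its direct parents."""
    parents: dict[str, set[str]] = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return parents


def remove_shortcuts(edges: set[Edge]) -> set[Edge]:
    """Drop (child, ancestor) edges when the ancestor is also reachable
    through another parent of the child. Assumes the hierarchy is acyclic."""
    parents = parent_index(edges)

    def reachable(start: str, target: str) -> bool:
        # Iterative depth-first search upwards along HAS_PARENT.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return {(child, parent) for child, parent in edges
            if not any(reachable(p, parent) for p in parents[child] if p != parent)}


def lineages(node: str, edges: set[Edge]) -> list[list[str]]:
    """Every path from `node` up to a root (a node with no parents)."""
    parents = parent_index(edges)

    def walk(n: str) -> list[list[str]]:
        ups = sorted(parents.get(n, ()))
        if not ups:
            return [[n]]
        return [[n] + path for p in ups for path in walk(p)]

    return walk(node)


def depth(node: str, edges: set[Edge]) -> int:
    """Number of HAS_PARENT hops on the longest path from `node` to a root."""
    return max(len(path) for path in lineages(node, edges)) - 1


def granularity_mismatches(same_as: set[Edge], edges_a: set[Edge],
                           edges_b: set[Edge]) -> list[Edge]:
    """Pattern 1: SAME_AS pairs whose two concepts sit at different depths."""
    return [(a, b) for a, b in sorted(same_as)
            if depth(a, edges_a) != depth(b, edges_b)]


if __name__ == "__main__":
    # Toy hierarchies only; ("wd:child", "wd:root") is the shortcut edge.
    wd = {("wd:child", "wd:middle"), ("wd:middle", "wd:root"),
          ("wd:child", "wd:root")}
    mesh = {("mesh:child", "mesh:root")}
    wd_clean = remove_shortcuts(wd)
    print(lineages("wd:child", wd_clean))  # [['wd:child', 'wd:middle', 'wd:root']]
    print(granularity_mismatches({("wd:child", "mesh:child")}, wd_clean, mesh))
```

Measuring depth along the longest root path is a choice made for this sketch. A shortcut edge would otherwise add a second, shorter path and make the depth ambiguous, which is the kind of noise the shortcut-removal step is meant to avoid.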
Key Concepts
Vocabulary mapping: Neosemantics' n10s.mapping.add lets you rename RDF predicates and types at import time, so that rdfs:subClassOf in the Disease Ontology's OWL file becomes HAS_PARENT. This gives all three taxonomies a unified vocabulary without renaming relationships after import.
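n10s.mapping.add registers renamings like this inside Neo4j. The plain-Python sketch below shows the same idea outside the database and models the Disease Ontology path described in the walkthrough. A first pass over streamed (subject, predicate, object) triples collects owl:Class subjects. A second pass keeps only whitelisted predicates on those subjects and renames them to the unified vocabulary. The mapping table and function are hypothetical stand-ins, not n10s internals. The oboInOwl URI for hasDbXref is an assumption about how DO serialises cross-references.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"  # assumed xref namespace

Triple = tuple[str, str, str]

# Hypothetical mapping table (predicate URI -> graph name), playing the role of
# mappings registered with n10s.mapping.add. Unlisted predicates are dropped.
MAPPING = {
    RDFS + "subClassOf": "HAS_PARENT",
    RDFS + "label": "label",
    OBO_IN_OWL + "hasDbXref": "hasDbXref",
}


def select_and_map(triples: list[Triple]) -> list[Triple]:
    """Keep mapped predicates of owl:Class subjects, renamed via MAPPING."""
    # Pass 1: collect every subject declared as an owl:Class.
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    # Pass 2: keep only whitelisted predicates on those subjects and rename them.
    return [(s, MAPPING[p], o) for s, p, o in triples
            if s in classes and p in MAPPING]


if __name__ == "__main__":
    do = "http://purl.obolibrary.org/obo/"  # DO class namespace; toy IDs below
    streamed = [
        (do + "DOID_TOY1", RDF_TYPE, OWL_CLASS),
        (do + "DOID_TOY1", RDFS + "subClassOf", do + "DOID_TOY0"),
        (do + "DOID_TOY1", RDFS + "label", "toy disease"),
        (do + "DOID_TOY1", OBO_IN_OWL + "hasDbXref", "MESH:D007239"),
        (OBO_IN_OWL + "hasDbXref", RDFS + "label", "not a class"),  # dropped
    ]
    for triple in select_and_map(streamed):
        print(triple)
```

The input is a list, not a generator, because it is read twice. With a genuinely streamed source, the first pass would have to finish before the second could start, as in the two-step import described above.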
SAME_AS as the reconciliation edge: storing cross-references as SAME_AS relationships (rather than string properties) turns reconciliation into a graph traversal problem, making Cypher pattern matching a natural fit for finding triangles, generalisations, and missing links.
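The plain-Python sketch below (not the session's Cypher) shows why SAME_AS edges make the patterns easy to express. Once cross-reference strings become (source, target) pairs, Pattern 3 is a join over three edge sets and Pattern 2 is a group-by on the target. The function names, the toy identifiers, and the MeSH namespace used to build URIs are assumptions for illustration.

```python
from collections import defaultdict

MESH_NS = "http://id.nlm.nih.gov/mesh/"  # assumed MeSH RDF namespace
Pair = tuple[str, str]  # (source concept, target concept) SAME_AS edge


def xrefs_to_same_as(xrefs: dict[str, list[str]]) -> set[Pair]:
    """Turn DO cross-reference strings such as "MESH:D007239" into
    (do_concept, mesh_uri) SAME_AS pairs; other prefixes are ignored."""
    pairs: set[Pair] = set()
    for do_concept, refs in xrefs.items():
        for ref in refs:
            prefix, _, local_id = ref.partition(":")
            if prefix == "MESH" and local_id:
                pairs.add((do_concept, MESH_NS + local_id))
    return pairs


def perfect_triangles(wd_do: set[Pair], wd_mesh: set[Pair],
                      do_mesh: set[Pair]) -> list[tuple[str, str, str]]:
    """Pattern 3: (wd, do, mesh) groups where all three SAME_AS legs exist."""
    mesh_of_wd: dict[str, set[str]] = defaultdict(set)
    for wd, mesh in wd_mesh:
        mesh_of_wd[wd].add(mesh)
    return sorted((wd, do, mesh) for wd, do in wd_do
                  for mesh in mesh_of_wd.get(wd, ()) if (do, mesh) in do_mesh)


def generalisations(same_as: set[Pair]) -> dict[str, set[str]]:
    """Pattern 2: targets onto which more than one source concept is mapped."""
    sources: dict[str, set[str]] = defaultdict(set)
    for source, target in same_as:
        sources[target].add(source)
    return {target: srcs for target, srcs in sources.items() if len(srcs) > 1}


if __name__ == "__main__":
    # Toy identifiers only.
    do_mesh = xrefs_to_same_as({"do:a": ["MESH:D007239", "OTHER:1"],
                                "do:b": ["MESH:D007239"]})
    print(generalisations(do_mesh))
    print(perfect_triangles({("wd:x", "do:a")},
                            {("wd:x", MESH_NS + "D007239")}, do_mesh))
```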
Incomplete triangles as data quality signals: any group of WD_Disease, DO_Disease and Mesh_Disease nodes in which one SAME_AS leg of the triangle is absent is a candidate for enrichment. One taxonomy knows something the others do not, and that knowledge can be exported as new RDF triples.
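The plain-Python sketch below handles only one of the three possible missing legs. It covers a Wikidata concept linked to a DO concept that carries a MeSH cross-reference while the Wikidata concept has no direct MeSH link. It assumes the enrichment is expressed with Wikidata's direct-claim property P486 (MeSH descriptor ID) and serialised as N-Triples. Function names and toy identifiers are hypothetical, and this is not the session's own export query.

```python
from collections import defaultdict

WDT = "http://www.wikidata.org/prop/direct/"  # Wikidata direct-claim namespace
MESH_NS = "http://id.nlm.nih.gov/mesh/"       # assumed MeSH RDF namespace
Pair = tuple[str, str]


def missing_wd_mesh_links(wd_do: set[Pair], do_mesh: set[Pair],
                          wd_mesh: set[Pair]) -> list[Pair]:
    """(wd, mesh) pairs implied by a WD -> DO -> MeSH path but absent as a
    direct WD -> MeSH link: the open leg of an incomplete triangle."""
    mesh_of_do: dict[str, set[str]] = defaultdict(set)
    for do, mesh in do_mesh:
        mesh_of_do[do].add(mesh)
    implied = {(wd, mesh) for wd, do in wd_do for mesh in mesh_of_do.get(do, ())}
    return sorted(implied - wd_mesh)


def to_ntriples(pairs: list[Pair]) -> str:
    """One N-Triples line per pair, using P486 (MeSH descriptor ID)."""
    lines = []
    for wd_uri, mesh_uri in pairs:
        mesh_id = mesh_uri.removeprefix(MESH_NS)  # e.g. "D007239"
        lines.append(f'<{wd_uri}> <{WDT}P486> "{mesh_id}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    wd = "http://www.wikidata.org/entity/"
    # Toy data: the Wikidata item reaches MeSH only through DO.
    wd_do = {(wd + "Q_TOY", "do:a")}
    do_mesh = {("do:a", MESH_NS + "D007239")}
    print(to_ntriples(missing_wd_mesh_links(wd_do, do_mesh, wd_mesh=set())))
```

The other two open legs (a missing WD to DO link, or a missing DO to MeSH link) follow the same shape: compose the two existing legs, subtract the direct links already present, and serialise what remains.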
Both SPARQL endpoints (Wikidata and MeSH) are publicly accessible without authentication. However, rate limits apply: if you experience timeouts, add LIMIT clauses to the CONSTRUCT queries and import in batches.
Resources
Neosemantics (n10s)
RDF and Linked Data integration for Neo4j
Wikidata SPARQL Endpoint
Interactive SPARQL query interface for Wikidata
MeSH SPARQL Endpoint
NLM Medical Subject Headings linked data service
Disease Ontology
OWL disease ontology maintained by the DO Consortium