Session 24 of Going Meta, broadcast on January 4, 2024, introduces an ontology-driven approach to RAG: rather than letting the LLM extract entities and relationships freely, an OWL ontology explicitly defines what the graph should contain. Jesus Barrasa builds a legislation knowledge graph constrained by a custom ontology modelled in Protégé, then shows how LangChain's Neo4jVector and CypherQAChain can query it, combining vector search, structured Cypher generation, and ontology-backed schema introspection.
What You’ll Learn
- How to design an OWL ontology for RAG (using the http://www.nsmntx.org/2024/01/ragbase ontology)
- How to load an OWL ontology into Neo4j using neosemantics before populating data
- How to use LangChain's Neo4jVector for vector-based retrieval over the knowledge graph
- How to use LangChain's CypherQAChain to generate and run Cypher from natural language
- How to dynamically expose the graph schema to the LLM for accurate Cypher generation
Architecture Overview
Ontology as Schema Contract
The OWL ontology defines the allowed entity types and relationships before any data is loaded, so the LLM extracts only what the ontology permits and the schema cannot drift.
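In the session the contract is enforced by loading the ontology into Neo4j; the plain-Python sketch below only illustrates the idea and is not the session's code. It assumes the ontology has been reduced to a set of allowed (domain class, relationship, range class) patterns, and the names Legislation, Section, Organisation, HAS_SECTION and ENFORCED_BY are hypothetical, not taken from the session's ontology.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """One candidate fact proposed by an LLM extractor."""
    source_type: str
    relation: str
    target_type: str


# Hypothetical ontology excerpt: each allowed relationship with its domain and range.
ALLOWED_PATTERNS: FrozenSet[Tuple[str, str, str]] = frozenset({
    ("Legislation", "HAS_SECTION", "Section"),
    ("Legislation", "ENFORCED_BY", "Organisation"),
})


def conforms(triple: Triple) -> bool:
    """Keep a triple only if the ontology declares exactly this pattern."""
    return (triple.source_type, triple.relation, triple.target_type) in ALLOWED_PATTERNS


def split_by_ontology(triples: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Partition extracted triples into (accepted, rejected)."""
    accepted: List[Triple] = []
    rejected: List[Triple] = []
    for triple in triples:
        (accepted if conforms(triple) else rejected).append(triple)
    return accepted, rejected


if __name__ == "__main__":
    extracted = [
        Triple("Legislation", "HAS_SECTION", "Section"),
        Triple("Legislation", "MENTIONS", "Person"),  # not declared in the ontology
    ]
    kept, dropped = split_by_ontology(extracted)
    print("kept:", kept)
    print("dropped:", dropped)
```

The rejected list is where schema drift would otherwise creep in: without the contract, MENTIONS and Person would silently become new graph elements.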
LangChain RAG Chains
Neo4jVector handles semantic retrieval; CypherQAChain (the GraphCypherQAChain class in LangChain) handles structured question answering by generating Cypher and executing it against the graph.
Step-by-Step Walkthrough
Populate the graph from the Python notebook
The session is driven by the Python notebook Ontology_Driven_RAG_patterns.ipynb. Start by running the graph population section to load the legislation dataset into Neo4j.
Run test vector searches
Before adding the ontology, verify that the vector index is working correctly by running test similarity searches from the notebook.
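A test similarity search ranks stored chunk embeddings by how close they are to the embedded question. Neo4j's vector index (wrapped by Neo4jVector) does this for you; the sketch below reproduces only the ranking step in plain Python, assuming cosine similarity and using tiny hand-made vectors with hypothetical chunk IDs in place of real embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], chunks: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones come from an embedding model.
    chunks = {
        "chunk-a": [1.0, 0.0, 0.0],
        "chunk-b": [0.7, 0.7, 0.0],
        "chunk-c": [0.0, 0.0, 1.0],
    }
    for chunk_id, score in top_k([1.0, 0.1, 0.0], chunks):
        print(chunk_id, round(score, 3))
```

If the chunks returned by the notebook's test searches look unrelated to the query, the usual suspects are a mismatch between the embedding model used at load time and at query time, or an index built with a different similarity function.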
Create and load your ontology
Design a domain ontology in Protégé that extends the RAG patterns base ontology (http://www.nsmntx.org/2024/01/rag). For this session, a legislation ontology is used; you can use the one provided or create your own.
Load the ontology into Neo4j using neosemantics before running the dynamic Cypher generation section of the notebook. The notebook marks this point with a comment (LOAD THE ONTOLOGY...).
Run RAG chains with LangChain
With the ontology loaded, run the LangChain RAG chain sections of the notebook. The chains use Neo4jVector for vector retrieval and CypherQAChain to generate and execute Cypher queries.
Introspect the graph schema dynamically
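The CypherQAChain workflow depends on passing the graph schema to the LLM so it can generate accurate Cypher. Neo4j's self-describing capabilities make this straightforward: the built-in db.schema.nodeTypeProperties() procedure and APOC's apoc.meta.relTypeProperties() return the node and relationship types in the graph together with their properties. The output of these queries is included in the LLM prompt at query time, so the generated Cypher always reflects the current state of the graph.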
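The sketch below shows one way to turn the rows returned by those two procedures into a compact schema description and splice it into a Cypher-generation prompt. It assumes the rows carry the columns these procedures return in recent Neo4j and APOC versions (nodeLabels, propertyName and propertyTypes; relType, sourceNodeLabels and targetNodeLabels), which is worth checking against your versions. The sample labels and the prompt wording are hypothetical and differ from LangChain's built-in prompt.

```python
from typing import Dict, Iterable, List

# The two introspection calls; run them with your Neo4j driver and pass the rows in.
NODE_PROPS_QUERY = "CALL db.schema.nodeTypeProperties()"
REL_PROPS_QUERY = "CALL apoc.meta.relTypeProperties()"


def format_schema(node_rows: Iterable[dict], rel_rows: Iterable[dict]) -> str:
    """Render introspection rows as node and relationship patterns for an LLM prompt."""
    node_props: Dict[str, List[str]] = {}
    for row in node_rows:
        label = ":".join(row["nodeLabels"])
        props = node_props.setdefault(label, [])
        # Rows without a propertyName only register the label itself.
        if row.get("propertyName"):
            props.append(f"{row['propertyName']}: {'|'.join(row['propertyTypes'])}")

    patterns: List[str] = []
    for row in rel_rows:
        rel_type = row["relType"].strip(":`")  # ":`HAS_SECTION`" -> "HAS_SECTION"
        for source in row["sourceNodeLabels"]:
            for target in row["targetNodeLabels"]:
                pattern = f"(:{source})-[:{rel_type}]->(:{target})"
                if pattern not in patterns:  # one row per property, so dedupe
                    patterns.append(pattern)

    lines = ["Node labels and properties:"]
    for label, props in sorted(node_props.items()):
        lines.append(f"(:{label} {{{', '.join(props)}}})" if props else f"(:{label})")
    lines.append("Relationship patterns:")
    lines.extend(patterns)
    return "\n".join(lines)


def build_cypher_prompt(schema: str, question: str) -> str:
    """Hypothetical prompt wording; the schema text is rebuilt for every question."""
    return (
        "Write a Cypher query that answers the question.\n"
        "Use only the labels, relationship types and properties in this schema:\n"
        f"{schema}\n\nQuestion: {question}\nCypher:"
    )


if __name__ == "__main__":
    node_rows = [
        {"nodeLabels": ["Legislation"], "propertyName": "title", "propertyTypes": ["String"]},
        {"nodeLabels": ["Section"], "propertyName": None, "propertyTypes": None},
    ]
    rel_rows = [
        {"relType": ":`HAS_SECTION`", "sourceNodeLabels": ["Legislation"],
         "targetNodeLabels": ["Section"], "propertyName": None, "propertyTypes": None},
    ]
    print(build_cypher_prompt(format_schema(node_rows, rel_rows), "How many sections?"))
```

Because the schema string is rebuilt from live introspection on each call, adding a label or relationship type to the graph changes what the LLM is told without any edit to the prompt code.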
Example: Ontology-Constrained Data Modelling
The insurance dataset example from the session shows how n10s.experimental.export.dimodel.fetch generates a data integration config from a subset of the ontology.
Example: Querying the Populated Graph
Once data is loaded according to the ontology-constrained mapping, you can query it with standard Cypher, for example to count the policies sold per agent.
Key Concepts
Ontology-First Graph Design
Defining the ontology before loading data ensures a clean, consistent schema that both humans and LLMs can reason about predictably.
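Ontology-first loading with neosemantics can be sketched as three Cypher statements issued from Python before any instance data exists. The uniqueness constraint on Resource.uri and the procedures n10s.graphconfig.init and n10s.onto.import.fetch are standard neosemantics features rather than code copied from this session's notebook; the constraint uses Neo4j 5 syntax, and the ontology URL is a hypothetical placeholder. The session argument is anything with a run(query, parameters) method, such as a session from the official neo4j driver.

```python
from typing import Any, Dict, List, Tuple

Statement = Tuple[str, Dict[str, Any]]


def ontology_load_statements(ontology_url: str, rdf_format: str = "Turtle") -> List[Statement]:
    """Cypher to run, in order, on a fresh database before any data is loaded."""
    return [
        # neosemantics expects unique URIs on Resource nodes (Neo4j 5 syntax).
        ("CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
         "FOR (r:Resource) REQUIRE r.uri IS UNIQUE", {}),
        # Initialise the neosemantics graph config with default settings.
        ("CALL n10s.graphconfig.init()", {}),
        # Import the OWL classes, properties, domains and ranges into the graph.
        ("CALL n10s.onto.import.fetch($url, $fmt)",
         {"url": ontology_url, "fmt": rdf_format}),
    ]


def load_ontology(session: Any, ontology_url: str, rdf_format: str = "Turtle") -> None:
    """Run each statement separately, since a constraint cannot share a transaction with writes."""
    for query, params in ontology_load_statements(ontology_url, rdf_format):
        session.run(query, params)


if __name__ == "__main__":
    # Print the statements for a hypothetical ontology URL instead of running them.
    for query, params in ontology_load_statements("https://example.org/legislation.ttl"):
        print(query, params)
```

With the official driver, this would be called with a session opened from GraphDatabase.driver(uri, auth=(user, password)). If the ontology was saved from Protégé as RDF/XML rather than Turtle, pass rdf_format="RDF/XML".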
Dynamic Schema Introspection
Using db.schema.nodeTypeProperties() (built into Neo4j) and apoc.meta.relTypeProperties() (from the APOC library) at query time means the LLM always sees the current schema, so there is no hand-written schema description to maintain.
Neo4jVector for Retrieval
LangChain's Neo4jVector abstracts the vector index query: it embeds the user's question and finds the closest document chunks in one call.
CypherQAChain for Structured QA
CypherQAChain generates Cypher from natural language, executes it, and uses the results as grounded context for the final LLM answer, combining the precision of a database query with the flexibility of an LLM.
Resources
Watch the Recording
Full session recording on YouTube — January 4, 2024.
Session Code
Python notebook, ontology file, and Cypher scripts on GitHub.