Session 24 of Going Meta, broadcast on January 4, 2024, introduces an ontology-driven approach to RAG: rather than letting the LLM extract entities and relationships freely, an OWL ontology explicitly defines what the graph should contain. Jesus Barrasa builds a legislation knowledge graph constrained by a custom ontology modelled in Protégé. He then shows how LangChain's Neo4jVector and GraphCypherQAChain (CypherQAChain for short) can query it, combining vector search, structured Cypher generation, and ontology-backed schema introspection.

What You’ll Learn

  • How to design an OWL ontology for RAG (using the http://www.nsmntx.org/2024/01/rag base ontology)
  • How to load an OWL ontology into Neo4j using neosemantics
  • How to use Neo4jVector (LangChain) for vector-based retrieval over the knowledge graph
  • How to use CypherQAChain (LangChain) to generate and run Cypher from natural language
  • How to dynamically expose the graph schema to the LLM for accurate Cypher generation

Architecture Overview

Ontology as Schema Contract

The OWL ontology defines the allowed entity types and relationships before any data is loaded. As a result, the LLM extracts only what the ontology permits, and the schema does not drift.
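The following minimal Python sketch makes the "schema contract" idea concrete. It checks candidate triples, such as ones an LLM proposes during extraction, against the relationship domains and ranges an ontology declares. The ontology is hard-coded as a plain dict for illustration. Policy, Agent, PolicyHolder and soldByAgent reuse the insurance query example further down this page; `hasPolicyHolder` and the helper names are hypothetical. In the session, the ontology lives in Neo4j via neosemantics, not in Python.

```python
from dataclasses import dataclass

# Hypothetical ontology fragment: each declared relationship maps to its
# (domain, range) classes. Policy, Agent, PolicyHolder and soldByAgent follow the
# insurance example; hasPolicyHolder is invented for illustration.
ONTOLOGY_RELATIONSHIPS = {
    "soldByAgent": ("Policy", "Agent"),
    "hasPolicyHolder": ("Policy", "PolicyHolder"),
}


@dataclass(frozen=True)
class Triple:
    subject_type: str
    relationship: str
    object_type: str


def conforms(triple: Triple) -> bool:
    """True if the triple uses a declared relationship with a matching domain and range."""
    signature = ONTOLOGY_RELATIONSHIPS.get(triple.relationship)
    if signature is None:
        return False  # relationship is not declared in the ontology
    domain, range_ = signature
    return triple.subject_type == domain and triple.object_type == range_


def filter_extraction(triples: list[Triple]) -> tuple[list[Triple], list[Triple]]:
    """Split candidate triples into (accepted, rejected) according to the ontology."""
    accepted = [t for t in triples if conforms(t)]
    rejected = [t for t in triples if not conforms(t)]
    return accepted, rejected


if __name__ == "__main__":
    candidates = [
        Triple("Policy", "soldByAgent", "Agent"),       # declared, correct direction
        Triple("Agent", "soldByAgent", "Policy"),       # declared, wrong direction
        Triple("Policy", "mentionedIn", "Newsletter"),  # not in the ontology
    ]
    accepted, rejected = filter_extraction(candidates)
    print(len(accepted), len(rejected))  # prints "1 2"
```

Because every declared relationship carries a domain and range, this single check also rules out entity types the ontology does not mention.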

LangChain RAG Chains

Neo4jVector handles semantic retrieval; CypherQAChain handles structured question-answering by generating Cypher and executing it against the graph.

Step-by-Step Walkthrough

1. Populate the graph from the Python notebook

The session is driven by the Python notebook Ontology_Driven_RAG_patterns.ipynb. Start by running the graph population section to load the legislation dataset into Neo4j.

2. Run test vector searches

Before adding the ontology, verify that the vector index is working correctly by running test similarity searches from the notebook.

3. Create and load your ontology

Design a domain ontology in Protégé that extends the RAG patterns base ontology (http://www.nsmntx.org/2024/01/rag). This session uses a legislation ontology; you can use the one provided or create your own. Load the ontology into Neo4j using neosemantics:
```cypher
CALL n10s.onto.import.fetch(
  "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session24/gm24-onto-legislation.ttl",
  "Turtle"
)
```
Load the ontology before running the dynamic Cypher generation section of the notebook. The notebook marks this point with a comment (LOAD THE ONTOLOGY...).

4. Run RAG chains with LangChain

With the ontology loaded, run the LangChain RAG chain sections of the notebook. The chains use Neo4jVector for vector retrieval and CypherQAChain to generate and execute Cypher queries.

5. Introspect the graph schema dynamically

The CypherQAChain workflow depends on passing the graph schema to the LLM so it can generate accurate Cypher. Neo4j’s self-describing capabilities make this straightforward:
```cypher
// All node types and their properties
CALL db.schema.nodeTypeProperties();

// All relationship types and their properties
CALL apoc.meta.relTypeProperties();
```
The output of these queries is included in the LLM prompt at query time, so the LLM generates Cypher against the current state of the graph. The second query requires the APOC plugin.

Example: Ontology-Constrained Data Modelling

The insurance dataset example from the session shows how n10s.experimental.export.dimodel.fetch generates a data integration config from a subset of the ontology:
```cypher
CALL n10s.experimental.export.dimodel.fetch(
  "https://raw.githubusercontent.com/datadotworld/cwd-benchmark-data/main/ACME_Insurance/ontology/insurance.ttl",
  "Turtle",
  {
    classList: [
      "http://data.world/schema/insurance/Policy",
      "http://data.world/schema/insurance/PolicyHolder",
      "http://data.world/schema/insurance/Agent"
    ]
  }
);
```
Or, to inspect the model inline:
```cypher
CALL n10s.experimental.stream.dimodel.fetch(
  "https://raw.githubusercontent.com/datadotworld/cwd-benchmark-data/main/ACME_Insurance/ontology/insurance.ttl",
  "Turtle",
  {
    classList: [
      "http://data.world/schema/insurance/Policy",
      "http://data.world/schema/insurance/PolicyHolder",
      "http://data.world/schema/insurance/Agent"
    ]
  }
)
```

Example: Querying the Populated Graph

Once data is loaded according to the ontology-constrained mapping, you can query it with standard Cypher. For example, to count policies sold per agent:
```cypher
MATCH (p:Policy)-[:soldByAgent]->(a:Agent)
RETURN a.agentId AS AgentID, COUNT(p) AS PoliciesSold
```
Because the schema is derived from an OWL ontology, the relationship name soldByAgent and the node labels Policy and Agent are predictable and consistent. This makes LLM-generated Cypher more reliable than when the schema is inferred ad hoc.
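Predictable names also make it possible to check generated Cypher before running it. The sketch below is a deliberately simple, regex-based guard. It is an assumption of this page, not something the session shows. It lists the labels and relationship types a query mentions and flags any that the ontology does not define. The allowed sets are hypothetical.

```python
import re

# Hypothetical allowed vocabulary, e.g. derived from the ontology loaded with neosemantics.
ALLOWED_LABELS = {"Policy", "Agent", "PolicyHolder"}
ALLOWED_REL_TYPES = {"soldByAgent"}

# Simplified patterns: "(var:Label" for nodes and "[var:TYPE" for relationships.
NODE_LABEL_RE = re.compile(r"\(\s*\w*\s*:\s*(\w+)")
REL_TYPE_RE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")


def unknown_schema_terms(cypher: str) -> dict[str, set[str]]:
    """Return labels and relationship types used in `cypher` that the ontology does not define."""
    labels = set(NODE_LABEL_RE.findall(cypher))
    rel_types = set(REL_TYPE_RE.findall(cypher))
    return {
        "labels": labels - ALLOWED_LABELS,
        "relationship_types": rel_types - ALLOWED_REL_TYPES,
    }


if __name__ == "__main__":
    good = "MATCH (p:Policy)-[:soldByAgent]->(a:Agent) RETURN a.agentId, count(p)"
    bad = "MATCH (p:Policy)-[:SOLD_BY]->(b:Broker) RETURN b"
    print(unknown_schema_terms(good))  # both sets empty
    print(unknown_schema_terms(bad))   # flags Broker and SOLD_BY
```

A production guard would use a real Cypher parser. These regexes only capture the first label in a pattern, ignore `|` alternations, and do not handle backtick-quoted names.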

Key Concepts

Ontology-First Graph Design

Defining the ontology before loading data ensures a clean, consistent schema that both humans and LLMs can reason about predictably.

Dynamic Schema Introspection

Running db.schema.nodeTypeProperties() and apoc.meta.relTypeProperties() at query time means the LLM always sees the current schema, with no manual schema maintenance required.

Neo4jVector for Retrieval

LangChain’s Neo4jVector abstracts the vector index query, embedding the user’s question and finding the closest document chunks in one call.
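Conceptually, the retrieval step ranks stored chunks by the similarity between their embeddings and the embedded question. The sketch below does this with a brute-force cosine similarity over toy vectors in plain Python. It is not Neo4jVector's API. Neo4j's vector index also performs approximate nearest-neighbour search rather than this exhaustive scan. The three-dimensional vectors are hypothetical stand-ins for the output of a real embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the question embedding, best first."""
    scored = [(chunk_id, cosine(question_vec, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical pre-computed chunk embeddings (real models produce hundreds of dimensions).
    chunks = {
        "chunk-a": [0.9, 0.1, 0.0],
        "chunk-b": [0.1, 0.9, 0.0],
        "chunk-c": [0.7, 0.3, 0.1],
    }
    question = [1.0, 0.0, 0.0]  # stands in for the embedded user question
    for chunk_id, score in top_k(question, chunks):
        print(chunk_id, round(score, 3))
```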

CypherQAChain for Structured QA

CypherQAChain generates Cypher from a natural-language question and executes it against the graph. It then passes the results to the LLM as grounded context for the final answer, combining the precision of a database query with the flexibility of an LLM.
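The generate, execute, answer loop can be sketched without any LangChain dependency. In the Python below, the LLM and the database are passed in as plain callables. The `answer_question` function, both prompt templates and the agent data are hypothetical; LangChain's chain uses its own prompts and wiring. The stubs in the usage section return canned values so that the flow runs offline.

```python
from typing import Callable

# Hypothetical prompt templates; the real chain ships its own.
CYPHER_PROMPT = (
    "Schema:\n{schema}\n\n"
    "Write a Cypher query that answers the question. Use only the labels, "
    "relationship types and properties in the schema.\nQuestion: {question}"
)
ANSWER_PROMPT = "Question: {question}\nQuery results: {rows}\nAnswer using only these results."


def answer_question(
    question: str,
    schema: str,
    llm: Callable[[str], str],
    run_cypher: Callable[[str], list[dict]],
) -> str:
    # 1. Generate Cypher from the question, grounded in the current schema.
    cypher = llm(CYPHER_PROMPT.format(schema=schema, question=question)).strip()
    # 2. Execute it against the graph.
    rows = run_cypher(cypher)
    # 3. Let the LLM phrase the final answer from the retrieved rows only.
    return llm(ANSWER_PROMPT.format(question=question, rows=rows))


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Canned responses standing in for a real model.
        if prompt.startswith("Schema:"):
            return "MATCH (p:Policy)-[:soldByAgent]->(a:Agent) RETURN a.agentId AS AgentID, count(p) AS PoliciesSold"
        return "Agent A1 sold 3 policies."

    def fake_db(cypher: str) -> list[dict]:
        # Canned result standing in for a Neo4j query.
        return [{"AgentID": "A1", "PoliciesSold": 3}]

    print(answer_question("How many policies did each agent sell?", "<schema text>", fake_llm, fake_db))
    # prints "Agent A1 sold 3 policies."
```

Passing the schema in as a plain string keeps the loop agnostic to how that schema was produced. It could come from live introspection queries or from the ontology stored in the graph.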

Resources

Watch the Recording

Full session recording on YouTube — January 4, 2024.

Session Code

Python notebook, ontology file, and Cypher scripts on GitHub.
