Session 22 of Going Meta, broadcast on November 7, 2023, introduces Retrieval Augmented Generation (RAG) with knowledge graphs. Jesus Barrasa picks up the dev.to article graph built in Session 21 and extends it into a full RAG pipeline: a Neo4j vector index retrieves the most relevant chunks, graph traversal enriches the context with structured relationships, and an LLM generates a final answer grounded in both sources.

What You’ll Learn

  • How to rebuild the Session 21 knowledge graph in a single Cypher script
  • How to use db.index.vector.queryNodes for embedding-based retrieval
  • How to enrich retrieved results with graph context before passing them to an LLM
  • How graph-based semantic search produces human-readable context for RAG prompts
  • The architectural difference between vector-only RAG and graph-enhanced RAG

Graph Setup (Full Rebuild Script)

The session opens by rebuilding the entire knowledge graph from Session 21 in one consolidated script, so participants can start from scratch:
1. Load articles and ontology

// Load articles from CSV file
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session21/resources/data/devto-articles.csv' AS row
CREATE (a:Article { uri: row.uri })
SET a.title = row.title, a.body = row.body, a.datetime = datetime(row.date);

// Load the ontology
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleVocabUris: "IGNORE" });

CALL n10s.skos.import.fetch(
  "https://github.com/jbarrasa/goingmeta/raw/main/session21/resources/ontos/dbpedia-sw.ttl",
  "Turtle"
);

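// Remove redundant "shortcut" links: delete a direct SCO relationship
// when the same parent is also reachable through two or more SCO hops
MATCH (s:Class)-[shortcut:SCO]->(p:Class)<-[:SCO*2..]-(s)
DELETE shortcut;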
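
The last statement of this step prunes "shortcut" SCO links, so that each concept keeps only its most specific parents. As a rough illustration of that idea outside the database, here is a minimal Python sketch over an in-memory set of (child, parent) pairs. The function name and the "Software" concept are hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
from collections import defaultdict


def remove_shortcuts(edges):
    """Drop a direct (child, parent) edge when the parent is also
    reachable from the child through a path of two or more edges.
    Assumes the hierarchy is acyclic."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def reachable_in_two_or_more(start, target):
        # First hop: every parent of `start`. From there, look for
        # `target` one or more hops further up the hierarchy.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            for nxt in parents[node]:
                if nxt == target:
                    return True
                stack.append(nxt)
        return False

    return {(c, p) for c, p in edges if not reachable_in_two_or_more(c, p)}


# Hypothetical mini-taxonomy: the direct JMeter -> Software link is a shortcut.
edges = {
    ("Apache JMeter", "Load Testing Tool"),
    ("Load Testing Tool", "Software"),
    ("Apache JMeter", "Software"),
}
pruned = remove_shortcuts(edges)
print(sorted(pruned))
```

Pruning these links keeps path lengths in the taxonomy meaningful, which matters later when similarity is derived from paths between concepts.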
2. Link articles to ontology concepts

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session21/resources/data/extracted-entities.csv" AS row
MATCH (a:Article { uri: row.articleuri })
MATCH (c:Class { uri: row.concepturi })
MERGE (a)-[:refers_to]->(c);
3. Create vector index and load embeddings

// Create a vector index on Article.embedding (1536 dimensions, cosine similarity)
CALL db.index.vector.createNodeIndex('article-embeddings', 'Article', 'embedding', 1536, 'cosine');

// Load the precomputed embeddings and attach each one to its article by URI
CALL apoc.load.json("https://github.com/jbarrasa/goingmeta/raw/main/session21/resources/data/article-embeddings.json") YIELD value
MATCH (a:Article { uri: value.id })
CALL db.create.setNodeVectorProperty(a, 'embedding', value.vector);
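
Because the index is declared with 1536 dimensions, every embedding loaded into it needs exactly that length. The following is a minimal sketch of a pre-load sanity check in Python. It assumes records shaped like the `id` and `vector` fields that the Cypher above reads; the function name and the sample records are made up for illustration.

```python
EXPECTED_DIM = 1536  # must match the dimension given when the index was created


def split_by_dimension(records, expected_dim=EXPECTED_DIM):
    """Separate records whose vector has the expected length from the rest.

    Returns (valid_records, invalid_ids)."""
    valid, invalid_ids = [], []
    for rec in records:
        vector = rec.get("vector")
        if isinstance(vector, list) and len(vector) == expected_dim:
            valid.append(rec)
        else:
            invalid_ids.append(rec.get("id"))
    return valid, invalid_ids


# Made-up sample data: one well-formed record and one truncated vector.
sample = [
    {"id": "https://dev.to/example/article-1", "vector": [0.0] * 1536},
    {"id": "https://dev.to/example/article-2", "vector": [0.0] * 10},
]
valid, invalid_ids = split_by_dimension(sample)
print(len(valid), invalid_ids)
```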

RAG Retrieval Patterns

Vector Retrieval

The simplest retrieval step uses the vector index to find the most semantically similar articles to a given query embedding:
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })
CALL db.index.vector.queryNodes('article-embeddings', 7, a.embedding)
YIELD node AS similarArticle, score
WHERE similarArticle <> a
RETURN a.title AS original, similarArticle.title AS similar, score
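
Conceptually, this query ranks articles by how close their embeddings are to the source article's embedding and then drops the source itself. The brute-force Python sketch below illustrates that ranking with toy three-dimensional vectors and hypothetical article ids. It is not how Neo4j implements the lookup: the vector index performs an approximate nearest-neighbour search, and the score it returns may be scaled differently from the raw cosine value computed here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(source_id, embeddings, k):
    """Return the k ids most similar to source_id, excluding source_id itself."""
    source = embeddings[source_id]
    scored = [
        (other_id, cosine(source, vec))
        for other_id, vec in embeddings.items()
        if other_id != source_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings with hypothetical ids; real ones have 1536 dimensions.
embeddings = {
    "jmeter-article": [1.0, 0.0, 0.0],
    "gatling-article": [0.9, 0.1, 0.0],
    "css-article": [0.0, 1.0, 0.0],
}
top = most_similar("jmeter-article", embeddings, k=2)
print(top)
```

Note that the Cypher query asks the index for 7 neighbours and filters out the source article afterwards, so it returns at most 6 rows when the source is among the hits.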

Graph-Enriched Retrieval for RAG

The session’s key contribution is showing how graph traversal produces not just a list of similar articles, but a textual explanation of the similarity path — ready to inject directly into an LLM prompt as grounded context:
// Concepts that the original article refers to
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })-[rt1:refers_to]->(c1)
// Related concepts found by path similarity (0.2 is the similarity threshold), plus c1 itself
CALL n10s.sim.pathsim.search(c1, 0.2, { simulateRoot: false }) YIELD node AS relatedTopic
WITH a, c1, collect(relatedTopic) + [c1] AS topics
UNWIND topics AS c2
// Articles that refer to any of those concepts
MATCH (similarArticle:Article)-[rt2:refers_to]->(c2)
// Per pair of articles: collect the similarity values and a textual rendering of each concept path
WITH a.title AS original, similarArticle.title AS similar,
     [x IN collect(n10s.sim.pathsim.value(c1, c2, { simulateRoot: false })) WHERE x > 0] AS sims,
     collect(
       ["the original article mentions explicitly " + nodes(n10s.sim.pathsim.path(c1, c2, { simulateRoot: false }))[0].prefLabel] +
       [" the recommended article mentions explicitly " + nodes(n10s.sim.pathsim.path(c1, c2, { simulateRoot: false }))[-1].prefLabel] +
       [r IN relationships(n10s.sim.pathsim.path(c1, c2, { simulateRoot: false })) |
         startNode(r).prefLabel + " is a type of " + endNode(r).prefLabel]
     ) AS paths_as_text
WHERE sims <> []
// Rank by average similarity and flatten the path descriptions into one newline-separated string
RETURN original, similar, apoc.coll.avg(sims) AS sim, sims,
       reduce(result = "", x IN paths_as_text | result + reduce(inner = "", y IN x | inner + "\n" + y))
ORDER BY sim DESC LIMIT 4
The text built from paths_as_text (for example, "the original article mentions explicitly Apache JMeter, the recommended article mentions explicitly Gatling, Apache JMeter is a type of Load Testing Tool") is a factual, traceable context string. Appending it to an LLM prompt grounds the answer in the graph and reduces the risk of hallucination.
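
To make the hand-off to the LLM concrete, here is a minimal Python sketch that renders one concept path as sentences in the same style as the query and then wraps the result in a prompt. The function names, the prompt wording, and the sample question are all assumptions for illustration; the session's notebook may structure its prompt differently.

```python
def explain_path(first_label, last_label, sco_edges):
    """Render a concept path as plain sentences.

    first_label: concept mentioned by the original article
    last_label:  concept mentioned by the recommended article
    sco_edges:   (child, parent) pairs along the path between them
    """
    lines = [
        "the original article mentions explicitly " + first_label,
        "the recommended article mentions explicitly " + last_label,
    ]
    lines += [f"{child} is a type of {parent}" for child, parent in sco_edges]
    return "\n".join(lines)


def build_prompt(question, explanation):
    """Assemble a prompt that asks the model to stay within the given context.
    The wording here is a hypothetical template."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{explanation}\n\n"
        f"Question: {question}\n"
    )


explanation = explain_path(
    "Apache JMeter",
    "Gatling",
    [("Apache JMeter", "Load Testing Tool"), ("Gatling", "Load Testing Tool")],
)
prompt = build_prompt("Why is the second article recommended?", explanation)
print(prompt)
```

Keeping the explanation as separate short sentences makes it easy to trace each statement in the prompt back to a node or relationship in the graph.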

Architecture Comparison

Vector-Only RAG

Fast and scalable. Retrieves documents by embedding similarity. Works without a knowledge graph but cannot explain why results are relevant.

Graph-Enhanced RAG

Retrieves documents via ontology path traversal. Slower but explainable — the graph can generate a natural-language justification for each recommendation.

Hybrid RAG

Use the vector index for broad candidate retrieval, then re-rank or enrich with graph context. Combines speed with explainability.

Ontology as Knowledge Layer

The SKOS/OWL ontology acts as the structured background knowledge that makes graph-based retrieval work — a key differentiator vs. pure embedding approaches.
The full RAG pipeline (retrieval + LLM call + answer generation) is implemented in the Python notebook accompanying this session, available in the GitHub repository.
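
The hybrid pattern in the comparison above can be sketched in a few lines of Python: take the candidates returned by the vector index, look up a graph-based similarity for each, and re-rank on a blend of the two. The weighted sum, the `alpha` parameter, and the sample scores are assumptions chosen for illustration; the session does not prescribe a specific combination formula.

```python
def hybrid_rerank(vector_hits, graph_sims, alpha=0.5):
    """Re-rank vector search candidates using graph similarity.

    vector_hits: list of (article_id, vector_score) from the vector index
    graph_sims:  dict of article_id -> graph-based similarity
    alpha:       weight of the vector score (hypothetical tuning knob)

    Candidates with no graph similarity are treated as 0.0.
    """
    combined = [
        (article_id, alpha * score + (1 - alpha) * graph_sims.get(article_id, 0.0))
        for article_id, score in vector_hits
    ]
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Made-up scores: "b" ranks lower on vectors but has graph support.
vector_hits = [("a", 0.9), ("b", 0.8)]
graph_sims = {"b": 0.5}
ranked = hybrid_rerank(vector_hits, graph_sims)
print(ranked)
```

With `alpha=1.0` the ranking reduces to vector-only retrieval, which gives a simple baseline to compare against.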

Resources

Watch the Recording

Full session recording on YouTube — November 7, 2023.

Session Code

All CQL scripts and Python notebooks on GitHub.
