Session 21 of Going Meta, broadcast on October 6, 2023, puts two semantic search strategies head-to-head in Neo4j: pure vector similarity search using embedding indexes, and graph-based semantic similarity powered by ontology path traversal. Jesus Barrasa builds a dataset of developer articles from dev.to, annotates them with concepts from a SKOS ontology, generates embeddings, and then shows how the two approaches produce different — and complementary — results.

What You’ll Learn

  • How to load article data and link it to a SKOS ontology via named entity extraction
  • How to create and query a vector index in Neo4j (the db.index.vector.* procedures)
  • How to compute semantic similarity using ontology path traversal with n10s.sim.pathsim
  • How extending the ontology changes graph-based search results but not vector results
  • How to combine both approaches into a hybrid RAG-ready output

Dataset Setup

1. Load articles from CSV

Import the dev.to articles dataset into Neo4j, creating one node per article with title, body, and datetime:
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session21/resources/data/devto-articles.csv' AS row
CREATE (a:Article { uri: row.uri })
SET a.title = row.title, a.body = row.body, a.datetime = datetime(row.date);

2. Load the SKOS ontology and clean up redundant shortcuts

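Import a DBpedia-derived software taxonomy in SKOS format, then remove transitive shortcuts: direct SCO links to a concept that is also reachable through a longer SCO chain. Left in place, these shortcuts shorten paths artificially and distort path-based similarity scores: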
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleVocabUris: "IGNORE" });

CALL n10s.skos.import.fetch(
  "https://github.com/jbarrasa/goingmeta/raw/main/session21/resources/ontos/dbpedia-sw.ttl",
  "Turtle"
);

MATCH (s:Class)-[shortcut:SCO]->(p:Class)<-[:SCO*2..]-(s)
DELETE shortcut;
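
To make the shortcut cleanup concrete, here is a minimal Python sketch of the same idea outside the database. It assumes the hierarchy is acyclic and is given as (child, parent) pairs. The concept names and the `remove_shortcuts` helper are made up for illustration and are not part of neosemantics.

```python
def ancestors(node, parents):
    """All nodes reachable from `node` by following one or more parent edges."""
    found, stack = set(), list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found


def remove_shortcuts(edges):
    """Keep only the edges that are not implied by a longer chain (assumes a DAG)."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    kept = []
    for child, parent in edges:
        # The edge is a shortcut if another direct parent of `child`
        # already leads up to `parent` in one or more steps.
        redundant = any(
            parent in ancestors(mid, parents)
            for mid in parents[child]
            if mid != parent
        )
        if not redundant:
            kept.append((child, parent))
    return kept


# Toy hierarchy (hypothetical labels): the last edge skips a level.
edges = [
    ("Neo4j", "Graph database"),
    ("Graph database", "Database"),
    ("Neo4j", "Database"),
]
cleaned = remove_shortcuts(edges)
print(cleaned)
```

In this toy input only the direct Neo4j to Database link is dropped, because the two-step chain through Graph database already expresses it.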

3. Link articles to ontology concepts

Use pre-computed named entity extraction results to connect each article to the ontology concepts it references:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session21/resources/data/extracted-entities.csv" AS row
MATCH (a:Article { uri: row.articleuri })
MATCH (c:Class { uri: row.concepturi })
MERGE (a)-[:refers_to]->(c)

4. Create the vector index and populate embeddings

Create a 1536-dimension cosine vector index (matching OpenAI’s text-embedding-ada-002 output) and populate it from pre-computed embedding data:
CALL db.index.vector.createNodeIndex('article-embeddings', 'Article', 'embedding', 1536, 'cosine');

CALL apoc.load.json("https://github.com/jbarrasa/goingmeta/raw/main/session21/resources/data/article-embeddings.json") YIELD value
MATCH (a:Article { uri: value.id })
SET a.embedding = value.vector;

Comparing the Two Search Approaches

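Query the vector index for the five nearest neighbours (by cosine similarity of the embeddings) of a given article. The article itself is normally among them, so once the WHERE clause filters it out, up to four similar articles remain: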
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })
CALL db.index.vector.queryNodes('article-embeddings', 5, a.embedding)
YIELD node AS similarArticle, score
WHERE similarArticle <> a
RETURN a.title AS original, similarArticle.title AS similar, score
Vector search ranks articles by how close their embedding vectors are — it captures distributional similarity in the embedding space, independent of any explicit ontology structure.
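
As a rough illustration of what the index lookup computes, the following Python sketch does a brute-force cosine ranking over a handful of vectors. The three-dimensional vectors, the article keys, and the `nearest` helper are invented for the example. Real embeddings here have 1536 dimensions, and the score the index reports may be scaled differently from the raw cosine value.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def nearest(query_id, embeddings, k):
    """Return the k items most similar to `query_id`, excluding the item itself."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vector), other)
        for other, vector in embeddings.items()
        if other != query_id
    ]
    scored.sort(reverse=True)
    return [(other, score) for score, other in scored[:k]]


# Hypothetical toy embeddings, not taken from the session's data.
embeddings = {
    "jmeter-neo4j": [0.9, 0.1, 0.0],
    "couchbase-geosearch": [0.8, 0.3, 0.1],
    "css-grid-tips": [0.0, 0.2, 0.9],
}
results = nearest("jmeter-neo4j", embeddings, k=2)
for article, score in results:
    print(article, round(score, 3))
```

Excluding the query item inside the ranking, rather than after taking the top k, avoids losing one result slot to the article itself.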

Graph-Based Semantic Search with Path Similarity

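Use n10s.sim.pathsim.value to compute the path similarity between pairs of concepts: a score based on the length of the shortest path connecting them through the ontology hierarchy. This is the path-length measure, not Wu-Palmer similarity, which additionally weighs the depth of the common ancestor. The companion function n10s.sim.pathsim.path returns the connecting path itself: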
MATCH (a:Article { title: "Performance Testing Neo4j Database using Bolt Protocol in Apache JMeter" })-[rt1:refers_to]->(c1)
MATCH (b:Article { title: "Couchbase GeoSearch with ASP.NET Core" })-[rt2:refers_to]->(c2)
RETURN n10s.sim.pathsim.value(c1, c2, { simulateRoot: false }) AS sim,
       [n IN nodes(n10s.sim.pathsim.path(c1, c2, { simulateRoot: false })) | n.prefLabel] AS pathLabels
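
The idea behind a path-based score can be sketched in plain Python. This is an illustration under stated assumptions, not the neosemantics implementation: it treats the hierarchy as undirected, finds the shortest path with breadth-first search, and uses the common WordNet-style formula 1 / (1 + number of edges). The exact formula and traversal rules in n10s may differ, and the concept labels are invented.

```python
from collections import deque


def shortest_path(start, goal, broader):
    """Breadth-first search over (child, parent) pairs, ignoring direction."""
    neighbours = {}
    for child, parent in broader:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the two concepts are not connected


def path_similarity(a, b, broader):
    """1 / (1 + edges on the shortest path); None when no path exists."""
    path = shortest_path(a, b, broader)
    # A path with n nodes has n - 1 edges, so 1 / (1 + edges) == 1 / n.
    return None if path is None else 1 / len(path)


# Hypothetical toy hierarchy as (narrower, broader) pairs.
broader = [
    ("Neo4j", "Graph database"),
    ("Graph database", "NoSQL"),
    ("Couchbase", "Document database"),
    ("Document database", "NoSQL"),
]
path = shortest_path("Neo4j", "Couchbase", broader)
sim = path_similarity("Neo4j", "Couchbase", broader)
print(path, sim)
```

Returning the path alongside the score mirrors the pairing of a value function and a path function in the query above, which makes a score easy to explain.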
To rank every other article against a given one, compute the path similarity for each pair of concepts the two articles refer to, then average those scores per article:
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })-[rt1:refers_to]->(c1)
MATCH (similarArticle:Article)-[rt2:refers_to]->(c2)
WHERE similarArticle <> a
RETURN a.title AS original, similarArticle.title AS similar,
       avg(n10s.sim.pathsim.value(c1, c2)) AS sim,
       collect(n10s.sim.pathsim.value(c1, c2)) AS pairScores
ORDER BY sim DESC LIMIT 4
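
The aggregation step can be sketched separately from the similarity measure. The Python below assumes concept-to-concept scores are already available through a lookup function; the article keys, concept labels, and scores are invented, and missing scores are skipped on the assumption that they behave like nulls under Cypher's avg().

```python
from itertools import product
from statistics import mean


def rank_by_average_similarity(source, article_concepts, concept_sim, limit):
    """Rank other articles by the mean similarity over all concept pairs."""
    ranking = []
    for other, concepts in article_concepts.items():
        if other == source:
            continue
        scores = [
            concept_sim(c1, c2)
            for c1, c2 in product(article_concepts[source], concepts)
        ]
        scores = [s for s in scores if s is not None]  # skip unconnected pairs
        if scores:
            ranking.append((other, mean(scores), scores))
    ranking.sort(key=lambda row: row[1], reverse=True)
    return ranking[:limit]


# Hypothetical pairwise scores between concepts.
pair_scores = {
    frozenset(("Neo4j", "Couchbase")): 0.25,
    frozenset(("JMeter", "Couchbase")): 0.125,
    frozenset(("Neo4j", "CSS")): 0.125,
    frozenset(("JMeter", "Neo4j")): 0.25,
}


def concept_sim(a, b):
    """Identical concepts score 1.0; unknown pairs return None."""
    return 1.0 if a == b else pair_scores.get(frozenset((a, b)))


article_concepts = {
    "A": ["Neo4j", "JMeter"],
    "B": ["Couchbase"],
    "C": ["Neo4j", "CSS"],
}
ranking = rank_by_average_similarity("A", article_concepts, concept_sim, limit=4)
for other, avg_score, scores in ranking:
    print(other, round(avg_score, 3), scores)
```

Averaging over every pair rewards shared concepts, but it also dilutes the score of articles that refer to many unrelated concepts, which is worth keeping in mind when reading the ranking.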

How Ontology Changes Affect Results

One of the session’s key demonstrations is loading a second ontology (swstacks.ttl) that enriches the concept hierarchy:
CALL n10s.onto.import.fetch(
  "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session21/resources/ontos/swstacks.ttl",
  "Turtle"
);
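Re-running the graph-based search then returns different results, because the richer hierarchy creates new paths between the concepts that articles refer to. The vector search results are unchanged, since the ontology import touches neither the stored embeddings nor the vector index. This is a critical difference between the two paradigms.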
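
The effect of extending the ontology can be shown with a small Python sketch. It assumes, purely for illustration, two concepts that are disconnected in the base hierarchy and become connected once extra edges are added; the labels and edges are invented and do not reproduce the contents of swstacks.ttl.

```python
from collections import deque


def hops(a, b, edges):
    """Fewest edges between a and b, ignoring direction; None if disconnected."""
    adjacent = {}
    for x, y in edges:
        adjacent.setdefault(x, set()).add(y)
        adjacent.setdefault(y, set()).add(x)

    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in adjacent.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical base hierarchy: the two concepts sit in separate branches.
base = [
    ("JMeter", "Software testing tools"),
    ("Neo4j", "Graph databases"),
]
# Hypothetical extension that places both branches under a shared concept.
extension = [
    ("Software testing tools", "Software stack"),
    ("Graph databases", "Software stack"),
]

before = hops("JMeter", "Neo4j", base)
after = hops("JMeter", "Neo4j", base + extension)
print(before, after)
```

Any score derived from path length changes as soon as the new edges exist, while a ranking computed from fixed embedding vectors has no input that the extension could alter.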

Resources

Watch the Recording

Full session recording on YouTube — October 6, 2023.

Session Code

All CQL scripts and data files on GitHub.
