Vector Search vs Graph-Based Semantic Search in Neo4j
Compare pure vector similarity search via Neo4j vector indexes against ontology-based graph traversal with n10s.sim.pathsim, then combine both approaches.
Session 21 of Going Meta, broadcast on October 6, 2023, puts two semantic search strategies head-to-head in Neo4j: pure vector similarity search using embedding indexes, and graph-based semantic similarity powered by ontology path traversal. Jesus Barrasa builds a dataset of developer articles from dev.to, annotates them with concepts from a SKOS ontology, generates embeddings, and then shows how the two approaches produce different — and complementary — results.
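Query the vector index for the five nearest neighbours of a given article's embedding, ranked by cosine similarity. The `WHERE` clause filters out the article itself, so the query returns at most four other articles when the article is among its own nearest neighbours: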
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })
CALL db.index.vector.queryNodes('article-embeddings', 5, a.embedding)
YIELD node AS similarArticle, score
WHERE similarArticle <> a
RETURN a.title AS original, similarArticle.title AS similar, score
Vector search ranks articles by how close their embedding vectors are — it captures distributional similarity in the embedding space, independent of any explicit ontology structure.
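To make the ranking idea concrete, here is a minimal pure-Python sketch of a brute-force top-k search by cosine similarity. The article titles and three-dimensional vectors are hypothetical toy values, not data from the session, and `top_k_similar` is an illustrative helper rather than anything Neo4j exposes. A real vector index uses approximate nearest-neighbour search over much higher-dimensional embeddings and may scale its reported score differently from raw cosine.

```python
import math


def cosine_similarity(u, v):
    """Raw cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, query_title, k):
    """Rank every other article by cosine similarity to the query article."""
    query = embeddings[query_title]
    scored = [
        (title, cosine_similarity(query, vec))
        for title, vec in embeddings.items()
        if title != query_title  # same role as WHERE similarArticle <> a
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real ones have hundreds of dimensions or more.
embeddings = {
    "Performance testing Neo4j": [0.9, 0.1, 0.0],
    "Load testing with JMeter": [0.8, 0.2, 0.1],
    "CSS grid tricks": [0.0, 0.1, 0.9],
}

ranking = top_k_similar(embeddings, "Performance testing Neo4j", 2)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

Nothing in this computation looks at concepts or relationships: only the vectors matter, which is why the ranking is independent of the ontology.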
For a broader ranking of every other article against a given one, traverse the ontology paths directly and average the path similarity over all pairs of concepts the two articles refer to:
MATCH (a:Article { uri: "https://dev.to/qainsights/performance-testing-neo4j-database-using-bolt-protocol-in-apache-jmeter-1oa9" })-[rt1:refers_to]->(c1)
MATCH (similarArticle:Article)-[rt2:refers_to]->(c2)
WHERE similarArticle <> a
RETURN a.title AS original, similarArticle.title AS similar,
       avg(n10s.sim.pathsim.value(c1, c2)) AS sim,
       collect(n10s.sim.pathsim.value(c1, c2))
ORDER BY sim DESC LIMIT 4
Re-running the graph-based search returns different results because the richer concept hierarchy creates new paths between articles. The vector search results are unchanged, since the embeddings do not depend on the ontology. This is a critical difference between the two paradigms.
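The averaged path similarity, and its sensitivity to the concept hierarchy, can be sketched in plain Python. This is a toy model under stated assumptions: it uses a WordNet-style path similarity of 1 / (1 + shortest path length), which may not match the exact formula behind `n10s.sim.pathsim.value`, and it returns `None` when no path exists. The concept names, articles, and helper functions are all hypothetical.

```python
from collections import deque
from itertools import product


def build_undirected(edges):
    """Turn (narrower, broader) concept pairs into an undirected adjacency map."""
    graph = {}
    for narrower, broader in edges:
        graph.setdefault(narrower, set()).add(broader)
        graph.setdefault(broader, set()).add(narrower)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns None when the concepts are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, c1, c2):
    """Assumed formula: 1 / (1 + shortest path length); None if no path."""
    dist = shortest_path_length(graph, c1, c2)
    return None if dist is None else 1.0 / (1.0 + dist)


def article_similarity(graph, concepts_a, concepts_b):
    """Average over all concept pairs, skipping missing values as Cypher's avg() skips nulls."""
    scores = [path_similarity(graph, c1, c2) for c1, c2 in product(concepts_a, concepts_b)]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical ontology before enrichment: three disconnected fragments.
base_edges = [
    ("Neo4j", "Graph Database"),
    ("JMeter", "Testing Tool"),
    ("Load Testing", "Performance Testing"),
]
# Hypothetical enrichment: a shared broader concept links two fragments.
extra_edges = [
    ("Testing Tool", "Software Testing"),
    ("Performance Testing", "Software Testing"),
]
before = build_undirected(base_edges)
after = build_undirected(base_edges + extra_edges)

article_a = ["Neo4j", "JMeter"]           # the query article
article_b = ["Graph Database", "Neo4j"]   # shares database concepts
article_c = ["Load Testing"]              # only reachable after enrichment

for label, graph in (("before", before), ("after", after)):
    print(label, article_similarity(graph, article_a, article_b),
          article_similarity(graph, article_a, article_c))
```

In this toy setup the score between the first two articles stays the same, while the third article only becomes comparable once the enriched hierarchy connects its concept to the query article's. No embedding is an input to any of these functions, so a vector ranking computed separately would be unaffected by the change.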