Session 16 of Going Meta (broadcast May 2, 2023) explores how the structural properties of a taxonomy graph can be used to quantify how semantically similar two concepts are. Rather than relying on external NLP toolkits, the session demonstrates Neosemantics’ built-in similarity functions (n10s.sim.pathsim, n10s.sim.lchsim, and n10s.sim.wupsim), which implement the classic path-based, Leacock-Chodorow, and Wu-Palmer metrics respectively and can be called directly from Cypher. The session first works through a small automobile taxonomy, then scales up to the Wikidata software concepts taxonomy used in Session 2.

Watch Recording

Full session recording on YouTube

Source Code

Cypher scripts and taxonomy files

Overview

Broadcast: May 2, 2023
Tags: Python, NLTK, Semantics, Taxonomy
Similarity functions: n10s.sim.pathsim, n10s.sim.lchsim, n10s.sim.wupsim
Taxonomies used: VW automobile taxonomy · Wikidata software concepts

What You Will Learn

  • The theoretical basis for path-based, Wu-Palmer, and Leacock-Chodorow similarity metrics
  • How taxonomy depth and graph structure affect each metric differently
  • Computing all three metrics between two concept nodes in a single Cypher call
  • Visualising the shared-ancestor path between two concepts with n10s.sim.pathsim.path
  • How extending a taxonomy (adding deeper nodes) changes Leacock-Chodorow scores
  • Applying the same metrics to a large real-world SKOS taxonomy from Wikidata

The Three Metrics

| Metric | Key idea | Sensitivity to depth |
| --- | --- | --- |
| Path similarity | Inverse of the shortest path length between two nodes (commonly 1 / (1 + hops), so identical nodes score 1.0) | Low: depends only on distance |
| Wu-Palmer (WUP) | Twice the depth of the lowest common ancestor, divided by the sum of the two concepts' depths | Medium: considers ancestor position |
| Leacock-Chodorow (LCH) | Negative log of the path length, normalised by the maximum depth of the taxonomy | High: adding deeper nodes changes all scores |
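To make the three definitions concrete, here is a minimal pure-Python sketch of the textbook formulas on a toy hierarchy. The taxonomy, the helper names, and the depth convention (root has depth 1, as in NLTK's WordNet interface) are assumptions for illustration; they are not the contents of vw.ttl, and the n10s implementation may use slightly different conventions.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT the contents of vw.ttl.
PARENT = {
    "Car": "Vehicle",
    "SUV": "Car",
    "Hatchback": "Car",
    "Tiguan": "SUV",
    "Golf": "Hatchback",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1 (assumed convention)."""
    return len(ancestors(node))

def lca(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def path_len(a, b):
    """Number of edges on the path a -> LCA -> b."""
    c = lca(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))

def max_depth():
    """Depth of the deepest node anywhere in the taxonomy."""
    nodes = set(PARENT) | set(PARENT.values())
    return max(depth(n) for n in nodes)

def path_sim(a, b):
    return 1 / (1 + path_len(a, b))

def wup_sim(a, b):
    return 2 * depth(lca(a, b)) / (depth(a) + depth(b))

def lch_sim(a, b):
    return -math.log((path_len(a, b) + 1) / (2 * max_depth()))

print(path_sim("Golf", "Tiguan"), wup_sim("Golf", "Tiguan"), lch_sim("Golf", "Tiguan"))
```

In this toy hierarchy Golf and Tiguan meet at Car, four hops apart, so the path score is 0.2 and the Wu-Palmer score is 0.5. Only `lch_sim` consults `max_depth()`, which is why it is the one metric that reacts to changes elsewhere in the graph.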

Step-by-Step Walkthrough

1. Initialise the graph for RDF import

Set up the uniqueness constraint and Neosemantics graph configuration before importing any data.
// Initialise graph for RDF import
CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;

CALL n10s.graphconfig.init({ handleVocabUris: "IGNORE" });
2. Import the automobile taxonomy

A small OWL taxonomy of Volkswagen vehicle types (generated in Protégé) serves as the initial worked example.
// Import a simple taxonomy (Ontology generated with Protégé)
CALL n10s.onto.import.fetch(
  "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session16/taxonomies/vw.ttl",
  "Turtle"
)
3. Compute all three metrics between two sibling concepts

n10s.sim.pathsim.value, n10s.sim.lchsim.value, and n10s.sim.wupsim.value each take two Class nodes and return a floating-point score. Calling all three in the same RETURN clause gives a quick comparison.
MATCH (a:Class { name: "Electric" }), (b:Class { name: "Tiguan" })
RETURN
  n10s.sim.pathsim.value(a, b) AS path,
  n10s.sim.lchsim.value(a, b)  AS lch,
  n10s.sim.wupsim.value(a, b)  AS wup
Higher scores indicate greater similarity. A score of 1.0 from pathsim means the two nodes are identical. Values decrease as the two concepts become more distant in the taxonomy.
4. Visualise the shared path between two concepts

The .path variant of pathsim returns the traversal path through the lowest common ancestor, making it easy to understand which ancestor connects the two concepts and how many hops separate them.
MATCH (a:Class { name: "Golf" }), (b:Class { name: "Tiguan" })
RETURN n10s.sim.pathsim.path(a, b)
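The idea of a path running up from one concept to the lowest common ancestor and back down to the other can be sketched outside the database. The following Python is an illustration only: the hierarchy and the `shared_path` helper are hypothetical and do not reproduce the structure that n10s.sim.pathsim.path returns.

```python
# Hypothetical toy taxonomy (child -> parent); NOT the contents of vw.ttl.
PARENT = {
    "Car": "Vehicle",
    "SUV": "Car",
    "Hatchback": "Car",
    "Tiguan": "SUV",
    "Golf": "Hatchback",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def shared_path(a, b):
    """List of nodes from a up to the lowest common ancestor, then down to b."""
    up_a, up_b = ancestors(a), ancestors(b)
    seen = set(up_a)
    common = next(n for n in up_b if n in seen)   # lowest common ancestor
    left = up_a[: up_a.index(common) + 1]         # a ... LCA (inclusive)
    right = up_b[: up_b.index(common)]            # b ... child of LCA
    return left + right[::-1]

path = shared_path("Golf", "Tiguan")
print(" -> ".join(path), "| hops:", len(path) - 1)
```

The hop count is simply the number of nodes on the path minus one, which is the distance that the path-based and Leacock-Chodorow metrics start from.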
5. Compare concepts at different levels of the hierarchy

Exploring pairs at different depths shows how each metric handles distance and specificity differently.
// Comparing two mid-level categories
MATCH (a:Class { name: "Convertible" }), (b:Class { name: "SUV" })
RETURN n10s.sim.lchsim.value(a, b)
6. Extend the taxonomy and observe LCH sensitivity

Leacock-Chodorow normalises by the maximum depth of the taxonomy. Adding a deeper node changes the maximum depth and therefore shifts all existing LCH scores — even for concept pairs that were not modified.
// Add a deeper subclass to the taxonomy
CALL n10s.onto.import.inline('
  @prefix : <http://localhost/ontologies/2019/1/10/automobile#> .
  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

  :TiguanSpecial rdf:type owl:Class ;
          rdfs:subClassOf :Tiguan ;
          rdfs:label "Tiguan Special" .
', "Turtle")
After adding :TiguanSpecial, re-run the Convertible vs SUV LCH query from the previous step. The score will change because the maximum taxonomy depth has increased, even though neither of those two classes was modified. Wu-Palmer and path similarity are unaffected.
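The effect can be reproduced with a small Python sketch that computes the textbook formulas before and after adding a deeper node. The class names are borrowed from the walkthrough, but the hierarchy itself, the `metrics` helper, and the depth convention (root has depth 1) are assumptions; absolute values from n10s may differ.

```python
import math

def chain(parent, node):
    """Return [node, parent, ..., root] for a child -> parent map."""
    out = [node]
    while out[-1] in parent:
        out.append(parent[out[-1]])
    return out

def metrics(parent, a, b):
    """Path, Wu-Palmer and Leacock-Chodorow scores using textbook formulas."""
    up_a, up_b = chain(parent, a), chain(parent, b)
    seen = set(up_a)
    common = next(n for n in up_b if n in seen)        # lowest common ancestor
    d_a, d_b, d_c = len(up_a), len(up_b), len(chain(parent, common))
    hops = (d_a - d_c) + (d_b - d_c)
    nodes = set(parent) | set(parent.values())
    deepest = max(len(chain(parent, n)) for n in nodes)  # global maximum depth
    return {
        "path": 1 / (1 + hops),
        "wup": 2 * d_c / (d_a + d_b),
        "lch": -math.log((hops + 1) / (2 * deepest)),
    }

# Hypothetical hierarchy; only the class names come from the walkthrough.
before = {"Car": "Vehicle", "SUV": "Car", "Convertible": "Car", "Tiguan": "SUV"}
after = dict(before, TiguanSpecial="Tiguan")  # one extra level under Tiguan

m_before = metrics(before, "Convertible", "SUV")
m_after = metrics(after, "Convertible", "SUV")
for name in ("path", "wup", "lch"):
    print(name, m_before[name], m_after[name])
```

Only the `lch` row differs between the two runs, because `deepest` is the only term that looks beyond the two queried concepts and their common ancestor.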
7. Scale up to a real-world SKOS taxonomy

The session concludes by applying the same metrics to the Wikidata software concepts taxonomy used in Session 2 — a much larger SKOS hierarchy that demonstrates how the metrics behave at scale.
// Clear the automobile taxonomy
MATCH (n:Resource) DETACH DELETE n;

// Load the Wikidata software concepts SKOS taxonomy
CALL n10s.skos.import.fetch(
  "https://github.com/jbarrasa/goingmeta/raw/main/session02/resources/goingmeta-skos.ttl",
  "Turtle"
);

// Look up Neo4j (Q1628290) and MongoDB (Q1165204) by Wikidata identifier
MATCH (a:Class { name: "Q1628290" }), (b:Class { name: "Q1165204" })
RETURN n10s.sim.pathsim.path(a, b) AS path;

// Or look up by human-friendly prefLabel
MATCH (neo:Class) WHERE neo.prefLabel CONTAINS "Neo4j"
MATCH (mdb:Class) WHERE mdb.prefLabel CONTAINS "Mongo"
RETURN
  n10s.sim.pathsim.value(neo, mdb) AS path,
  n10s.sim.lchsim.value(neo, mdb)  AS lch,
  n10s.sim.wupsim.value(neo, mdb)  AS wup;

// Compare Neo4j vs Java, a more distant pair
MATCH (neo:Class) WHERE neo.prefLabel CONTAINS "Neo4j"
MATCH (j:Class)   WHERE j.prefLabel = "Java"
RETURN
  n10s.sim.pathsim.value(neo, j) AS path,
  n10s.sim.lchsim.value(neo, j)  AS lch,
  n10s.sim.wupsim.value(neo, j)  AS wup;

Choosing a Metric

Path Similarity

Best when you only care about how many hops separate two concepts. Simple and fast; insensitive to where in the hierarchy the concepts sit.

Wu-Palmer (WUP)

Balances ancestor depth with path length. Rewards concepts that share a deep common ancestor, making it useful for fine-grained similarity tasks.
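The difference between the two metrics described so far can be seen by comparing two pairs that are the same number of hops apart but sit at different depths. The sketch below uses the textbook formulas on a hypothetical hierarchy with root depth 1; the node names and helpers are illustrative assumptions, not part of any taxonomy used in the session.

```python
# Hypothetical hierarchy (child -> parent), for illustration only.
PARENT = {
    "Vehicle": "Thing",
    "Building": "Thing",
    "Car": "Vehicle",
    "SUV": "Car",
    "Tiguan": "SUV",
    "Touareg": "SUV",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def scores(a, b):
    """Return (path similarity, Wu-Palmer similarity) for two nodes."""
    up_a, up_b = ancestors(a), ancestors(b)
    seen = set(up_a)
    common = next(n for n in up_b if n in seen)   # lowest common ancestor
    d_a, d_b, d_c = len(up_a), len(up_b), len(ancestors(common))
    hops = (d_a - d_c) + (d_b - d_c)
    return 1 / (1 + hops), 2 * d_c / (d_a + d_b)

shallow = scores("Vehicle", "Building")   # siblings directly under the root
deep = scores("Tiguan", "Touareg")        # siblings four levels down
print("shallow pair:", shallow)
print("deep pair:   ", deep)
```

Both pairs are two hops apart, so their path similarity is identical, while the Wu-Palmer score is higher for the deep pair (0.8 versus 0.5) because its common ancestor is far more specific.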

Leacock-Chodorow (LCH)

Normalises path length by the maximum depth of the taxonomy, which makes scores more comparable across taxonomies of different depths. The trade-off is that any structural change that alters the overall depth shifts every score.

NLTK comparison

The session compares these graph-native metrics against the equivalent NLTK WordNet functions, showing that graph-based approaches generalise to any domain taxonomy.

Key Concepts

Lowest common ancestor (LCA): All three metrics rely on finding the LCA, the deepest node that is an ancestor of both concepts in the hierarchy. The richer and deeper the taxonomy, the more informative the LCA tends to be.

Taxonomy depth sensitivity: Wu-Palmer and path similarity are local. They depend only on the path between the two queried concepts and their LCA. Leacock-Chodorow is global. It accounts for the maximum depth of the entire taxonomy, so structural changes elsewhere in the graph affect existing scores.

SKOS via n10s.skos.import.fetch: SKOS concept schemes use skos:broader / skos:narrower rather than rdfs:subClassOf. Neosemantics’ n10s.skos.import.fetch maps SKOS predicates to the :SCO (subClassOf) relationship so that the same similarity functions work without modification.
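The reason a single hierarchy relationship is enough can be sketched in Python: once SKOS statements are normalised into child-to-parent edges, the same ancestor walk works regardless of the source vocabulary. The triples, concept names, and helpers below are hypothetical, and treating skos:narrower as the inverse of skos:broader is an assumption of this sketch rather than a statement about what n10s.skos.import.fetch does internally.

```python
BROADER = "skos:broader"
NARROWER = "skos:narrower"

# Hypothetical SKOS statements; NOT taken from the Wikidata taxonomy.
TRIPLES = [
    ("GraphDatabase", BROADER, "Database"),
    ("Database", NARROWER, "DocumentDatabase"),
    ("Neo4j", BROADER, "GraphDatabase"),
    ("MongoDB", BROADER, "DocumentDatabase"),
]

def to_parent_map(triples):
    """Normalise broader/narrower statements into a child -> parent map."""
    parent = {}
    for subject, predicate, obj in triples:
        if predicate == BROADER:
            parent[subject] = obj      # subject is the narrower concept
        elif predicate == NARROWER:
            parent[obj] = subject      # object is the narrower concept
    return parent

def chain(parent, node):
    """Return [node, parent, ..., root]."""
    out = [node]
    while out[-1] in parent:
        out.append(parent[out[-1]])
    return out

def hops(parent, a, b):
    """Edges on the path a -> lowest common ancestor -> b."""
    up_a, up_b = chain(parent, a), chain(parent, b)
    seen = set(up_a)
    common = next(n for n in up_b if n in seen)
    return up_a.index(common) + up_b.index(common)

parent = to_parent_map(TRIPLES)
print(hops(parent, "Neo4j", "MongoDB"))
```

After normalisation the map is indistinguishable from one built out of rdfs:subClassOf statements, so every metric that only needs ancestor chains and depths applies unchanged.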
The n10s.sim.* functions are available in Neosemantics 4.x and later. See the Neosemantics similarity documentation for the full function reference including edge cases such as comparing a concept to itself or to a direct ancestor.

Resources

Neosemantics Similarity Functions

Full reference for n10s.sim.* Cypher functions

NLTK Similarity Metrics

Wu-Palmer and LCH in NLTK’s WordNet interface

SKOS Reference

W3C Simple Knowledge Organization System specification

Session 2 — Semantic Search

The Wikidata software taxonomy used in this session’s scale-up exercise
