Session 16 of Going Meta (broadcast May 2, 2023) explores how the structural properties of a taxonomy graph can be used to quantify how semantically similar two concepts are. Rather than relying on external NLP toolkits, the session demonstrates Neosemantics’ built-in similarity functions (n10s.sim.pathsim, n10s.sim.lchsim, and n10s.sim.wupsim), which implement the classic path-based, Leacock-Chodorow, and Wu-Palmer metrics directly in Cypher. The session covers a small automobile taxonomy first, then scales up to the Wikidata software concepts taxonomy used in earlier episodes.
Watch Recording
Full session recording on YouTube
Source Code
Cypher scripts and taxonomy files
Overview
| Broadcast | May 2, 2023 |
| Tags | Python NLTK Semantics Taxonomy |
| Similarity functions | n10s.sim.pathsim, n10s.sim.lchsim, n10s.sim.wupsim |
| Taxonomies used | VW automobile taxonomy · Wikidata software concepts |
What You Will Learn
- The theoretical basis for path-based, Wu-Palmer, and Leacock-Chodorow similarity metrics
- How taxonomy depth and graph structure affect each metric differently
- Computing all three metrics between two concept nodes in a single Cypher call
- Visualising the shared-ancestor path between two concepts with n10s.sim.pathsim.path
- How extending a taxonomy (adding deeper nodes) changes Leacock-Chodorow scores
- Applying the same metrics to a large real-world SKOS taxonomy from Wikidata
The Three Metrics
| Metric | Key idea | Sensitivity to depth |
|---|---|---|
| Path similarity | Inverse of the shortest path length between two nodes, typically computed as 1 / (1 + distance) | Low: depends only on distance |
| Wu-Palmer (WUP) | Twice the depth of the lowest common ancestor divided by the sum of the two concepts' depths | Medium: considers ancestor position |
| Leacock-Chodorow (LCH) | Negative log of the path length normalised by the maximum depth of the taxonomy | High: adding deeper nodes changes all scores |
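The three definitions in the table can be sketched in plain Python on a toy tree. This is a minimal illustration, not the Neosemantics implementation: the taxonomy and its node names are hypothetical, depth is counted in nodes (the root has depth 1), and the formulas follow the WordNet-style conventions used by NLTK. The n10s functions may differ in details such as how depth is counted.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not the session's actual VW data.
PARENT = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Van": "Vehicle",
    "Hatchback": "Car",
    "Sedan": "Car",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1 (an assumption)."""
    return len(ancestors(node))

def lca(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def distance(a, b):
    """Number of edges on the path from a to b through their LCA."""
    c = lca(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))

def path_sim(a, b):
    # 1 / (1 + distance): identical nodes score 1.0
    return 1 / (1 + distance(a, b))

def wup_sim(a, b):
    # 2 * depth(LCA) / (depth(a) + depth(b))
    return 2 * depth(lca(a, b)) / (depth(a) + depth(b))

def lch_sim(a, b):
    # -log((distance + 1) / (2 * max_depth)), with max_depth over the whole tree
    max_depth = max(depth(n) for n in PARENT)
    return -math.log((distance(a, b) + 1) / (2 * max_depth))

if __name__ == "__main__":
    pair = ("Hatchback", "Sedan")
    print(path_sim(*pair), wup_sim(*pair), lch_sim(*pair))
```

Only `lch_sim` looks at the whole taxonomy (through `max_depth`); the other two need nothing beyond the two concepts and their ancestors.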
Step-by-Step Walkthrough
Initialise the graph for RDF import
Set up the uniqueness constraint and Neosemantics graph configuration before importing any data.
Import the automobile taxonomy
A small OWL taxonomy of Volkswagen vehicle types (generated in Protégé) serves as the initial worked example.
Compute all three metrics between two sibling concepts
n10s.sim.pathsim.value, n10s.sim.lchsim.value, and n10s.sim.wupsim.value each take two Class nodes and return a floating-point score. Calling all three in the same RETURN clause gives a quick comparison.
Higher scores indicate greater similarity. A score of 1.0 from pathsim means the two nodes are identical. Values decrease as the two concepts become more distant in the taxonomy.
Visualise the shared path between two concepts
The .path variant of pathsim returns the traversal path through the lowest common ancestor, making it easy to understand which ancestor connects the two concepts and how many hops separate them.
Compare concepts at different levels of the hierarchy
Exploring pairs at different depths shows how each metric handles distance and specificity differently.
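To make the comparison concrete, the following sketch walks a few pairs at different depths of a hypothetical toy tree. For each pair it reports the connecting path through the lowest common ancestor (similar in spirit to what the .path variant returns) alongside path and Wu-Palmer scores. The taxonomy, the helper names, and the depth convention (root has depth 1) are assumptions made for illustration, not the session's data or the n10s internals.

```python
# Hypothetical toy taxonomy (child -> parent); not the session's actual data.
PARENT = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Van": "Vehicle",
    "Hatchback": "Car",
    "Sedan": "Car",
}

def ancestors(node):
    """Return [node, parent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def lca(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    in_b = set(ancestors(b))
    return next(n for n in ancestors(a) if n in in_b)

def path_via_lca(a, b):
    """Nodes visited going up from a to the LCA, then down to b."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = lca(a, b)
    left = up_a[: up_a.index(common) + 1]   # a ... LCA
    right = up_b[: up_b.index(common)]      # b ... node just below the LCA
    return left + right[::-1]

def path_sim(a, b):
    # The path has (distance + 1) nodes, so this equals 1 / (1 + distance).
    return 1 / len(path_via_lca(a, b))

def wup_sim(a, b):
    # Depth counted in nodes (root = 1), an assumption of this sketch.
    depth = lambda n: len(ancestors(n))
    return 2 * depth(lca(a, b)) / (depth(a) + depth(b))

if __name__ == "__main__":
    for a, b in [("Car", "Van"), ("Hatchback", "Sedan"), ("Hatchback", "Van")]:
        print(a, b, path_via_lca(a, b),
              round(path_sim(a, b), 3), round(wup_sim(a, b), 3))
```

Under these assumptions the two sibling pairs are the same number of hops apart and so receive the same path score, but the deeper pair gets the higher Wu-Palmer score because its common ancestor sits further from the root.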
Extend the taxonomy and observe LCH sensitivity
Leacock-Chodorow normalises by the maximum depth of the taxonomy. Adding a deeper node changes the maximum depth and therefore shifts all existing LCH scores — even for concept pairs that were not modified.
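The effect can be reproduced outside the database with a small Python sketch. It assumes a single-rooted tree, counts depth in nodes, and uses the NLTK-style formula -log((distance + 1) / (2 * max_depth)). The taxonomy and the added "HotHatch" node are hypothetical, and the exact numbers produced by n10s.sim.lchsim may differ.

```python
import math

def chain_to_root(parent, node):
    """Return [node, parent, ..., root] for a child -> parent mapping."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def distance(parent, a, b):
    """Edges on the path from a to b through their lowest common ancestor."""
    up_a = chain_to_root(parent, a)
    up_b = chain_to_root(parent, b)
    common = next(n for n in up_a if n in set(up_b))
    return up_a.index(common) + up_b.index(common)

def lch_sim(parent, a, b):
    # max_depth is a property of the whole taxonomy, not of the pair (a, b)
    max_depth = max(len(chain_to_root(parent, n)) for n in parent)
    return -math.log((distance(parent, a, b) + 1) / (2 * max_depth))

# Hypothetical toy taxonomy (child -> parent).
taxonomy = {"Vehicle": None, "Car": "Vehicle", "Van": "Vehicle", "Hatchback": "Car"}
before = lch_sim(taxonomy, "Car", "Van")

# Add one deeper node elsewhere in the tree; Car and Van are untouched.
extended = dict(taxonomy, HotHatch="Hatchback")
after = lch_sim(extended, "Car", "Van")

if __name__ == "__main__":
    print(before, after)
```

The distance between "Car" and "Van" is the same in both runs. Only the maximum depth changes (3 to 4 in this sketch), which is enough to move the score.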
Choosing a Metric
Path Similarity
Best when you only care about how many hops separate two concepts. Simple and fast; insensitive to where in the hierarchy the concepts sit.
Wu-Palmer (WUP)
Balances ancestor depth with path length. Rewards concepts that share a deep common ancestor, making it useful for fine-grained similarity tasks.
Leacock-Chodorow (LCH)
Normalises by taxonomy depth, making scores more comparable across taxonomies of different depths. Sensitive to structural changes that alter the overall depth.
NLTK comparison
The session compares these graph-native metrics against the equivalent NLTK WordNet functions, showing that graph-based approaches generalise to any domain taxonomy.
Key Concepts
Lowest common ancestor (LCA): All three metrics rely on finding the LCA, the deepest node that is an ancestor of both concepts in the hierarchy. The richer and deeper the taxonomy, the more informative the LCA tends to be.
Taxonomy depth sensitivity: Wu-Palmer and path similarity are local. They depend only on the two queried concepts, their LCA, and (for Wu-Palmer) the depth of those nodes. Leacock-Chodorow is global: it accounts for the maximum depth of the entire taxonomy, so structural changes elsewhere in the graph affect existing scores.
SKOS via n10s.skos.import.fetch: SKOS concept schemes use skos:broader / skos:narrower rather than rdfs:subClassOf. Neosemantics’ n10s.skos.import.fetch maps SKOS predicates to the :SCO (subClassOf) relationship so that the same similarity functions work without modification.
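The idea behind the SKOS mapping can be illustrated with a short Python sketch that normalises skos:broader and skos:narrower statements into a single child-to-parent edge set, the shape a subclass-style hierarchy needs. It only shows the direction handling. It is not how n10s.skos.import.fetch is implemented, and the triples and the `to_parent_edges` helper are hypothetical.

```python
# Hypothetical triples; the labels are illustrative, not taken from the Wikidata dataset.
TRIPLES = [
    ("GraphDatabase", "skos:broader", "Database"),
    ("Database", "skos:broader", "Software"),
    ("Software", "skos:narrower", "OperatingSystem"),
    ("Database", "skos:prefLabel", "Database"),  # non-hierarchical, ignored
]

def to_parent_edges(triples):
    """Normalise SKOS hierarchy predicates to (child, parent) pairs."""
    edges = set()
    for subject, predicate, obj in triples:
        if predicate == "skos:broader":
            # subject has broader concept obj, so obj is the parent
            edges.add((subject, obj))
        elif predicate == "skos:narrower":
            # subject has narrower concept obj, so subject is the parent
            edges.add((obj, subject))
    return edges

if __name__ == "__main__":
    for child, parent in sorted(to_parent_edges(TRIPLES)):
        print(f"{child} -> {parent}")
```

Once every hierarchical statement points from child to parent, depth, LCA, and path length can be computed the same way as for an rdfs:subClassOf hierarchy.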
The n10s.sim.* functions are available in Neosemantics 4.x and later. See the Neosemantics similarity documentation for the full function reference, including edge cases such as comparing a concept to itself or to a direct ancestor.
Resources
Neosemantics Similarity Functions
Full reference for n10s.sim.* Cypher functions
NLTK Similarity Metrics
Wu-Palmer and LCH in NLTK’s WordNet interface
SKOS Reference
W3C Simple Knowledge Organization System specification
Session 2 — Semantic Search
The Wikidata software taxonomy used in this session’s scale-up exercise