Measuring OWL Ontology Quality in Neo4j with Cypher

Season 3, Episode 4 of Going Meta introduces a set of structural quality metrics for OWL ontologies that can be computed entirely in Cypher once an ontology has been loaded into Neo4j with n10s (neosemantics). The session adapts five metrics from the software engineering literature, originally designed for object-oriented code quality, and applies them to ontology evaluation. The result is a lightweight, repeatable quality dashboard that requires no external tooling beyond a running Neo4j instance with n10s installed.

Watch the Recording

Season 3, Episode 4 — January 2026

Session Code

Cypher queries and Python evaluation script

Prerequisites

Load your OWL ontology into Neo4j using n10s. After import, OWL classes are stored as (:owl__Class) nodes, datatype properties as (:owl__DatatypeProperty), object properties as (:owl__ObjectProperty), and the subclass hierarchy as [:rdfs__subClassOf] relationships. Domain and range axioms become [:rdfs__domain] and [:rdfs__range] edges.
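
The import step can be sketched as follows. This is a minimal sketch, not the session's own script: it assumes Neo4j 5 constraint syntax, n10s's SHORTEN URI handling (which is what produces prefixed names such as owl__Class), and a placeholder ontology URL. The helper name `n10s_import_statements` is hypothetical; it only builds the statements and parameters, which you then run in order with whichever client you use.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, Dict[str, str]]  # (Cypher text, query parameters)


def n10s_import_statements(ontology_url: str, rdf_format: str = "Turtle") -> List[Statement]:
    """Build the Cypher statements for a one-off n10s import (hypothetical helper)."""
    return [
        # n10s expects a uniqueness constraint on Resource.uri before any import.
        (
            "CREATE CONSTRAINT n10s_unique_uri IF NOT EXISTS "
            "FOR (r:Resource) REQUIRE r.uri IS UNIQUE",
            {},
        ),
        # SHORTEN yields prefixed names such as owl__Class and rdfs__subClassOf.
        ("CALL n10s.graphconfig.init({handleVocabUris: 'SHORTEN'})", {}),
        # Fetch the ontology and import it as RDF; URL and format are parameters.
        (
            "CALL n10s.rdf.import.fetch($url, $format)",
            {"url": ontology_url, "format": rdf_format},
        ),
    ]


# Placeholder URL, for illustration only.
statements = n10s_import_statements("https://example.org/ontology.ttl")
```

Passing the URL and serialisation format as query parameters avoids building Cypher by string concatenation.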

The Five Metrics

1. CCC — Class Connectivity Coverage

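Definition: The fraction of classes that participate in at least one object property relationship (as domain or range). A high CCC indicates an interconnected ontology; a low score signals many isolated classes with no relationships to other classes. Note that the query below reports the average number of two-hop domain/range connections per class rather than a 0-to-1 fraction, so values above 1 are possible.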

MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__domain|rdfs__range*2]-(connected) | connected]) AS connected_count,
     size([(c)-[:rdfs__subClassOf*0..]->()-[:rdfs__domain|rdfs__range*2]-(connected) | connected]) AS connected_count_extended
RETURN avg(connected_count) AS CCC,
       avg(connected_count_extended) AS CCC_ext

CCC_ext extends the base metric by also counting connections reachable through the subclass hierarchy, capturing indirect connectivity inherited from parent classes.
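
To make the counting concrete, here is a pure-Python sketch of the base CCC calculation on a toy ontology. It assumes each (property, class) pair stands for one distinct rdfs__domain or rdfs__range relationship and that all targets are classes; the function names and toy data are illustrative, not part of the session code. It also shows how the averaged count differs from the share of connected classes.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (property, class): one rdfs__domain or rdfs__range edge


def connected_counts(classes: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Count two-hop class-property-class paths starting at each class."""
    edges_per_property = Counter(prop for prop, _ in edges)
    counts = {c: 0 for c in classes}
    for prop, cls in edges:
        if cls in counts:
            # Every other edge of the same property completes one path.
            counts[cls] += edges_per_property[prop] - 1
    return counts


def average_connections(counts: Dict[str, int]) -> float:
    """Mean connection count per class (what an avg() over counts returns)."""
    return sum(counts.values()) / len(counts)


def connected_fraction(counts: Dict[str, int]) -> float:
    """Share of classes with at least one connection."""
    return sum(1 for n in counts.values() if n > 0) / len(counts)


classes = ["Person", "Company", "City", "Orphan"]
edges = [
    ("worksFor", "Person"),    # domain
    ("worksFor", "Company"),   # range
    ("locatedIn", "Company"),  # domain
    ("locatedIn", "City"),     # range
]
counts = connected_counts(classes, edges)
print(average_connections(counts))  # 1.0
print(connected_fraction(counts))   # 0.75
```

Here Company is counted twice because it sits on two properties, so the average reaches 1.0 even though one class in four is isolated.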

2. ANOnto — Annotation Richness

Definition: The proportion of classes that carry at least one annotation property — either rdfs:label or rdfs:comment. Well-annotated ontologies are easier to understand, reuse, and maintain.

MATCH (c:owl__Class)
WITH count(c) AS c_count
MATCH (c:owl__Class) WHERE c.rdfs__comment IS NOT NULL OR c.rdfs__label IS NOT NULL
RETURN c_count, count(c) AS ac_count, count(c) * 100.0 / c_count AS AN_Onto

The result is expressed as a percentage: 100% means every class has at least a label or comment; lower values indicate undocumented classes.

3. PROnto — Properties Richness

Definition: The ratio of object properties to total properties (object + datatype). A higher value indicates a relationally rich ontology; a lower value indicates a more attribute-heavy model. Standard variant:

MATCH (p:owl__DatatypeProperty)
WITH count(p) AS p_count
MATCH (r:owl__ObjectProperty)
RETURN p_count, count(r) AS r_count, count(r) * 100.0 / (count(r) + p_count) AS PR_Onto

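Class-centric variant (computes, for each class, the datatype-property share of the properties that declare it as their domain, then averages; note that this is the attribute share, the opposite direction to the object-property share in the standard variant):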

MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)<-[:rdfs__domain]-(dtp:owl__DatatypeProperty) | dtp]) AS p_count,
     size([(c)<-[:rdfs__domain]-(op:owl__ObjectProperty) | op]) AS r_count
WITH c, p_count, r_count,
CASE p_count + r_count
  WHEN 0 THEN 0
  ELSE p_count * 100.0 / (p_count + r_count)
END AS local_pr_onto
RETURN avg(local_pr_onto) AS PR_Onto
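
A short worked example may help when comparing the two variants. The sketch below mirrors the arithmetic of both queries in plain Python on made-up counts; the function names are illustrative only.

```python
from typing import Dict, Tuple


def pr_onto_global(n_object: int, n_datatype: int) -> float:
    """Object properties as a percentage of all properties (standard variant)."""
    total = n_object + n_datatype
    return 100.0 * n_object / total if total else 0.0


def pr_onto_per_class(per_class: Dict[str, Tuple[int, int]]) -> float:
    """Average per-class datatype-property share (class-centric variant).

    per_class maps a class to (datatype_count, object_count), counting the
    properties that declare the class as their domain.
    """
    shares = []
    for p_count, r_count in per_class.values():
        total = p_count + r_count
        # Classes with no properties contribute 0, as in the CASE expression.
        shares.append(100.0 * p_count / total if total else 0.0)
    return sum(shares) / len(shares) if shares else 0.0


print(pr_onto_global(n_object=3, n_datatype=1))  # 75.0
print(pr_onto_per_class({"Person": (3, 1), "Company": (1, 1)}))  # 62.5
```

Because the two variants measure opposite shares, their values should not be compared directly.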

4. LCOMOnto — Lack of Cohesion in Methods

Definition: The mean path length from leaf classes to the top of the hierarchy, counted in classes on the path (edges plus one). Adapted from the software engineering LCOM (Lack of Cohesion in Methods) metric, it captures taxonomy depth, a proxy for how specialised and hierarchically organised the ontology is.

MATCH path = (leaf:owl__Class)-[:rdfs__subClassOf*0..]->(top)
WHERE NOT ()-[:rdfs__subClassOf]->(leaf)
  AND NOT (top)-[:rdfs__subClassOf]->()
WITH length(path) + 1 AS path_length
RETURN sum(path_length), count(*), toFloat(sum(path_length)) / count(*) AS LCOMOnto

A higher LCOMOnto value means leaf classes are buried deeper in the hierarchy (more specialised). A value near 1 indicates a flat taxonomy.
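
The same calculation can be sketched in plain Python on a toy hierarchy. This sketch assumes an acyclic rdfs__subClassOf graph given as a mapping from each class to its direct superclasses; the function names and data are illustrative.

```python
from typing import Dict, List


def leaf_to_root_path_lengths(parents: Dict[str, List[str]]) -> List[int]:
    """Number of classes on every leaf-to-root path (assumes no cycles)."""
    has_child = {sup for supers in parents.values() for sup in supers}
    leaves = [cls for cls in parents if cls not in has_child]
    lengths: List[int] = []

    def walk(cls: str, classes_so_far: int) -> None:
        supers = parents.get(cls, [])
        if not supers:  # reached a root class
            lengths.append(classes_so_far)
            return
        for sup in supers:  # one path per superclass
            walk(sup, classes_so_far + 1)

    for leaf in leaves:
        walk(leaf, 1)
    return lengths


def lcom_onto(parents: Dict[str, List[str]]) -> float:
    """Mean leaf-to-root path length, counted in classes."""
    lengths = leaf_to_root_path_lengths(parents)
    return sum(lengths) / len(lengths)


hierarchy = {
    "Thing": [],
    "Agent": ["Thing"],
    "Person": ["Agent"],
    "Company": ["Agent"],
    "Place": ["Thing"],
}
print(leaf_to_root_path_lengths(hierarchy))  # [3, 3, 2]
print(round(lcom_onto(hierarchy), 2))        # 2.67
```

The mean here is 8/3, a reminder that the division needs to be floating-point for the metric to distinguish hierarchies of similar depth.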

5. CBOOnto — Coupling Between Objects

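Definition: The average number of direct superclasses per class. The queries below call this ancestor_count, but they follow a single rdfs__subClassOf hop, so only parents are counted, not transitive ancestors. Heavily adapted from the object-oriented CBO metric, it measures how much a class inherits from (or is otherwise coupled to) other classes in the ontology. Base variant (subclass coupling only):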

MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count
RETURN avg(ancestor_count) AS CBOOnto

Extended variant (zero-safe: a class with no explicit superclass is counted as having one, which stops root classes from pulling the average down):

MATCH (c:owl__Class)
WITH c.uri AS c, size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count
WITH c,
     CASE ancestor_count
       WHEN 0 THEN 1
       ELSE ancestor_count
     END AS ancestor_count
RETURN avg(ancestor_count) AS CBOOnto

Full variant (including non-taxonomic coupling through object property domain/range):

MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count,
     size([(c)-[:rdfs__domain|rdfs__range*2]-(related) | related]) AS related_count,
     size([(c)-[:rdfs__subClassOf*0..]->()-[:rdfs__domain|rdfs__range*2]-(related) | related]) AS related_count_extended
RETURN avg(ancestor_count) AS CBOOnto,
       avg(ancestor_count + related_count) AS CBOOnto_1,
       avg(ancestor_count + related_count_extended) AS CBOOnto_2
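
The difference between the base and zero-safe variants is easiest to see on a small example. The sketch below covers only the subclass part of the metric and assumes the hierarchy is given as a mapping from each class to its direct superclasses; names and data are illustrative.

```python
from typing import Dict, List


def cbo_onto(parents: Dict[str, List[str]], zero_safe: bool = False) -> float:
    """Average number of direct superclasses per class.

    With zero_safe=True, a class without superclasses counts as 1,
    mirroring the CASE expression in the extended variant.
    """
    counts = [len(supers) for supers in parents.values()]
    if zero_safe:
        counts = [max(n, 1) for n in counts]
    return sum(counts) / len(counts)


hierarchy = {
    "Thing": [],
    "Agent": ["Thing"],
    "Person": ["Agent"],
    "Employee": ["Person", "Agent"],  # multiple inheritance raises coupling
}
print(cbo_onto(hierarchy))                  # 1.0
print(cbo_onto(hierarchy, zero_safe=True))  # 1.25
```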

LLM-Assisted Quality Evaluation with `cceval.py`

Beyond structural metrics, the session introduces cceval.py, which uses an LLM to evaluate an ontology against a set of competency questions (CQs) — the stated requirements the ontology was designed to satisfy. The evaluator scores each CQ from 0 to 1 and generates Cypher seed datasets and queries for empirical verification.

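import os
from typing import List, Optional

from openai import OpenAI


def evaluate_ontology_against_cq(
    ontology_text: str,
    questions: List[str],
    model: Optional[str] = None,
) -> dict:
    client = OpenAI()
    # Fall back to the OPENAI_MODEL environment variable, then to a pinned default.
    model = model or os.getenv("OPENAI_MODEL", "gpt-4o-mini-2024-07-18")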

    schema = {
        "name": "CQEval",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "overall_score": {"type": "number", "minimum": 0, "maximum": 1},
                "per_cq": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "question":       {"type": "string"},
                            "score":          {"type": "number", "minimum": 0, "maximum": 1},
                            "reasoning":      {"type": "string"},
                            "suggestions":    {"type": "array", "items": {"type": "string"}},
                            "cypher_dataset": {"type": "string"},
                            "cypher_query":   {"type": "string"},
                        },
                    }
                },
                "global_suggestions": {"type": "array", "items": {"type": "string"}}
            },
        }
    }
    # ... (LLM call with structured output)

The LLM scores each CQ using this rubric:

| Score | Meaning |
| --- | --- |
| `1.0` | Ontology clearly models the required classes/properties/relationships; a Cypher query can answer the CQ |
| `0.5` | Partially modelled; answerable only after minor refactoring |
| `0.0` | Not supported: critical modeling gaps |

Empirical verification

The evaluator also runs the LLM-generated Cypher against a real Neo4j instance to confirm the model-level score holds empirically:

from neo4j import GraphDatabase


def run_empirical_check(
    cq_result: dict,
    neo4j_uri: str      = "<<DB ENDPOINT>>",
    neo4j_user: str     = "<<USER>>",
    neo4j_password: str = "<<PWD>>",
) -> list:
    results = []

    # Driver and session are context managers, so both are closed even on error.
    with GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password)) as driver:
        with driver.session() as session:
            for cq in cq_result["per_cq"]:
                # WARNING: this wipes the whole database; use a scratch instance only.
                session.run("MATCH (n) DETACH DELETE n").consume()

                try:
                    # Seed the graph; consume() surfaces any server-side error here.
                    session.run(cq["cypher_dataset"]).consume()
                except Exception as e:
                    results.append({"question": cq["question"], "status": "dataset_error", "error": str(e)})
                    continue

                try:
                    rows = session.run(cq["cypher_query"]).data()
                    results.append({
                        "question":     cq["question"],
                        "llm_score":    cq["score"],
                        "status":       "fully_answered" if rows else "not_answered",
                        "result_count": len(rows),
                    })
                except Exception as e:
                    results.append({"question": cq["question"], "status": "query_error", "error": str(e)})

    return results

The empirical check seeds a fresh Neo4j graph for each competency question using the LLM-generated cypher_dataset statement, then runs the cypher_query and classifies the outcome as fully_answered, not_answered, dataset_error, or query_error.
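
One way to use these results is to list the competency questions where the LLM score and the empirical outcome disagree. The sketch below is an assumption about how that comparison might be done, not part of cceval.py: the helper name and the 0.5 threshold are hypothetical, and entries without an llm_score (the error cases) are always flagged for review.

```python
from typing import Dict, List


def flag_disagreements(results: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Return entries whose LLM score and empirical status disagree."""
    flagged = []
    for entry in results:
        answered = entry["status"] == "fully_answered"
        score = entry.get("llm_score")  # absent on dataset_error / query_error
        if score is None or (score > threshold) != answered:
            flagged.append(entry)
    return flagged


sample_results = [
    {"question": "Q1", "llm_score": 1.0, "status": "fully_answered", "result_count": 2},
    {"question": "Q2", "llm_score": 1.0, "status": "not_answered", "result_count": 0},
    {"question": "Q3", "status": "dataset_error", "error": "syntax error"},
    {"question": "Q4", "llm_score": 0.0, "status": "not_answered", "result_count": 0},
]
for entry in flag_disagreements(sample_results):
    print(entry["question"], entry["status"])
# Q2 not_answered
# Q3 dataset_error
```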

Metric Summary

| Metric | What it measures | Ideal direction |
| --- | --- | --- |
| CCC | Class connectivity (graph integration) | Higher is better |
| ANOnto | Annotation coverage | 100% ideal |
| PROnto | Relational vs attribute richness | Domain-dependent |
| LCOMOnto | Taxonomy depth (hierarchy specialisation) | Domain-dependent |
| CBOOnto | Coupling to ancestor/related classes | Lower is less coupled |

Run these Cypher metrics after every ontology version update to catch regressions — for example, a sudden drop in ANOnto signals that new classes were added without documentation.
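
A regression check of this kind can be sketched as a comparison of two metric snapshots. The helper below is hypothetical: it assumes the metric values have already been collected into dictionaries, and it only watches the metrics the summary lists as higher-is-better, since the others are domain-dependent.

```python
from typing import Dict, Iterable


def metric_regressions(
    previous: Dict[str, float],
    current: Dict[str, float],
    watched: Iterable[str] = ("CCC", "ANOnto"),
    tolerance: float = 0.0,
) -> Dict[str, float]:
    """Return metric -> change for watched metrics that dropped beyond tolerance."""
    drops = {}
    for name in watched:
        if name in previous and name in current:
            change = current[name] - previous[name]
            if change < -tolerance:
                drops[name] = change
    return drops


# Made-up snapshots for two ontology versions.
v1 = {"CCC": 1.25, "ANOnto": 95.0, "LCOMOnto": 3.0}
v2 = {"CCC": 1.25, "ANOnto": 80.0, "LCOMOnto": 2.5}
print(metric_regressions(v1, v2))  # {'ANOnto': -15.0}
```

The LCOMOnto change is ignored by default because a shallower hierarchy is not necessarily a regression; pass a different `watched` list if your project treats it as one.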
