Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/jbarrasa/goingmeta/llms.txt

Use this file to discover all available pages before exploring further.

Season 3, Episode 4 of Going Meta introduces a set of structural quality metrics for OWL ontologies that can be computed entirely in Cypher once an ontology has been loaded into Neo4j with n10s (neosemantics). The session adapts five metrics from the software engineering literature — originally designed for object-oriented code quality — and applies them to ontology evaluation. The result is a lightweight, repeatable quality dashboard that requires no external tooling beyond a running Neo4j instance.

Watch the Recording

Season 3, Episode 4 — January 2026

Session Code

Cypher queries and Python evaluation script

Prerequisites

Load your OWL ontology into Neo4j using n10s. After import, OWL classes are stored as (:owl__Class) nodes, datatype properties as (:owl__DatatypeProperty), object properties as (:owl__ObjectProperty), and the subclass hierarchy as [:rdfs__subClassOf] relationships. Domain and range axioms become [:rdfs__domain] and [:rdfs__range] edges.

The Five Metrics

1. CCC — Class Connectivity Coverage

Definition: The fraction of classes that participate in at least one object property relationship (as domain or range). A high CCC indicates an interconnected ontology; a low score signals many isolated classes with no relationships to other classes.
MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__domain|rdfs__range*2]-(connected) | connected]) AS connected_count,
     size([(c)-[:rdfs__subClassOf*0..]->()-[:rdfs__domain|rdfs__range*2]-(connected) | connected]) AS connected_count_extended
RETURN avg(connected_count) AS CCC,
       avg(connected_count_extended) AS CCC_ext
CCC_ext extends the base metric by also counting connections reachable through the subclass hierarchy, capturing indirect connectivity inherited from parent classes.

2. ANOnto — Annotation Richness

Definition: The proportion of classes that carry at least one annotation property — either rdfs:label or rdfs:comment. Well-annotated ontologies are easier to understand, reuse, and maintain.
MATCH (c:owl__Class)
WITH count(c) AS c_count
MATCH (c:owl__Class) WHERE c.rdfs__comment IS NOT NULL OR c.rdfs__label IS NOT NULL
RETURN c_count, count(c) AS ac_count, count(c) * 100 / c_count AS AN_Onto
The result is expressed as a percentage: 100% means every class has at least a label or comment; lower values indicate undocumented classes.

3. PROnto — Properties Richness

Definition: The ratio of object properties to total properties (object + datatype). A higher value indicates a relationally rich ontology; a lower value indicates a more attribute-heavy model. Standard variant:
MATCH (p:owl__DatatypeProperty)
WITH count(p) AS p_count
MATCH (r:owl__ObjectProperty)
RETURN p_count, count(r) AS r_count, count(r) * 100 / (count(r) + p_count) AS PR_Onto
Class-centric variant (computes per-class attribute-to-relationship ratio then averages):
MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)<-[:rdfs__domain]-(dtp:owl__DatatypeProperty) | dtp]) AS p_count,
     size([(c)<-[:rdfs__domain]-(op:owl__ObjectProperty) | op]) AS r_count
WITH c, p_count, r_count,
CASE p_count + r_count
  WHEN 0 THEN 0
  ELSE p_count * 100 / (p_count + r_count)
END AS local_pr_onto
RETURN avg(local_pr_onto) AS PR_Onto

4. LCOMOnto — Lack of Cohesion in Methods

Definition: The mean path length from leaf classes to the top of the hierarchy. Adapted from the software engineering LCOMOnto metric, it captures taxonomy depth — a proxy for how specialised and hierarchically organised the ontology is.
MATCH path = (leaf:owl__Class)-[:rdfs__subClassOf*0..]->(top)
WHERE NOT ()-[:rdfs__subClassOf]->(leaf)
  AND NOT (top)-[:rdfs__subClassOf]->()
WITH length(path) + 1 AS path_length
RETURN sum(path_length), count(*), sum(path_length) / count(*) AS LCOMOnto
A higher LCOMOnto value means leaf classes are buried deeper in the hierarchy (more specialised). A value near 1 indicates a flat taxonomy.

5. CBOOnto — Coupling Between Objects

Definition: The average number of ancestor classes per class. Heavily adapted from the object-oriented CBO metric, it measures how much a class inherits from (or is otherwise coupled to) other classes in the ontology. Base variant (subclass coupling only):
MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count
RETURN avg(ancestor_count) AS CBOOnto
Extended variant (with zero-safe normalisation):
MATCH (c:owl__Class)
WITH c.uri AS c, size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count
WITH c,
     CASE ancestor_count
       WHEN 0 THEN 1
       ELSE ancestor_count
     END AS ancestor_count
RETURN avg(ancestor_count) AS CBOOnto
Full variant (including non-taxonomic coupling through object property domain/range):
MATCH (c:owl__Class)
WITH c.uri AS c,
     size([(c)-[:rdfs__subClassOf]->(parent) | parent]) AS ancestor_count,
     size([(c)-[:rdfs__domain|rdfs__range*2]-(related) | related]) AS related_count,
     size([(c)-[:rdfs__subClassOf*0..]->()-[:rdfs__domain|rdfs__range*2]-(related) | related]) AS related_count_extended
RETURN avg(ancestor_count) AS CBOOnto,
       avg(ancestor_count + related_count) AS CBOOnto_1,
       avg(ancestor_count + related_count_extended) AS CBOOnto_2

LLM-Assisted Quality Evaluation with cceval.py

Beyond structural metrics, the session introduces cceval.py, which uses an LLM to evaluate an ontology against a set of competency questions (CQs) — the stated requirements the ontology was designed to satisfy. The evaluator scores each CQ from 0 to 1 and generates Cypher seed datasets and queries for empirical verification.
def evaluate_ontology_against_cq(
    ontology_text: str,
    questions: List[str],
    model: str = None,
) -> dict:
    client = OpenAI()
    model = model or os.getenv("OPENAI_MODEL", "gpt-4o-mini-2024-07-18")

    schema = {
        "name": "CQEval",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "overall_score": {"type": "number", "minimum": 0, "maximum": 1},
                "per_cq": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "question":       {"type": "string"},
                            "score":          {"type": "number", "minimum": 0, "maximum": 1},
                            "reasoning":      {"type": "string"},
                            "suggestions":    {"type": "array", "items": {"type": "string"}},
                            "cypher_dataset": {"type": "string"},
                            "cypher_query":   {"type": "string"},
                        },
                    }
                },
                "global_suggestions": {"type": "array", "items": {"type": "string"}}
            },
        }
    }
    # ... (LLM call with structured output)
The LLM scores each CQ using this rubric:
ScoreMeaning
1.0Ontology clearly models the required classes/properties/relationships; a Cypher query can answer the CQ
0.5Partially modelled; answerable only after minor refactoring
0.0Not supported — critical modeling gaps

Empirical verification

The evaluator also runs the LLM-generated Cypher against a real Neo4j instance to confirm the model-level score holds empirically:
def run_empirical_check(
    cq_result: dict,
    neo4j_uri: str      = "<<DB ENDPOINT>>",
    neo4j_user: str     = "<<USER>>",
    neo4j_password: str = "<<PWD>>",
) -> list:
    driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
    results = []

    with driver.session() as session:
        for cq in cq_result["per_cq"]:
            session.run("MATCH (n) DETACH DELETE n")

            try:
                session.run(cq["cypher_dataset"])
            except Exception as e:
                results.append({"question": cq["question"], "status": "dataset_error", "error": str(e)})
                continue

            try:
                rows = session.run(cq["cypher_query"]).data()
                results.append({
                    "question":     cq["question"],
                    "llm_score":    cq["score"],
                    "status":       "fully_answered" if rows else "not_answered",
                    "result_count": len(rows),
                })
            except Exception as e:
                results.append({"question": cq["question"], "status": "query_error", "error": str(e)})

    driver.close()
    return results
The empirical check seeds a fresh Neo4j graph for each competency question using the LLM-generated cypher_dataset statement, then runs the cypher_query and classifies the outcome as fully_answered, not_answered, dataset_error, or query_error.

Metric Summary

MetricWhat it measuresIdeal direction
CCCClass connectivity (graph integration)Higher is better
ANOntoAnnotation coverage100% ideal
PROntoRelational vs attribute richnessDomain-dependent
LCOMOntoTaxonomy depth (hierarchy specialisation)Domain-dependent
CBOOntoCoupling to ancestor/related classesLower is less coupled
Run these Cypher metrics after every ontology version update to catch regressions — for example, a sudden drop in ANOnto signals that new classes were added without documentation.

Build docs developers (and LLMs) love