Season 3, Episode 7 of Going Meta introduces the concept of formalised agent skills for ontology engineering. Rather than prompting an LLM ad hoc each time you need an ontology, a skill is a reusable, versioned instruction set stored alongside the codebase that any LLM agent or IDE assistant can pick up and execute consistently. The session walks through the ontology-builder-assistant skill — a complete workflow that takes a purpose statement, competency questions, and sample data as inputs and delivers a minimal, evidence-backed OWL ontology as output.

Watch the Recording

Season 3, Episode 7 — April 2026

Session Code

Skill definition, ontology, and KG pipeline script

What Is an Agent Skill?

An agent skill is a Markdown file (SKILL.md) placed inside a .agent/skills/<skill-name>/ directory. Agents running inside tool-aware IDEs (such as Cursor or GitHub Copilot) or orchestration frameworks can discover, load, and execute these skills on demand. The skill file declares:
  • Name and description (in YAML frontmatter) — used for skill selection by the agent
  • A required workflow — an ordered sequence of steps the agent must follow
  • An output contract — the exact structure the agent must return
  • Style and decision rules — guardrails that prevent scope creep

The ontology-builder-assistant Skill

The skill is declared with a short description that helps the agent select it in the right context:
---
name: ontology-builder-assistant
description: >
  derive a minimal reusable ontology from purpose statements, competency questions,
  sample data, reusable vocabularies, supporting semantic evidence, and implementation
  constraints. use when you're asked to draft an ontology for a specific use case.
  Especially useful for ontology bootstrapping, information extraction schemas,
  mapping-oriented semantic models, and defensible cq-to-model traceability with
  final serialization.
---
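As a rough illustration of how a tool might use this frontmatter for skill selection, the sketch below scans a repository for SKILL.md files and reads each name and description. The helper names `parse_frontmatter` and `discover_skills` are hypothetical, and the parser handles only the small YAML subset shown above (plain `key: value` pairs and folded `>` scalars). A real agent framework would likely use a full YAML parser and its own discovery logic.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    """Parse a small YAML subset: `key: value` pairs and folded (`>`) scalars."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        if line.startswith((" ", "\t")) and key is not None:
            # Indented line: continuation of the current folded scalar.
            fields[key] = (fields[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip()
            fields[key] = "" if value in (">", "|") else value
    return fields


def discover_skills(repo_root: Path) -> dict:
    """Map each skill name to its description for every SKILL.md found."""
    skills = {}
    for skill_file in sorted(repo_root.glob(".agent/skills/*/SKILL.md")):
        meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in meta:
            skills[meta["name"]] = meta.get("description", "")
    return skills
```

Calling `discover_skills(Path("."))` from the repository root would return a dictionary keyed by skill name, which an agent could match against the user's request.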

The Nine-Step Workflow

1. Normalize the Inputs

Organise all available inputs into five buckets: purpose (use cases, scope boundaries, competency questions, constraints), representative data (CSV, JSON, XML, or natural-language documents), existing ontologies or vocabularies to reuse, supporting semantic evidence (glossaries, SME notes, data catalog descriptions), and implementation constraints (naming rules, reasoning profile, test expectations, quality criteria). If a bucket is missing or thin, say so explicitly before proceeding. Do not invent requirements to fill gaps.
2. Build the Requirement Gate

Create a candidate list of classes, properties, and controlled values drawn from explicit use cases, scope statements, competency questions, and implementation constraints. An ontology element is eligible only if it is required to answer at least one competency question or explicit requirement. Competency questions are the strongest filter.
3. Build the Evidence Gate

Check each candidate against the representative data. Support can be direct or near-direct evidence: repeated entities, repeated attributes, events, roles, states, measures, identifiers, dates, places, relationships, or lexical patterns. Generalise beyond literal sample mentions only when the generalisation remains supported by the data.
4. Apply Strict Inclusion and Exclusion Rules

Include an ontology element only when both conditions hold: (1) it is required by at least one explicit requirement or competency question, and (2) it is supported by the sample data. Exclude everything else, even if it appears in a reused vocabulary or seems generally useful.
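The requirement gate, the evidence gate, and the inclusion rule can be sketched as a simple filter. This is an illustration only: the `Candidate` structure, the `apply_gates` helper, and the sample candidates (including their competency-question ids and data signals) are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    required_by: set = field(default_factory=set)  # ids of CQs or explicit requirements
    evidence: set = field(default_factory=set)     # signals observed in the sample data


def apply_gates(candidates):
    """Include a candidate only if it passes both gates; record why others fail."""
    included, excluded = [], {}
    for c in candidates:
        if not c.required_by:
            excluded[c.name] = "not needed by any requirement or competency question"
        elif not c.evidence:
            excluded[c.name] = "not supported by the representative data"
        else:
            included.append(c.name)
    return included, excluded


# Hypothetical candidates for illustration.
candidates = [
    Candidate("Patient", {"CQ1"}, {"patient name appears in every document"}),
    Candidate("InsurancePolicy", set(), {"policy number column"}),  # evidence only
    Candidate("Allergy", {"CQ4"}, set()),                           # requirement only
]
included, excluded = apply_gates(candidates)
```

Keeping the exclusion reason alongside each rejected candidate makes it easier to report what was deliberately left out.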
5. Choose a Top-Level Grounding Scheme

Create a small set of mutually disjoint top-level classes that reduces cross-category confusion. Example schemes: person / object / location / event; party / event / clinicalFinding; asset_artifact / data_information / governance / location / measurement / party / process_event / state_condition / time. Keep the set small, declare the classes mutually disjoint, and explain why the chosen scheme fits the use case.
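Because the top-level classes are disjoint, every included class should sit under exactly one of them. A minimal sketch of that check follows, assuming the class hierarchy is available as a mapping from each class to its direct parents. The function names and the `PatientVisit` and `Medication` classes are hypothetical examples of violations.

```python
def grounding_branches(cls, parents, top_level):
    """Return the top-level classes reachable from `cls` via subclass links."""
    seen, stack, roots = set(), [cls], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in top_level:
            roots.add(current)
        stack.extend(parents.get(current, ()))
    return roots


def grounding_violations(parents, top_level):
    """List classes that fall under zero or several top-level branches."""
    classes = set(parents) | set(top_level)
    for direct_parents in parents.values():
        classes |= set(direct_parents)
    violations = {}
    for cls in sorted(classes):
        roots = grounding_branches(cls, parents, top_level)
        if len(roots) != 1:
            violations[cls] = sorted(roots)
    return violations


top_level = {"Party", "Event", "ClinicalFinding"}
parents = {
    "Patient": {"Party"},
    "Encounter": {"Event"},
    "PatientVisit": {"Patient", "Encounter"},  # hypothetical: straddles two branches
    "Medication": set(),                       # hypothetical: not grounded at all
}
violations = grounding_violations(parents, top_level)
```

A class that straddles two disjoint branches would make the ontology inconsistent for any instance of that class, so it is worth catching before serialisation.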
6. Keep the Taxonomy Shallow and Extraction-Friendly

Maximum class depth is 3. Prefer properties over deeper subclass trees. Use potential subclasses beyond level 3 as skos:example values on the parent class rather than adding new taxonomy levels. Keep names concrete and operational.
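The depth limit lends itself to a mechanical check. The sketch below assumes a single-parent taxonomy and counts a top-level class as depth 1; the skill text does not state where counting starts, so that is an assumption. The `InpatientEncounter` and `EmergencyAdmission` classes are hypothetical.

```python
from typing import Dict, Optional

MAX_DEPTH = 3


def class_depth(cls: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Depth of `cls`, counting a class without a parent as depth 1."""
    depth, seen = 1, {cls}
    while parent_of.get(cls) is not None:
        cls = parent_of[cls]
        if cls in seen:
            raise ValueError(f"cycle in taxonomy at {cls!r}")
        seen.add(cls)
        depth += 1
    return depth


def too_deep(parent_of: Dict[str, Optional[str]], max_depth: int = MAX_DEPTH) -> Dict[str, int]:
    """Return every class whose depth exceeds the limit."""
    return {
        cls: depth
        for cls in parent_of
        if (depth := class_depth(cls, parent_of)) > max_depth
    }


parent_of = {
    "Event": None,
    "Encounter": "Event",
    "InpatientEncounter": "Encounter",          # hypothetical level 3
    "EmergencyAdmission": "InpatientEncounter",  # hypothetical level 4
}
offenders = too_deep(parent_of)
```

Under the rule above, a class flagged here would be demoted to a skos:example value on its parent rather than kept as a subclass.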
7. Reuse External Vocabularies Carefully

Reuse classes or properties from existing ontologies only when reuse helps satisfy an included requirement. Do not import large fragments that violate the minimal-scope rule. Say explicitly what was reused and what was deliberately not reused.
8. Define Classes and Properties Clearly

Provide an Aristotelian definition for every class: “an X is a Y that Z.” Definitions must help distinguish nearby concepts that might be confused during extraction or mapping. Avoid circular definitions and vague labels.
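A rough lint for the "an X is a Y that Z" form can catch the most obvious problems before human review. This is a heuristic sketch, not part of the skill: the `check_definition` helper and its regular expression are assumptions, and they only recognise definitions whose differentia is introduced by "that", "who", or "which".

```python
import re

# Matches "[A/An] <term> is a/an <genus> that|who|which <differentia>".
ARISTOTELIAN = re.compile(
    r"^(?:an?\s+)?(?P<term>.+?)\s+is\s+an?\s+(?P<genus>.+?)"
    r"\s+(?:that|who|which)\s+(?P<differentia>.+)$",
    re.IGNORECASE,
)


def check_definition(label: str, parent: str, definition: str) -> list:
    """Return a list of problems found; an empty list means the lint passed."""
    match = ARISTOTELIAN.match(definition.strip())
    if not match:
        return ["definition is not in 'an X is a Y that Z' form"]
    problems = []
    if match["term"].lower() != label.lower():
        problems.append("defined term does not match the class label")
    if match["genus"].lower() != parent.lower():
        problems.append("genus does not match the parent class")
    if re.search(rf"\b{re.escape(label)}\b", match["differentia"], re.IGNORECASE):
        problems.append("definition is circular: the label reappears in the differentia")
    return problems
```

A lint like this cannot judge whether a definition actually separates nearby concepts; it only flags structural issues such as a mismatched parent or a circular wording.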
9. Validate Before Producing the Final Ontology

Run all checks in the modeling checklist before finalising. Fix any failures and re-run the checklist before proceeding to the output contract.

The Modeling Checklist

The skill references a companion checklist at references/modeling-checklist.md that acts as a pre-flight verification before the final serialisation:
## Inclusion test
Every included element must pass both tests:
1. requirement test: needed by an explicit requirement or competency question.
2. evidence test: supported by representative data.

## Depth test
- maximum taxonomy depth: 3
- prefer properties over subclasses
- add a third level only when a competency question or mapping task requires it

## Grounding test
- choose a small top-level grounding scheme
- keep top-level classes mutually disjoint
- assign each included class to one grounding branch

## Serialization test
- include only final included classes and properties
- assert top-level disjointness
- avoid speculative axioms
- add domain/range only when stable and useful

## Definition test
- every class gets an informative definition
- use Aristotelian form when possible
- avoid circular or generic wording

Output Contract

The skill mandates exactly five output sections, ensuring every ontology produced is self-documenting and traceable:
  1. CQ-to-ontology mapping — for each competency question, the classes, properties, and controlled values required to answer it, with justification
  2. Top-level disjoint class scheme — the grounding classes, the rationale, the disjointness statement, and each included class mapped to one grounding branch
  3. Class definitions — preferred label, parent class, Aristotelian definition, inclusion justification (citing requirement + data signal), and key properties
  4. Final ontology serialisation — OWL Turtle using only the included elements, with prefixes, disjointness assertions, and domain/range axioms where stable
  5. Actionable artifacts — for unstructured source data, a GraphSchema JSON file produced by the bundled scripts/owl_to_graphrag_schema.py conversion script; for structured source data, a mapping specification with column-to-class/property mappings, transformation rules, and examples

Translating the Ontology into Actionable Artifacts

Once the Turtle ontology is produced, the skill generates deployment artifacts automatically. For unstructured source data, the skill runs scripts/owl_to_graphrag_schema.py to produce a GraphSchema JSON file that a SimpleKGPipeline can consume directly:
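# run_kg_pipeline.py
import asyncio
import os
import sys
from pathlib import Path

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.components.schema import GraphSchema
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OpenAILLM

# The excerpt uses the names below without showing their definitions.
# These values are placeholders (assumptions); set them for your environment.
ONTOLOGIES_DIR = Path("ontologies")
DATA_DIR = Path("data")
NEO4J_URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("NEO4J_USER", "neo4j")
NEO4J_PASSWORD = os.environ["NEO4J_PASSWORD"]


async def main(schema_file: str) -> None:
    # Load the GraphSchema JSON generated from the ontology.
    schema = GraphSchema.from_file(str(ONTOLOGIES_DIR / schema_file))

    llm = OpenAILLM(
        model_name="gpt-4o",
        model_params={
            "response_format": {"type": "json_object"},
            "temperature": 0,
        },
    )
    embedder = OpenAIEmbeddings(model="text-embedding-3-small")
    driver = neo4j.GraphDatabase.driver(
        NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD), database="medical"
    )

    # The schema guides extraction towards the ontology's classes and relationships.
    pipeline = SimpleKGPipeline(
        llm=llm,
        driver=driver,
        embedder=embedder,
        schema=schema,
        from_pdf=False,
        perform_entity_resolution=True,
    )

    try:
        # Process every plain-text document in the data directory.
        for doc_path in sorted(DATA_DIR.glob("*.txt")):
            text = doc_path.read_text(encoding="utf-8")
            print(f"Processing {doc_path.name} ({len(text)} chars) …")
            result = await pipeline.run_async(text=text)
            print(f"  result: {result}")
    finally:
        driver.close()


if __name__ == "__main__":
    # Assumption: the schema file name is passed as the first CLI argument.
    asyncio.run(main(sys.argv[1]))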
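The bundled conversion script is not reproduced on this page. As a hedged sketch only, and not the script itself, the following shows what a GraphSchema JSON file for two classes from the clinical case sheet ontology could look like, assuming the `node_types`, `relationship_types`, and `patterns` fields used by `GraphSchema` in recent neo4j-graphrag releases. The `build_graph_schema` helper and the `HAS_ENCOUNTER` relationship are hypothetical.

```python
import json


def build_graph_schema(classes: dict, relations: list) -> dict:
    """Build a GraphSchema-style dictionary.

    classes:   class label -> description (for example, the rdfs:comment text)
    relations: (source label, relationship label, target label) triples
    """
    relationship_labels = sorted({rel for _, rel, _ in relations})
    return {
        "node_types": [
            {"label": label, "description": description}
            for label, description in classes.items()
        ],
        "relationship_types": [{"label": rel} for rel in relationship_labels],
        "patterns": [list(triple) for triple in relations],
    }


classes = {
    "Patient": "A Party who is the subject of clinical care and documentation in a case sheet.",
    "Encounter": "An Event that represents a clinical contact between a patient and a healthcare setting.",
}
relations = [("Patient", "HAS_ENCOUNTER", "Encounter")]  # hypothetical relationship

schema = build_graph_schema(classes, relations)
schema_json = json.dumps(schema, indent=2)
```

Writing `schema_json` to a `.json` file would give an input of the kind that `GraphSchema.from_file` reads in the pipeline script, subject to the field-name assumption above.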
For structured source data, the skill generates a mapping specification that maps columns or fields to ontology classes and properties, including transformation rules and examples.
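The skill's mapping specification format is not shown here, so the following is only one possible shape for it. It assumes a CSV-like row and maps each column to an ontology class and property with a transformation rule. The `ColumnMapping` structure, the column names, and the property names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class ColumnMapping:
    column: str                       # source column or field name
    target_class: str                 # ontology class the value belongs to
    target_property: str              # ontology property that holds the value
    transform: Callable[[str], object] = str.strip  # transformation rule


def apply_mappings(row: dict, mappings: list) -> dict:
    """Turn one source row into {class: {property: value}}, skipping empty cells."""
    result = {}
    for m in mappings:
        raw = row.get(m.column)
        if raw is None or raw == "":
            continue
        result.setdefault(m.target_class, {})[m.target_property] = m.transform(raw)
    return result


# Hypothetical mapping specification and sample row.
mappings = [
    ColumnMapping("patient_id", "Patient", "identifier"),
    ColumnMapping("admit_date", "Encounter", "startDate", date.fromisoformat),
    ColumnMapping("ward", "Encounter", "location"),
]
row = {"patient_id": " P-001 ", "admit_date": "2026-04-01", "ward": ""}
mapped = apply_mappings(row, mappings)
```

Expressing each transformation rule as a function keeps the specification testable against the example rows that the output contract asks for.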

A Worked Example: Clinical Case Sheet Ontology

The session demonstrates the skill applied to clinical case sheet documents. The resulting ontology uses a three-way top-level grounding scheme (Party, Event, ClinicalFinding) whose classes are asserted to be mutually disjoint:
@prefix ccs: <https://w3id.org/goingmeta/ccs#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ccs:Party a owl:Class ;
    rdfs:label "Party" ;
    rdfs:comment "An agent or person that participates in clinical events and interactions." .

ccs:Event a owl:Class ;
    rdfs:label "Event" ;
    rdfs:comment "A temporal occurrence in a clinical context." .

ccs:ClinicalFinding a owl:Class ;
    rdfs:label "ClinicalFinding" ;
    rdfs:comment "An observation, measurement, assessment, or finding made in a clinical context about a patient." .

# Mutual disjointness of top-level classes
[] a owl:AllDisjointClasses ;
    owl:members ( ccs:Party ccs:Event ccs:ClinicalFinding ) .

ccs:Patient a owl:Class ;
    rdfs:subClassOf ccs:Party ;
    rdfs:label "Patient" ;
    rdfs:comment "A Patient is a Party who is the subject of clinical care and documentation in a case sheet." .

ccs:Encounter a owl:Class ;
    rdfs:subClassOf ccs:Event ;
    rdfs:label "Encounter" ;
    rdfs:comment "An Encounter is an Event that represents a clinical contact between a patient and a healthcare setting." .
Store skills under .agent/skills/<skill-name>/SKILL.md in your repository so that any IDE assistant or agent framework that follows the .agent convention can discover and invoke the skill by name without additional configuration.
