Do Ontologies Really Solve the LLM Ambiguity Problem?

Season 3, Episode 3 of Going Meta confronts a fundamental challenge in LLM-based knowledge graph construction: word ambiguity. The “Jaguar Problem” is named after the classic disambiguation challenge where the word jaguar could refer to the big cat, the British luxury automobile, or the NFL franchise. When an LLM extracts entities and relationships from text without semantic grounding, it may silently mix concepts from different domains into a single coherent-looking but semantically incorrect graph. This session tests whether providing a domain ontology solves the problem.

Watch the Recording

Season 3, Episode 3 — December 2025

Session Code

Python scripts and test results

The Experiment Design

The session uses a corpus of text about jaguars — the animal — and a purpose-built OWL ontology that models the wildlife biology domain: individual jaguar animals, their observations, monitoring organisations, offspring, geographic regions, and so on. Two extraction strategies are compared across nine independent iterations:

Test ID	Ontology format fed to the LLM
`owl_onto`	Raw OWL Turtle source
`nl_onto`	Natural-language summary generated from the ontology

For each strategy the LLM (GPT-5) is asked to extract entities and relationships from the corpus and return them as RDF Turtle triples aligned with the ontology vocabulary. The output is then validated against a suite of SPARQL ASK queries that check specific factual claims.

Extraction Prompts

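Both test cases share the same core extraction instruction and differ in the ontology representation. The nl_onto prompt also names the vocabulary namespace (http://example.org/ontology#) explicitly, presumably because the natural-language summary lists only local names and no longer carries the ontology's URIs: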

tests = [
    {
        "test_id": "owl_onto",
        "prompt_body": """Extract relevant named entities, their relations and related information from this text.
Think deep and analyze all information in the relevant text thoroughly.
Try to infer relevant relationships between entities if not directly mentioned in the text.
Return the results as RDF triples using Turtle serialisation that align with the ontology for the found entities and relationships.
Make sure to give all entities relevant rdfs:label. Use the namespace 'http://example.org/resource#' for extracted entities.""",
        "onto_prefix": "##ONTOLOGY: ",
        "ontology": ontology,        # raw OWL Turtle
        "corpus_prefix": "##TEXT: ",
        "corpus": corpus,
    },
    {
        "test_id": "nl_onto",
        "prompt_body": """Extract relevant named entities, their relations and related information from this text.
Think deep and analyze all information in the relevant text thoroughly.
Try to infer relevant relationships between entities if not directly mentioned in the text.
Return the results as RDF triples using Turtle serialisation that align with the ontology for the found entities and relationships.
Make sure to give all entities relevant rdfs:label. Use the namespace 'http://example.org/resource#' for extracted entities
and 'http://example.org/ontology#' for the vocabulary terms.""",
        "onto_prefix": "##ONTOLOGY: ",
        "ontology": getNLOntology(ontology),  # NL summary
        "corpus_prefix": "##TEXT: ",
        "corpus": corpus,
    },
]

The `getNLOntology` Utility

The getNLOntology function in utils.py converts an OWL ontology into a structured natural-language summary that is easier for an LLM to consume than raw Turtle syntax. It produces category lists with subclass relationships and attribute listings, followed by a relationship section:

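from rdflib import Graph
from rdflib.namespace import OWL, RDF, RDFS

def getNLOntology(text):
    # Parse the ontology source (Turtle) into an in-memory graph
    g = Graph()
    g.parse(data=text, format="turtle")

    result = ''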

    # Build a mapping: class -> list of datatype properties (attributes)
    class_to_attributes = {}
    for att in g.subjects(RDF.type, OWL.DatatypeProperty):
        for dom in g.objects(att, RDFS.domain):
            class_to_attributes.setdefault(dom, []).append(att)

    result += '### CATEGORIES\n'

    for cat in g.subjects(RDF.type, OWL.Class):
        label = getLocalPart(cat)
        supercats = [getLocalPart(s) for s in g.objects(cat, RDFS.subClassOf)]
        descs = [str(d) for d in g.objects(cat, RDFS.comment)]

        if supercats:
            result += f"- {label} (subcategory of {', '.join(supercats)})\n"
        else:
            result += f"- {label}\n"

        if descs:
            result += f"   - Description: {' '.join(descs)}\n"

        attrs = class_to_attributes.get(cat, [])
        if attrs:
            result += "   - Attributes:\n"
            for att in attrs:
                att_label = getLocalPart(att)
                att_descs = [str(d) for d in g.objects(att, RDFS.comment)]
                att_desc = ' '.join(att_descs)
                result += f"        + {att_label}: {att_desc}\n"

    result += '\n### RELATIONSHIPS:\n'
    for prop in g.subjects(RDF.type, OWL.ObjectProperty):
        prop_label = getLocalPart(prop)
        doms = [getLocalPart(d) for d in g.objects(prop, RDFS.domain)]
        rans = [getLocalPart(r) for r in g.objects(prop, RDFS.range)]
        descs = [str(d) for d in g.objects(prop, RDFS.comment)]

        line = f"- {prop_label}: Relationship"
        if doms:
            line += f" that connects entities of type {', '.join(doms)}"
        if rans:
            line += f" to entities of type {', '.join(rans)}"
        if descs:
            line += f". Description: {' '.join(descs)}"
        result += line + "\n"

    return result
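
getNLOntology relies on a getLocalPart helper that is not shown in this section. A minimal sketch, assuming it simply returns the text after the last '#' or '/' of a URI (the actual implementation in utils.py may differ):

```python
def getLocalPart(uri):
    """Return the local name of a URI: the text after the last '#' or '/'.

    Assumption: this approximates the helper in utils.py, which is not shown here.
    """
    text = str(uri)
    # Prefer the fragment separator, then fall back to the last path segment.
    for sep in ("#", "/"):
        if sep in text:
            return text.rsplit(sep, 1)[1]
    return text


print(getLocalPart("http://example.org/ontology#Jaguar"))          # Jaguar
print(getLocalPart("http://example.org/ontology/monitoredByOrg"))  # monitoredByOrg
```

Reducing every term to its local name is what keeps the summary short, but it also means the namespace has to be communicated to the model some other way.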

Validation with SPARQL ASK Queries

The tests.md file defines a comprehensive suite of SPARQL ASK queries that serve as ground truth. Each query tests a specific factual claim that should be extractable from the corpus. A few representative checks:

Count of named individuals

PREFIX onto: <http://example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK {
  { SELECT (COUNT(DISTINCT ?needle) AS ?c) WHERE {
      VALUES ?needle {
          "el jefe" "macho b" "sombra" "oko" "cochise"
          "kudam" "mariposa" "xam" "isa" "fera" "amanaci"
          "ben" "f11" "pixana" "levantina"  "mariua"
      }
      ?u a onto:Jaguar ; rdfs:label ?n .
      FILTER(CONTAINS(LCASE(STR(?n)), ?needle))
  } }
  FILTER(?c = 16)
}
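
The query passes only if each of the 16 name fragments matches at least one onto:Jaguar label. The same logic in plain Python, as a sketch over a hypothetical list of labels (the sample labels are illustrative and not taken from the experiment output):

```python
NEEDLES = [
    "el jefe", "macho b", "sombra", "oko", "cochise",
    "kudam", "mariposa", "xam", "isa", "fera", "amanaci",
    "ben", "f11", "pixana", "levantina", "mariua",
]


def matched_needles(labels, needles=NEEDLES):
    """Return the needles found as a case-insensitive substring of any label."""
    lowered = [label.lower() for label in labels]
    return {n for n in needles if any(n in label for label in lowered)}


def all_individuals_found(labels, needles=NEEDLES):
    """Mirror of FILTER(?c = 16): every distinct needle must match at least once."""
    return len(matched_needles(labels, needles)) == len(set(needles))


# Hypothetical labels for illustration only.
sample = ["El Jefe", "Macho B", "Sombra (female)"]
print(sorted(matched_needles(sample)))  # ['el jefe', 'macho b', 'sombra']
print(all_individuals_found(sample))    # False
```

Substring matching is lenient by design: a short fragment such as "ben" or "isa" also matches longer, unrelated labels, so a pass indicates that the names are present rather than that the labels are exact.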

Individual properties (El Jefe)

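PREFIX onto: <http://example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>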
ASK
{ ?jefe a onto:Jaguar ; rdfs:label ?jname ; onto:hasGender "Male" ;
      onto:hasLastSightingDate "2021-11-27"^^xsd:date;
      onto:hasMonitoringStartDate "2011-11-19"^^xsd:date .
  FILTER CONTAINS(LCASE(STR(?jname)), "el jefe")
}

Monitoring organisation types

PREFIX onto: <http://example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK
{ ?jefe a onto:Jaguar ; rdfs:label ?jname .
  FILTER CONTAINS(LCASE(STR(?jname)), "el jefe") .
  ?jefe onto:monitoredByOrg [ rdfs:label ?orgName1 ; a onto:NGO ] ;
         onto:monitoredByOrg [ rdfs:label ?orgName2 ; a onto:GovernmentAgency ] ;
         onto:monitoredByOrg [ rdfs:label ?orgName3 ; a onto:AcademicInstitution ] .
  FILTER CONTAINS(LCASE(STR(?orgName1)), "conservation catalyst") .
  FILTER CONTAINS(LCASE(STR(?orgName2)), "arizona game and fish department") .
  FILTER CONTAINS(LCASE(STR(?orgName3)), "university of arizona") .
}

Offspring lineage

PREFIX onto: <http://example.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK
{ ?m a onto:Jaguar ; rdfs:label ?jname ; onto:occursIn ?p .
  FILTER CONTAINS(LCASE(STR(?jname)), "mariposa") .
  ?m onto:hasOffspring [ a onto:Jaguar; rdfs:label ?oname ; onto:occursIn ?p ] .
  FILTER CONTAINS(LCASE(STR(?oname)), "cayenita") .
}

Result Validation Helper

After each LLM call the extracted Turtle is parsed and basic statistics are computed:

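from rdflib import Graph, URIRef
from rdflib.namespace import RDF

def processResults(rdf):
    # Parse the model's Turtle output; this raises if the syntax is invalid
    g = Graph()
    g.parse(data=rdf, format="turtle")
    # Count the resources typed as onto:Jaguar
    jaguars = set(g.subjects(RDF.type, URIRef("http://example.org/ontology#Jaguar")))
    print("Triples:", len(g), "Jaguars:", len(jaguars))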
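
Chat models sometimes wrap their answer in a Markdown code fence, which would make the Turtle parse in processResults fail. Whether that happened in this experiment is not stated; the following is a defensive sketch (the strip_code_fence helper is hypothetical) that unwraps a fully fenced answer before it is parsed or written to disk:

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the literal out of the source

# Opening fence with an optional language tag (e.g. "turtle"), body, closing fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text):
    """Return the body of a fully fenced answer, or the text unchanged otherwise."""
    match = _FENCED.match(text)
    return match.group(1) if match else text


fenced = FENCE + "turtle\n@prefix ex: <http://example.org/> .\n" + FENCE + "\n"
print(strip_code_fence(fenced))  # @prefix ex: <http://example.org/> .
```

The sketch only handles answers that consist of a single fenced block; a response that mixes prose and Turtle is returned unchanged and would still need manual inspection.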

The test runner executes both strategies over nine iterations and writes each LLM response to a .ttl file for offline SPARQL validation:

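import os
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
os.makedirs("ontoproject/output", exist_ok=True)  # make sure the output folder exists

for iteration in range(1, 10):  # iterations 1..9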
    for t in tests:
        prompt = f"""{t["prompt_body"]}\n\n{t["onto_prefix"]}\n\n{t["ontology"]}\n\n{t["corpus_prefix"]}\n\n{t["corpus"]}"""

        response = client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": prompt}],
            max_completion_tokens=30000
        )

        with open("ontoproject/output/" + t["test_id"] + "__" + str(iteration) + ".ttl", "w", encoding="utf-8") as f:
            f.write(response.choices[0].message.content)

        # processResults prints its own summary and returns None
        processResults(response.choices[0].message.content)
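
Because each strategy runs nine times, the per-file ASK outcomes have to be rolled up per strategy before the two can be compared. A sketch of that tally, assuming the files follow the test_id__iteration.ttl naming used by the runner and that each file has already been reduced to a pass/fail boolean for one query (the pass_rates helper and the sample outcomes are hypothetical, not results from the session):

```python
from collections import defaultdict
from pathlib import Path


def pass_rates(outcomes):
    """Map test_id -> fraction of iterations that passed.

    `outcomes` maps an output file name such as 'nl_onto__3.ttl' to a bool.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for filename, ok in outcomes.items():
        # 'owl_onto__7.ttl' -> stem 'owl_onto__7' -> test_id 'owl_onto'
        test_id, _, _iteration = Path(filename).stem.rpartition("__")
        total[test_id] += 1
        passed[test_id] += int(ok)
    return {tid: passed[tid] / total[tid] for tid in total}


# Made-up outcomes for illustration only.
example = {
    "owl_onto__1.ttl": True,
    "owl_onto__2.ttl": False,
    "nl_onto__1.ttl": True,
    "nl_onto__2.ttl": True,
}
print(pass_rates(example))  # {'owl_onto': 0.5, 'nl_onto': 1.0}
```

Reporting a rate per query and strategy, rather than a single run, is what makes a statement about consistency between owl_onto and nl_onto checkable.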

The ontology used in the experiment — jaguar_ontology.ttl — is hosted at https://raw.githubusercontent.com/nemegrod/graph_RAG/refs/heads/main/data/jaguar_ontology.ttl. The corpus is the companion jaguar_corpus.txt from the same repository.

Key Finding

The experiment shows that providing an ontology — in either raw OWL or natural-language form — significantly constrains the LLM’s output to the wildlife domain, eliminating cross-domain confusion between jaguar-the-animal and other senses of the word. The natural-language summary (nl_onto) generally produces more consistent results because the LLM does not need to parse Turtle syntax while simultaneously performing entity extraction.

When grounding LLM extractions with an ontology, prefer the natural-language summary produced by getNLOntology when the context window is limited or when extraction consistency matters most. Reserve raw Turtle for cases where strict namespace alignment is critical.
