Analysing the Data.world KG Benchmark for LLM QA

Session 26 of Going Meta, broadcast on March 5, 2024, takes a critical look at the data.world benchmark that claimed knowledge graphs improve LLM question-answering over relational data. Jesus Barrasa unpicks the methodology, reproduces the benchmark using Neo4j, and explores the role of R2RML mappings, OWL ontologies, and semantic layers in enabling the LLM to generate correct Cypher — while also highlighting what the benchmark does and doesn’t actually measure.

What You’ll Learn

How R2RML mappings translate relational schemas into graph structures for LLM consumption
How OWL ontologies provide a semantic layer that helps LLMs understand graph schema
How n10s.experimental.export.dimodel.fetch generates data integration config from an OWL ontology
How Neo4j’s self-describing schema enables dynamic context injection for LLM Cypher generation
What the data.world benchmark actually measures — and its limitations

The Benchmark Dataset

The session uses the ACME Insurance dataset from the data.world benchmark repository, which includes:

Source CSV files with insurance policies, policyholders, and agents
An OWL ontology defining the domain model (insurance.ttl)
R2RML mappings from relational tables to RDF triples

Source CSVs

Available at https://github.com/datadotworld/cwd-benchmark-data/tree/main/ACME_Insurance/data — insurance policies, agents, and policyholders.

Domain Ontology

The OWL ontology at ACME_Insurance/ontology/insurance.ttl defines Policy, PolicyHolder, Agent, and their relationships.

Step-by-Step Walkthrough

Generate a data integration model from the OWL ontology

Use n10s.experimental.export.dimodel.fetch to generate a Neo4j Workspace-compatible import configuration from a subset of the OWL classes:

CALL n10s.experimental.export.dimodel.fetch(
  "https://raw.githubusercontent.com/datadotworld/cwd-benchmark-data/main/ACME_Insurance/ontology/insurance.ttl",
  "Turtle",
  {
    classList: [
      "http://data.world/schema/insurance/Policy",
      "http://data.world/schema/insurance/PolicyHolder",
      "http://data.world/schema/insurance/Agent"
    ]
  }
);

The procedure saves the config to your local drive for use with the Neo4j import tool.

Inspect the model inline

Alternatively, use n10s.experimental.stream.dimodel.fetch to see the data integration model as a query result directly in the Neo4j browser:

CALL n10s.experimental.stream.dimodel.fetch(
  "https://raw.githubusercontent.com/datadotworld/cwd-benchmark-data/main/ACME_Insurance/ontology/insurance.ttl",
  "Turtle",
  {
    classList: [
      "http://data.world/schema/insurance/Policy",
      "http://data.world/schema/insurance/PolicyHolder",
      "http://data.world/schema/insurance/Agent"
    ]
  }
)

Import CSV data using the generated config

Load the generated config into the Neo4j import tool (Neo4j Workspace), map the CSV columns to the ontology-derived node and relationship types, and run the import job to populate the graph.

Query the populated graph

With the data loaded according to the ontology-constrained schema, run Cypher queries to verify the data and explore the model. For example, aggregate policies per agent:

// Count policies per agent by following the ontology-derived soldByAgent relationship
MATCH (p:Policy)-[:soldByAgent]->(a:Agent)
RETURN a.agentId AS AgentID, COUNT(p) AS PoliciesSold

Expose the schema to the LLM for Cypher generation

The benchmark’s central claim is that a semantic layer helps the LLM generate correct queries. To test this, retrieve the graph schema at query time with the following procedures, rather than hard-coding it into the prompt:

// All node types and their properties
CALL db.schema.nodeTypeProperties()

// All relationship types and their properties
CALL apoc.meta.relTypeProperties()

These outputs are included in the LLM prompt alongside the natural language question from the benchmark, so the LLM can generate Cypher that matches the actual graph structure.
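The prompt assembly described above can be sketched in Python, the language of the session's notebook. This is a minimal illustration, not the notebook's actual code: the sample rows are hypothetical stand-ins shaped like the output of the two schema procedures, the `policyNumber` property is invented for the example, and `describe_schema` and `build_prompt` are names introduced here.

```python
from collections import defaultdict

# Hypothetical sample rows, shaped like the output of
# db.schema.nodeTypeProperties() and apoc.meta.relTypeProperties().
# The policyNumber property is invented for illustration.
node_rows = [
    {"nodeLabels": ["Policy"], "propertyName": "policyNumber", "propertyTypes": ["String"]},
    {"nodeLabels": ["Agent"], "propertyName": "agentId", "propertyTypes": ["String"]},
]
rel_rows = [
    {"relType": ":`soldByAgent`", "sourceNodeLabels": ["Policy"], "targetNodeLabels": ["Agent"]},
]


def describe_schema(node_rows, rel_rows):
    """Render schema rows as a compact text block for an LLM prompt."""
    props = defaultdict(list)
    for row in node_rows:
        label = ":".join(row["nodeLabels"])
        types = "|".join(row["propertyTypes"])
        props[label].append(f"{row['propertyName']}: {types}")

    lines = ["Node types:"]
    for label in sorted(props):
        lines.append(f"  (:{label} {{{', '.join(props[label])}}})")

    # One pattern per relationship type, de-duplicated while keeping order.
    patterns = []
    for row in rel_rows:
        rel = row["relType"].strip(":`")  # ":`soldByAgent`" -> "soldByAgent"
        src = ":".join(row["sourceNodeLabels"])
        dst = ":".join(row["targetNodeLabels"])
        patterns.append(f"  (:{src})-[:{rel}]->(:{dst})")
    lines.append("Relationship types:")
    lines.extend(dict.fromkeys(patterns))
    return "\n".join(lines)


def build_prompt(question, node_rows, rel_rows):
    """Combine the schema description with a natural language question."""
    return (
        "Generate a Cypher query for the question below.\n"
        "Use only the labels, relationship types and properties in this schema.\n\n"
        f"{describe_schema(node_rows, rel_rows)}\n\n"
        f"Question: {question}\nCypher:"
    )


prompt = build_prompt("How many policies has each agent sold?", node_rows, rel_rows)
```

In a real run the two row lists would come from executing the procedures through a Neo4j driver, so the prompt always reflects the current state of the graph.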

What the Benchmark Measures (and Doesn’t)

What It Measures

Whether providing a structured, ontology-derived schema description improves the accuracy of LLM-generated queries compared to providing a raw relational schema — a legitimate and useful experiment.

What It Doesn't Measure

The benchmark does not isolate the contribution of the graph structure itself from the quality of the schema description. A well-described relational schema might perform equally well.

R2RML's Role

R2RML mappings describe how rows in the relational tables become RDF triples, which are then loaded into Neo4j. The semantic enrichment comes from the OWL ontology rather than from the R2RML transformation alone.
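To make the row-to-triple idea concrete, here is a simplified Python sketch of what an R2RML-style mapping does. It mimics a single triples map as a plain dictionary and is not an R2RML processor; the instance IRI template, the `policy_id` and `policy_number` column names, and the `policyNumber` predicate are all hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
INS = "http://data.world/schema/insurance/"

# Hypothetical mapping that mimics one R2RML triples map in plain Python.
policy_map = {
    "subject_template": "http://example.org/policy/{policy_id}",
    "class": INS + "Policy",
    # predicate IRI -> source column (both invented for illustration)
    "predicate_columns": {INS + "policyNumber": "policy_number"},
}


def row_to_triples(row, mapping):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = mapping["subject_template"].format(**row)
    triples = [(subject, RDF_TYPE, mapping["class"])]
    for predicate, column in mapping["predicate_columns"].items():
        value = row.get(column)
        if value is not None:  # as in R2RML, a NULL column yields no triple
            triples.append((subject, predicate, value))
    return triples


triples = row_to_triples({"policy_id": 7, "policy_number": "P-007"}, policy_map)
```

The mapping only decides which IRIs and literals are produced. What `Policy` means, and how it relates to `Agent` or `PolicyHolder`, is stated in the ontology.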

Semantic Layer Value

The session argues that the real value is in the semantic layer — meaningful, consistent naming and ontological relationships — regardless of whether the underlying store is a graph or a relational database.

The Python notebook accompanying this session (session26/) demonstrates the full benchmark reproduction pipeline, including prompting the LLM with the benchmark questions and evaluating the generated Cypher against the expected results.
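One plausible way to score generated Cypher against expected results is to compare result sets while ignoring row order and column aliases, since the LLM may name its output columns differently. The sketch below is an assumption about how such a check could look, not the notebook's actual evaluation code; `normalise`, `results_match`, and `accuracy` are names introduced here.

```python
from collections import Counter


def normalise(rows):
    """Reduce result rows (dicts) to a multiset of value tuples.

    Column names and row order are ignored; values are compared as strings.
    """
    return Counter(tuple(sorted(map(str, row.values()))) for row in rows)


def results_match(generated_rows, expected_rows):
    """True when both queries returned the same rows, in any order."""
    return normalise(generated_rows) == normalise(expected_rows)


def accuracy(outcomes):
    """Share of benchmark questions whose generated query matched."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


expected = [{"AgentID": "a1", "PoliciesSold": 2}, {"AgentID": "a2", "PoliciesSold": 1}]
generated = [{"agent": "a2", "n": 1}, {"agent": "a1", "n": 2}]
outcomes = [results_match(generated, expected), results_match([], expected)]
```

Comparing values as strings is a deliberate simplification: it tolerates differing aliases, but it also treats the integer `2` and the string `"2"` as equal, which a stricter evaluation might not want.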
