Session 32 (Season 2, Episode 5, January 2025) tackles the practical challenge that appears the moment you try to build a production knowledge graph: your data never comes from a single source or in a single format. This session demonstrates how one well-designed OWL ontology can act as the integration hub for PDFs, CSVs, and CRM data simultaneously, producing a unified Neo4j graph that is immediately ready for GraphRAG retrieval. Two Python utilities do the heavy lifting: DIModelBuilder generates Neo4j Data Importer models from the OWL ontology, and RAGSchemaFromOnto converts the same ontology into the neo4j-graphrag schema format.
Watch the Recording
Full live-stream replay on YouTube
Session Code
Python: RAGSchemaFromOnto.py and DIModelBuilder.py
The Integration Challenge
When building a KG from a single domain (say, legal contracts), a single ontology and a single pipeline are enough. Real-world scenarios typically involve:
- PDF documents (contracts, reports, product specs)
- Structured tabular data (CSV exports from CRMs, policy databases)
- Semi-structured API responses (sales opportunity data, customer records)
The session uses an OWL ontology (insurance.ttl / sales-onto.ttl) as the unifying schema. Every data source, regardless of format, is loaded in a way that respects the ontology’s class and property definitions.
Project Dependencies
The session code is packaged with a pyproject.toml that declares four runtime dependencies: streamlit powers an interactive demo UI; rdflib handles OWL parsing; requests fetches ontology files from a local HTTP server; and neo4j-graphrag provides the SimpleKGPipeline and schema objects.
RAGSchemaFromOnto.py — Ontology to GraphRAG Schema
RAGSchemaFromOnto.py provides the same getSchemaFromOnto() function as Session 31, but with a key extension: it accepts a file path rather than a pre-loaded Graph, making it convenient to call with different ontology files without managing the RDFLib graph lifecycle externally.
Core Conversion Function
getPKs() — Identifying Natural Keys
Properties declared as owl:InverseFunctionalProperty in the ontology are treated as unique identifiers — the OWL equivalent of a primary key:
Marking a property as owl:InverseFunctionalProperty in your ontology is a deliberate design signal: “this property value uniquely identifies the subject.” getPKs() surfaces these so the ingestion pipeline can use them as MERGE keys, preventing duplicate nodes when the same entity appears in multiple source documents.
DIModelBuilder.py — Ontology to Data Importer Model
DIModelBuilder goes in a complementary direction: it converts the OWL ontology into the JSON format consumed by the Neo4j Data Importer, enabling visual, no-code loading of structured (CSV/tabular) data that conforms to the ontology schema.
Class Overview
MAX_NUM_NODES and MAX_NUM_RELS guards prevent building an unwieldy import model from very large ontologies — a practical safeguard when working with foundational ontologies like schema.org.
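How such guards might work can be sketched in a few lines. The limit values, the function name, and the choice to raise an error rather than truncate the model are all assumptions here; the session code may behave differently.

```python
# Assumed limits for illustration; the session code defines its own values.
MAX_NUM_NODES = 100
MAX_NUM_RELS = 300


def check_model_size(node_labels, rel_types,
                     max_nodes=MAX_NUM_NODES, max_rels=MAX_NUM_RELS):
    """Refuse to build an import model that exceeds the configured limits."""
    if len(node_labels) > max_nodes:
        raise ValueError(
            f"{len(node_labels)} node types exceed the limit of {max_nodes}"
        )
    if len(rel_types) > max_rels:
        raise ValueError(
            f"{len(rel_types)} relationship types exceed the limit of {max_rels}"
        )


# A small model passes silently.
check_model_size(["Policy", "Customer"], ["heldBy"])
```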
Building the Import Model
build_di_model() is the entry point. It accepts raw RDF data (as a string), its format, and an optional classList to limit which ontology classes are included:
The builder follows the rdfs:subClassOf* hierarchy, so even classes defined as subclasses of the selected ones are included in the import model.
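The effect of following the subclass hierarchy can be illustrated without any RDF tooling. In this sketch the subclass axioms are given as plain (child, parent) pairs, and the function name and class names are hypothetical; the session code works against the parsed ontology instead.

```python
def with_subclasses(selected, subclass_pairs):
    """Expand a class selection with every direct or indirect subclass."""
    # Index the axioms by parent so children can be looked up quickly.
    children = {}
    for child, parent in subclass_pairs:
        children.setdefault(parent, set()).add(child)

    result = set(selected)
    stack = list(selected)
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in result:  # also protects against cyclic axioms
                result.add(child)
                stack.append(child)
    return result


# Hypothetical hierarchy: VanPolicy < MotorPolicy < Policy, plus an unrelated class.
pairs = [
    ("MotorPolicy", "Policy"),
    ("VanPolicy", "MotorPolicy"),
    ("Claim", "Event"),
]
expanded = with_subclasses(["Policy"], pairs)
```

Selecting only Policy therefore also pulls in MotorPolicy and VanPolicy, while Claim stays out because it does not descend from a selected class.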
Exporting for Neo4j Data Importer v2
The get_model_as_serialisable_object_v2() method produces a JSON document that the Neo4j Data Importer v2 can open directly:
Running the Builder
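The original run script is not shown here, so the following is only a sketch of how the two methods named above might be chained and the result written to disk. The StubBuilder class stands in for DIModelBuilder so the example runs on its own; the argument order of build_di_model(), the helper name export_di_model, and the placeholder payload are assumptions, not the session's actual code or output format.

```python
import json
import os
import tempfile


class StubBuilder:
    """Stand-in for DIModelBuilder so the sketch runs without the session code."""

    def build_di_model(self, rdf_data, rdf_format, class_list=None):
        # The real builder would parse the ontology here.
        self.inputs = (rdf_format, class_list)

    def get_model_as_serialisable_object_v2(self):
        # Placeholder payload; the real Data Importer v2 document is richer.
        return {"placeholder": True}


def export_di_model(builder, rdf_data, rdf_format, out_path, class_list=None):
    """Build the import model and save it as a JSON file."""
    builder.build_di_model(rdf_data, rdf_format, class_list)
    model = builder.get_model_as_serialisable_object_v2()
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(model, fh, indent=2)
    return model


out_path = os.path.join(tempfile.mkdtemp(), "di_model.json")
model = export_di_model(StubBuilder(), "# ontology text goes here", "turtle", out_path)
```

With the session code available, the stub would be replaced by a real DIModelBuilder instance and the saved file opened in the Data Importer.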
The Integration Architecture
PDFs via LLM extraction
Unstructured PDF documents are processed through SimpleKGPipeline using the schema produced by getSchemaFromOnto(), writing extracted entities directly to Neo4j.
CSVs via Data Importer
Structured tabular data is loaded through the Neo4j Data Importer using the model produced by DIModelBuilder, mapping CSV columns to ontology properties.
Single ontology as truth
Both ingestion paths share the same OWL ontology — the LLM extraction schema and the Data Importer model are both derived from it, guaranteeing a consistent node/relationship vocabulary.
GraphRAG-ready output
The resulting unified graph can immediately be queried with neo4j-graphrag retrieval components, since the schema objects are derived from the same ontology.