Generate Natural Language from KGs via Annotated Ontologies
Learn to annotate OWL ontologies with lexical metadata and run a Cypher NLG engine that turns Neo4j graph data into readable natural language sentences.
Session 7 of Going Meta, broadcast on August 2, 2022, demonstrates a surprisingly elegant idea: if you annotate your ontology with linguistic patterns — verb phrases, speech-style variants, and language tags — a compact Cypher query can walk your graph and assemble those patterns into full natural language sentences. No dedicated NLG library required; the ontology itself carries the grammar.
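To make the idea concrete, here is a minimal sketch of what an annotated ontology and a few instances could look like once they sit in Neo4j. It is a hand-built stand-in, not the ontology used in the session. The property names (`label`, `name`, `direct`, `inverse`) and relationship types (`domain`, `range`) are the ones the queries on this page read. The `Person`/`Movie` data, the verb phrases, and the `text@style` encoding of each variant are assumptions made for illustration.

```cypher
// Hypothetical fixture, not the ontology used in the session.
// Classes: `label` names the instance label, `name` names the instance
// property that should be printed as the node's display name.
CREATE (person:Class {label: ["Person"], name: ["name"]})
CREATE (movie:Class {label: ["Movie"], name: ["title"]})
// Object property: `label` matches the relationship type; `direct` and
// `inverse` hold one verb phrase per speech style, tagged as "text@style"
// (assumed encoding of language-tagged values).
CREATE (actedIn:ObjectProperty {
  label: ["ACTED_IN"],
  direct: ["acted in@default", "was in@short", "played a role in@long"],
  inverse: ["features@default", "has@short", "includes in its cast@long"]
})
CREATE (actedIn)-[:domain]->(person)
CREATE (actedIn)-[:range]->(movie)
// Datatype property: an inline template with $s/$o placeholders.
CREATE (born:DatatypeProperty {label: ["born"], direct: ["$s was born in $o@default"]})
CREATE (born)-[:domain]->(person)
// Instance data for the queries to verbalize.
CREATE (keanu:Person {name: "Keanu Reeves", born: 1964})
CREATE (keanu)-[:ACTED_IN]->(:Movie {title: "The Matrix"})
CREATE (keanu)-[:ACTED_IN]->(:Movie {title: "Speed"})
```

To run the queries on this page against such a fixture, look up the internal id first with `MATCH (p:Person {name: "Keanu Reeves"}) RETURN id(p)` and substitute it for the hard-coded id.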
Start by matching a node and finding which ontology property covers the relationship, then read the right directional predicate.
MATCH (n)-[r]-(o)
WHERE id(n) = 17
MATCH (cn:Class)<-[:domain|range]-(op:ObjectProperty)-[:domain|range]->(co:Class)
WHERE type(r) IN op.label
  AND (exists(op.direct) OR exists(op.inverse))
  AND cn.label[0] IN labels(n)
  AND co.label[0] IN labels(o)
RETURN n[cn.name[0]] AS subj,
       op[CASE WHEN startNode(r) = n THEN "direct" ELSE "inverse" END] AS pred,
       o[co.name[0]] AS obj
2. Filter by speech style
Use n10s.rdf.getLangValue to pick the variant matching the requested style tag.
MATCH (n)-[r]-(o)
WHERE id(n) = 17
MATCH (cn:Class)<-[:domain|range]-(op:ObjectProperty)-[:domain|range]->(co:Class)
WHERE type(r) IN op.label
  AND (exists(op.direct) OR exists(op.inverse))
  AND cn.label[0] IN labels(n)
  AND co.label[0] IN labels(o)
RETURN n[cn.name[0]] AS subj,
       n10s.rdf.getLangValue("default", op[CASE WHEN startNode(r) = n THEN "direct" ELSE "inverse" END]) AS pred,
       o[co.name[0]] AS obj
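The style lookup can be tried in isolation, without any graph data. The sketch below assumes the neosemantics (n10s) plugin is installed and that each variant is stored as a string with an `@style` suffix. The three phrases are made up for illustration.

```cypher
// Hypothetical variants for one predicate, one per speech style.
WITH ["acted in@default", "was in@short", "played a role in@long"] AS variants
// Pick the variant whose tag matches the requested style;
// the call is expected to yield null when no variant carries that tag.
RETURN n10s.rdf.getLangValue("short", variants) AS pred
```

Swapping `"short"` for `"default"` or `"long"` should select the corresponding phrase, which is the same switch the step above applies to `op.direct` and `op.inverse`.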
3. Aggregate multi-value predicates into a sentence
Collect all objects for the same predicate into a comma-joined string, then concatenate subject + predicate + object list.
MATCH (n)-[r]-(o)
WHERE id(n) = 63
MATCH (cn:Class)<-[:domain|range]-(op:ObjectProperty)-[:domain|range]->(co:Class)
WHERE type(r) IN op.label
  AND (exists(op.direct) OR exists(op.inverse))
  AND cn.label[0] IN labels(n)
  AND co.label[0] IN labels(o)
WITH n[cn.name[0]] AS subj,
     n10s.rdf.getLangValue("default", op[CASE WHEN startNode(r) = n THEN "direct" ELSE "inverse" END]) AS pred,
     collect(o[co.name[0]]) AS obj
RETURN subj + " " + pred + " " + substring(reduce(r="", x IN obj | r+","+x),1)
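The `reduce` plus `substring` idiom in the last line is easier to follow on literal values. This standalone sketch uses made-up subject, predicate, and object strings and needs no plugin or graph data.

```cypher
// Hypothetical values standing in for subj, pred and the collected objects.
WITH "Keanu Reeves" AS subj,
     "acted in" AS pred,
     ["The Matrix", "Speed", "John Wick"] AS obj
// reduce builds ",The Matrix,Speed,John Wick";
// substring(..., 1) drops the leading comma.
RETURN subj + " " + pred + " " +
       substring(reduce(acc = "", x IN obj | acc + "," + x), 1) AS sentence
// sentence: "Keanu Reeves acted in The Matrix,Speed,John Wick"
```

Because `subj` and `pred` are non-aggregated columns next to `collect(...)`, they act as the grouping key in the query above, so each subject and predicate pair produces one sentence. Joining with `", "` instead of `","` would add a space after each comma, if that reads better in your output.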
The full engine handles both datatype properties (inline templates with $s/$o placeholders) and object properties (relationship traversal) in a single parameterized CALL { … UNION … } block.
// set params: :params {node_id: <id>, speech_style: "default"}
CALL {
  // Datatype properties: fill the $s/$o template from the node's own values
  MATCH (n)
  WHERE id(n) = $node_id
  MATCH (cn:Class)<-[:domain]-(op:DatatypeProperty)
  WHERE op.label[0] IN keys(n)
    AND (exists(op.direct))
    AND [x IN labels(n) WHERE x <> "Resource"][0] IN cn.label
  WITH n[cn.name[0]] AS subj,
       n10s.rdf.getLangValue($speech_style, op.direct) AS pred,
       n[op.label[0]] AS obj
  WITH CASE WHEN pred CONTAINS '$s' THEN '' ELSE subj END AS subj,
       replace(replace(pred,'$s',toString(subj)),'$o',toString(obj)) AS pred,
       CASE WHEN pred CONTAINS '$o' THEN '' ELSE obj END AS obj
  RETURN subj + " " + pred + " " + obj AS sentence
UNION
  // Object properties: traverse relationships and join the related nodes' names
  MATCH (n)-[r]-(o)
  WHERE id(n) = $node_id
  MATCH (cn:Class)<-[:domain|range]-(op:ObjectProperty)-[:domain|range]->(co:Class)
  WHERE type(r) IN op.label
    AND (exists(op.direct) OR exists(op.inverse))
    AND [x IN labels(n) WHERE x <> "Resource"][0] IN cn.label
    AND [x IN labels(o) WHERE x <> "Resource"][0] IN co.label
  WITH n[cn.name[0]] AS subj,
       n10s.rdf.getLangValue($speech_style, op[CASE WHEN startNode(r) = n THEN "direct" ELSE "inverse" END]) AS pred,
       substring(reduce(result="", x IN collect(o[co.name[0]]) | result+","+x),1) AS obj
  WITH CASE WHEN pred CONTAINS '$s' THEN '' ELSE subj END AS subj,
       replace(replace(pred,'$o',obj),'$s',subj) AS pred,
       CASE WHEN pred CONTAINS '$o' THEN '' ELSE obj END AS obj
  RETURN subj + " " + pred + " " + obj AS sentence
}
RETURN DISTINCT sentence
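The placeholder handling is the least obvious part of the engine, so here is the same `CASE` and `replace` logic applied to literal values. The template and the values are hypothetical, and the final `trim` is an addition that is not part of the engine above.

```cypher
// Hypothetical inputs: a template that mentions both placeholders.
WITH "Keanu Reeves" AS subj, "$s was born in $o" AS pred, 1964 AS obj
// Each expression below reads the values from the previous WITH:
// a placeholder found in the template blanks out the matching standalone
// part, and replace() substitutes the value into the template.
WITH CASE WHEN pred CONTAINS '$s' THEN '' ELSE subj END AS subj,
     replace(replace(pred, '$s', toString(subj)), '$o', toString(obj)) AS pred,
     CASE WHEN pred CONTAINS '$o' THEN '' ELSE obj END AS obj
// Without trim the result would carry a leading and a trailing space.
RETURN trim(subj + " " + pred + " " + obj) AS sentence
// sentence: "Keanu Reeves was born in 1964"
```

A template with no placeholders, such as `"was born in"`, falls through both `ELSE` branches and yields the plain subject, predicate, object order instead.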
Change $speech_style to "short" or "long" to instantly switch the verbosity of all generated sentences without touching any graph data.