Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/jbarrasa/goingmeta/llms.txt

Use this file to discover all available pages before exploring further.

Session 13 of Going Meta (broadcast February 9, 2023) explores a powerful pattern for working with data that lives outside Neo4j: instead of physically moving records into the graph database, you define virtual graphs that pull data from an external PostgreSQL time-series store on demand using APOC’s dynamic virtual resource API. The session then shows how to serialize the resulting hybrid query results as RDF through the Neosemantics HTTP endpoint.

Watch Recording

Full session recording on YouTube

Source Code

Cypher scripts and Python data generator

Overview

BroadcastFebruary 9, 2023
TagsSQL APOC RDF Python
Key proceduresapoc.dv.catalog.add, apoc.dv.query, apoc.dv.queryAndLink

What You Will Learn

  • Loading a network topology dataset (devices and links) into Neo4j from CSV
  • Defining APOC dynamic virtual resources (apoc.dv.catalog.add) that proxy JDBC queries
  • Querying external time-series data at runtime with apoc.dv.query and fusing results with Cypher
  • Materialising virtual relationships between graph nodes and live records with apoc.dv.queryAndLink
  • Generating synthetic telemetry data from Python to populate the external SQL database
  • Serialising hybrid query results as RDF via the Neosemantics Cypher endpoint

Architecture

The topology graph (devices, links, geographic positions) lives in Neo4j. Telemetry readings live in a PostgreSQL sensor_telemetry table. APOC virtual resources act as the bridge: they hold the JDBC connection string and parameterised SQL query, and are invoked at query time from Cypher so no data is duplicated.

Step-by-Step Walkthrough

1

Load the network topology

Import devices and links from CSV, then set an integer external_id on each device to match the primary key used in the time-series database.
// Create devices
LOAD CSV WITH HEADERS
FROM "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session13/data/devices.csv"
AS row
CREATE (d:Device) SET d = row

// Add references to link with telemetry data in time series DB
MATCH (d:Device)
SET d.external_id = toInteger(substring(d.neId, 1))

// Create links
LOAD CSV WITH HEADERS
FROM "https://raw.githubusercontent.com/jbarrasa/goingmeta/main/session13/data/links.csv"
AS row
MATCH (from:Device { neId: row.from })
MATCH (to:Device   { neId: row.to })
MERGE (from)-[l:link]->(to)
SET l.capacity = row.capacity, l.linkId = row.linkId
2

Register virtual resources in the APOC catalog

Each virtual resource definition stores the JDBC URL, result labels, and a parameterised SQL query. Parameters like $deviceid are passed at query time from Cypher.
CALL apoc.dv.catalog.add("all-metrics-10s", {
  type: "JDBC",
  url: "jdbc:postgresql://localhost/jb?user=jb&password=jb",
  labels: ["RawMetric"],
  query: "select * from sensor_telemetry st
          where st.device_id = $deviceid
          and datetime > now() - INTERVAL '10 seconds';",
  desc: "last 10 seconds of all metrics by device Id"
})

CALL apoc.dv.catalog.add("named-metric-10s", {
  type: "JDBC",
  url: "jdbc:postgresql://localhost/jb?user=jb&password=jb",
  labels: ["RawMetric"],
  query: "select * from sensor_telemetry st
          where st.device_id = $deviceid and st.metric = $metricname
          and st.datetime > now() - INTERVAL '10 seconds';",
  desc: "last 10 seconds of a specific metric by device Id"
})
Virtual resource definitions are stored in the APOC catalog and persist across sessions. You only need to register them once per database.
3

Query external data and aggregate in Cypher

Combine graph properties (location, device code) with aggregated telemetry fetched on demand from the SQL store — all in a single Cypher statement.
MATCH (d:Device { name: "Versailles" })
CALL apoc.dv.query("all-metrics-10s", { deviceid: d.external_id })
YIELD node AS reading
WITH d, apoc.any.properties(reading) AS reading
RETURN
  d.neId        AS deviceId,
  d.code        AS deviceCode,
  d.lat         AS latitude,
  d.long        AS longitude,
  reading.metric   AS metric,
  avg(reading.reading) AS avg_value,
  min(reading.reading) AS min_value,
  max(reading.reading) AS max_value,
  min(reading.datetime) AS from,
  max(reading.datetime) AS to
LIMIT 100
4

Materialise virtual relationships with queryAndLink

apoc.dv.queryAndLink fetches the external data rows and attaches them to the graph node as virtual relationships, returning a traversable path without persisting anything to disk.
MATCH (d:Device) WHERE d.name = "Orly Nord"
CALL apoc.dv.queryAndLink(
  d, "has_metric",
  "named-metric-10s",
  { deviceid: d.external_id, metricname: "M-5" }
)
YIELD path
RETURN *
5

Register an aggregating virtual resource

A more sophisticated virtual resource computes aggregations inside PostgreSQL, reducing the data transferred to Neo4j.
CALL apoc.dv.catalog.add("agg-metrics-30s", {
  type: "JDBC",
  url: "jdbc:postgresql://localhost/jb?user=jb&password=jb",
  labels: ["AggregateMetric"],
  query: "select metric,
            count(*) as reading_count,
            avg(reading) as avg_reading,
            min(reading) as min_reading,
            max(reading) as max_reading,
            min(datetime) as from_time,
            max(datetime) as to_time
          from sensor_telemetry st
          where st.device_id = $deviceid
            and st.datetime > now() - INTERVAL '30 seconds'
          group by device_id, metric;",
  desc: "last 10 seconds of aggregated metrics by device Id"
})

// Iterate all devices and retrieve live aggregates
// Collect links first to avoid duplicate requests per relationship
MATCH (d:Device)-[l:link]-()
WITH d, collect(l) AS links
CALL apoc.dv.queryAndLink(
  d, "has_aggregate_metric",
  "agg-metrics-30s",
  { deviceid: d.external_id }
)
YIELD path
RETURN *
6

Serialize as RDF with Neosemantics

The Neosemantics /rdf/neo4j/cypher HTTP endpoint accepts any Cypher query and returns the matched subgraph as RDF. Use single quotes inside the JSON payload to avoid escaping conflicts.
POST http://localhost:7474/rdf/neo4j/cypher

{
  "cypher": "MATCH (d:Device) WHERE d.name = 'Orly Nord'
             CALL apoc.dv.queryAndLink(d,'has_metric','named-metric-10s',
                   { deviceid: d.external_id, metricname: 'M-5'})
             YIELD path RETURN *"
}

Generating Synthetic Telemetry (Python)

The session uses a Python script to continuously populate the sensor_telemetry PostgreSQL table with random readings across 14 devices and 6 metric types, simulating a live network monitoring environment.
import psycopg2
from datetime import datetime
import decimal, time, random

def write_to_db(data):
    try:
        connection = psycopg2.connect(
            user="jb", password="jb",
            host="127.0.0.1", port="5432",
            database="jb"
        )
        cursor = connection.cursor()
        insert_query = """
            INSERT INTO sensor_telemetry (datetime, device_id, metric, reading)
            VALUES (%s, %s, %s, %s)
        """
        cursor.execute(insert_query, (
            datetime.now(),
            data.get("device"),
            data.get("metric"),
            data.get("reading")
        ))
        connection.commit()
    except (Exception, psycopg2.Error) as error:
        print("PROBLEMS: ", error)
    finally:
        if connection:
            cursor.close()
            connection.close()

while True:
    for x in range(0, 14):
        for y in range(0, 6):
            write_to_db({
                "device":  x,
                "metric":  'M-' + str(y),
                "reading": decimal.Decimal(random.randrange(10, 1389))
            })
    time.sleep(5)
Run the Python generator in a separate terminal before executing any of the Cypher queries that use the all-metrics-10s or named-metric-10s virtual resources, so the time-series table has data to return.

Key Concepts

Virtual graphs — Data is never moved into Neo4j. Instead, APOC catalog entries act as named, parameterised proxies for external queries. The graph database retains only the structural topology, while all metric values remain in the time-series store. Hybrid queries — Because APOC virtual resources integrate with standard Cypher, you can combine graph traversal (MATCH, WHERE, path patterns) with live external lookups (apoc.dv.query) in a single query — no middleware required. RDF serialisation — Neosemantics can serialise any Cypher result as RDF on the fly. Combined with virtual graphs, this means you can expose live telemetry data as Linked Data without a permanent RDF store.
APOC virtual resources require APOC Enterprise or APOC Extended. Check the APOC documentation for the apoc.dv.* procedure availability in your Neo4j version.

Resources

APOC Virtual Resources

APOC documentation for dynamic virtual resources

Neosemantics (n10s)

RDF and Linked Data integration for Neo4j

R2RML Standard

W3C mapping language for relational databases to RDF

psycopg2 Docs

PostgreSQL adapter for Python

Build docs developers (and LLMs) love