Neo4j as Persistent Agent Memory for LLM Applications

Season 3, Episode 5 of Going Meta explores how Neo4j can serve as a persistent, queryable memory store for LLM agents. In-memory agent state is lost when a session ends. Storing conversation history, entity facts, preferences, and reasoning traces in Neo4j gives agents durable recall across sessions — and makes memory itself a first-class graph that can be queried, audited, and built upon over time.

Watch the Recording

Season 3, Episode 5 — February 2026

Session Code

Colab notebook: GM_S3_5_AgentMemory.ipynb

The `neo4j-agent-memory` Library

The session uses the neo4j-agent-memory Python library, which provides a structured MemoryClient abstraction over Neo4j. The client exposes three distinct memory types — short-term, long-term, and reasoning — each backed by a different graph pattern in Neo4j.

Setup

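import os

from neo4j_agent_memory import MemoryClient, MemorySettings, Neo4jConfig, ToolCallStatus
from pydantic import SecretStr

# Connection details. Reading them from environment variables is an assumption;
# substitute however your notebook supplies credentials.
NEO4J_URL = os.environ["NEO4J_URL"]
NEO4J_USR = os.environ["NEO4J_USR"]
NEO4J_PWD = os.environ["NEO4J_PWD"]

settings = MemorySettings(
    neo4j=Neo4jConfig(
        uri=NEO4J_URL,
        username=NEO4J_USR,
        password=SecretStr(NEO4J_PWD),  # SecretStr keeps the password out of logs and reprs
    )
)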

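Install the library and GLiNER before running the setup code (the quotes keep shells such as zsh from interpreting the brackets):

pip install "neo4j-agent-memory[all]" gliner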

Short-Term Memory: Conversation History

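Short-term memory stores the turn-by-turn conversation for a given session. Messages are written with a session_id and a role, and can be retrieved or searched later. The snippets on this page use top-level async with and await, which works in a notebook such as Colab; in a plain script, wrap them in an async function and run it with asyncio.run().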

Storing messages

async with MemoryClient(settings) as memory:
    await memory.short_term.add_message(
        session_id="user-123",
        role="user",
        content="I'm looking for a restaurant"
    )
    await memory.short_term.add_message(
        session_id="user-123",
        role="assistant",
        content="That's great, what's your favourite cuisine?"
    )
    await memory.short_term.add_message(
        session_id="user-123",
        role="user",
        content="I love Mediterranean cuisine"
    )

Retrieving and searching

async with MemoryClient(settings) as memory:
    conversation = await memory.short_term.get_conversation("user-123")
    for msg in conversation.messages:
        print(f"{msg.role}: {msg.content}")

    results = await memory.short_term.search_messages("restaurants")
    for msg in results:
        print(f"{msg.role}: {msg.content}")

Generating conversation summaries

The library can produce a lightweight structural summary (no LLM required) or delegate to a custom LLM summariser:

async with MemoryClient(settings) as memory:
    # Structural summary — no LLM
    summary = await memory.short_term.get_conversation_summary("user-123")
    print(summary.summary)
    print(f"Messages: {summary.message_count}")
    print(f"Key entities: {summary.key_entities}")

    # LLM-powered summary
    async def my_summarizer(transcript: str) -> str:
        response = await openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Summarize this conversation concisely."},
                {"role": "user", "content": transcript}
            ]
        )
        return response.choices[0].message.content

    summary = await memory.short_term.get_conversation_summary(
        "user-123",
        summarizer=my_summarizer,
        include_entities=True,
    )
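When no LLM is available, for example in tests, any async callable with the same shape as my_summarizer could stand in for it. The sketch below is a hypothetical extractive stand-in: `first_lines_summarizer` is not part of the library, and it assumes the transcript arrives as plain text with one message per line, which the session does not confirm.

```python
import asyncio


async def first_lines_summarizer(transcript: str, max_lines: int = 3) -> str:
    """Hypothetical offline stand-in for an LLM summariser.

    Keeps the first `max_lines` non-empty lines of the transcript.
    The one-message-per-line format is an assumption, not documented behaviour.
    """
    lines = [line.strip() for line in transcript.splitlines() if line.strip()]
    kept = lines[:max_lines]
    dropped = len(lines) - len(kept)
    suffix = f" (+{dropped} more lines)" if dropped else ""
    return " | ".join(kept) + suffix


demo_transcript = (
    "user: I'm looking for a restaurant\n"
    "assistant: That's great, what's your favourite cuisine?\n"
    "user: I love Mediterranean cuisine\n"
)

# asyncio.run is for scripts; in a notebook, `await` the coroutine directly.
demo_summary = asyncio.run(first_lines_summarizer(demo_transcript, max_lines=2))
print(demo_summary)
```

Passing such a function as summarizer= would keep summary generation deterministic and free of network calls, at the cost of a much cruder summary.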

Long-Term Memory: Entities, Facts, and Preferences

Long-term memory stores structured knowledge about the user or domain across sessions. It supports three sub-types: entities, facts (subject-predicate-object with optional temporal validity), and preferences.

Adding entities and preferences

async with MemoryClient(settings) as memory:
    entity = await memory.long_term.add_entity(
        name="John Smith",
        entity_type="PERSON",      # POLE+O type (Person, Object, Location, Event, Organization)
        subtype="INDIVIDUAL",      # Optional subtype
        description="A customer who loves Italian food"
    )

    pref = await memory.long_term.add_preference(
        category="food",
        preference="Prefers vegetarian options",
        context="When dining out"
    )

Adding temporally scoped facts

from datetime import datetime

async with MemoryClient(settings) as memory:
    fact = await memory.long_term.add_fact(
        subject="John",
        predicate="works_at",
        obj="Acme Corp",
        valid_from=datetime(2023, 1, 1)
    )
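To make temporal validity concrete, here is a minimal, library-independent sketch of a subject-predicate-object fact with a validity window. The `TemporalFact` class, its `valid_until` field, and the half-open interval convention are illustrative assumptions, not the library's data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    """Illustrative triple with an optional validity window (hypothetical)."""

    subject: str
    predicate: str
    obj: str
    valid_from: Optional[datetime] = None
    valid_until: Optional[datetime] = None  # assumed field name

    def is_valid_at(self, moment: datetime) -> bool:
        # Half-open window: valid_from <= moment < valid_until.
        # A missing bound means the window is unbounded on that side.
        if self.valid_from is not None and moment < self.valid_from:
            return False
        if self.valid_until is not None and moment >= self.valid_until:
            return False
        return True


acme = TemporalFact("John", "works_at", "Acme Corp", valid_from=datetime(2023, 1, 1))
print(acme.is_valid_at(datetime(2022, 6, 1)))  # before valid_from
print(acme.is_valid_at(datetime(2024, 6, 1)))  # after valid_from, no end bound
```

Scoping facts this way lets an agent answer "where did John work in 2022?" differently from "where does John work now?" without deleting older facts.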

Searching long-term memory

async with MemoryClient(settings) as memory:
    entities = await memory.long_term.search_entities("Smith")
    for entity in entities:
        print(entity.name, entity.type, entity.description)

Combining context for an LLM prompt

async with MemoryClient(settings) as memory:
    preferences = await memory.long_term.search_preferences("restaurant recommendation")
    for pref in preferences:
        print(f"[{pref.category}] {pref.preference}")

    # Get combined context (conversation + long-term) ready to inject into a prompt
    context = await memory.get_context(
        "What restaurant should I recommend?",
        session_id="user-123"
    )
    print(context)

Reasoning Memory: Traces and Tool Calls

Reasoning memory captures the agent’s chain-of-thought for a task — the steps it took, the tools it called, and the final outcome. This creates an auditable record of how the agent arrived at its answers and enables retrieval of similar past reasoning patterns.

async with MemoryClient(settings) as memory:
    trace = await memory.reasoning.start_trace(
        session_id="user-123",
        task="Find a restaurant recommendation",
        triggered_by_message_id="fc1418d1-a7db-4ff6-964e-057ea7734edd",
    )

    step = await memory.reasoning.add_step(
        trace.id,
        thought="I should search for nearby restaurants",
        action="search_restaurants"
    )

    await memory.reasoning.record_tool_call(
        step.id,
        tool_name="search_api",
        arguments={"query": "Italian restaurants"},
        result=["La Trattoria", "Pasta Palace"],
        status=ToolCallStatus.SUCCESS,
        duration_ms=150,
        message_id="fc1418d1-a7db-4ff6-964e-057ea7734edd",
    )

    await memory.reasoning.complete_trace(
        trace.id,
        outcome="Recommended La Trattoria",
        success=True
    )

    similar = await memory.reasoning.get_similar_traces("recommending restaurants")
    for t in similar:
        print(t.task, t.created_at, t.completed_at)
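The result, status, and duration_ms passed to record_tool_call have to be measured somewhere. One option is to wrap the tool invocation in a small timing helper. The sketch below is hypothetical: `timed_call`, `ToolOutcome`, and `fake_search` are not part of the library, and success is kept as a plain boolean so the example runs without Neo4j. Mapping it to a `ToolCallStatus` value is left to the caller.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolOutcome:
    """Hypothetical record of one tool invocation."""

    result: Any
    succeeded: bool
    duration_ms: int
    error: Optional[str] = None


def timed_call(tool: Callable[..., Any], **arguments: Any) -> ToolOutcome:
    """Run `tool(**arguments)` and capture result, success flag, and elapsed time."""
    start = time.perf_counter()
    try:
        result = tool(**arguments)
        succeeded, error = True, None
    except Exception as exc:  # record the failure instead of propagating it
        result, succeeded, error = None, False, str(exc)
    duration_ms = int((time.perf_counter() - start) * 1000)
    return ToolOutcome(result, succeeded, duration_ms, error)


def fake_search(query: str) -> list[str]:
    # Stand-in for a real search API; returns the sample results used above.
    return ["La Trattoria", "Pasta Palace"]


outcome = timed_call(fake_search, query="Italian restaurants")
print(outcome.succeeded, outcome.result, outcome.duration_ms)
```

The fields of `outcome` could then be passed as result= and duration_ms= when recording the call, so failed tool calls end up in the trace as well as successful ones.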

Entity Extraction Pipeline

The library ships with a configurable extraction pipeline that can identify named entities in raw text using spaCy, GLiNER, and an LLM fallback — merging results by confidence:

from neo4j_agent_memory.extraction import create_extractor
from neo4j_agent_memory.config import ExtractionConfig

config = ExtractionConfig(
    extractor_type="pipeline",
    enable_spacy=True,
    enable_gliner=True,
    enable_llm_fallback=True,
    merge_strategy="confidence",
)

extractor = create_extractor(config)
result = await extractor.extract("John Smith works at Acme Corp in New York.")

GLiNER is a zero-shot named entity recognition model that can identify arbitrary entity types without fine-tuning. It sits between the other two stages: spaCy is fast but limited to a fixed inventory of entity types, while the LLM fallback is the most flexible but has higher latency and cost.
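As an illustration of what merging by confidence can mean, the sketch below keeps, for each distinct text span, the candidate with the highest confidence score. It is a library-independent toy: the `Candidate` tuple, the extractor names, the labels, and the scores are all made up, and the library's actual merge logic may differ.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical extraction candidate produced by one extractor."""

    text: str
    label: str
    confidence: float
    source: str


def merge_by_confidence(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the highest-confidence candidate per case-insensitive text span."""
    best: dict[str, Candidate] = {}
    for cand in candidates:
        key = cand.text.casefold()
        if key not in best or cand.confidence > best[key].confidence:
            best[key] = cand
    # Dicts preserve insertion order, so results follow first appearance.
    return list(best.values())


candidates = [
    Candidate("John Smith", "PERSON", 0.80, "spacy"),
    Candidate("John Smith", "PERSON", 0.93, "gliner"),
    Candidate("Acme Corp", "ORGANIZATION", 0.71, "spacy"),
    Candidate("New York", "LOCATION", 0.88, "gliner"),
    Candidate("new york", "ORGANIZATION", 0.40, "spacy"),
]
merged = merge_by_confidence(candidates)
for cand in merged:
    print(cand.text, cand.label, cand.source)
```

A real pipeline would also need to reconcile overlapping spans and scores that are not calibrated across extractors; this sketch ignores both.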

Memory Architecture in Neo4j

Each memory type maps to a distinct graph pattern:

| Memory type | Neo4j representation |
| --- | --- |
| Short-term | `(:Session)-[:HAS_MESSAGE]->(:Message)` nodes with role and timestamp |
| Long-term entities | `(:Entity)` nodes with type, subtype, description properties |
| Long-term facts | `(:Fact)` nodes with subject, predicate, object, and temporal scope properties |
| Long-term preferences | `(:Preference)` nodes with category, preference, and context |
| Reasoning traces | `(:Trace)-[:HAS_STEP]->(:Step)-[:CALLED_TOOL]->(:ToolCall)` |

Because all memory lives in Neo4j as a property graph, you can query across memory types, link reasoning traces to the messages that triggered them, and build analytics on agent behaviour over time.
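As one example of such analytics, the sketch below counts messages per role for a session through the official `neo4j` Python driver. The `Session` and `Message` labels, the `HAS_MESSAGE` relationship, and the `role` property come from the table above. The `id` property on `Session` is an assumption and should be checked against the actual schema, and `messages_per_role` is a hypothetical helper.

```python
from typing import Any

# Labels and relationship type follow the table above. The `id` property on
# Session is an assumption about the schema, not confirmed by the session.
MESSAGES_PER_ROLE = """
MATCH (s:Session {id: $session_id})-[:HAS_MESSAGE]->(m:Message)
RETURN m.role AS role, count(m) AS messages
ORDER BY messages DESC
"""


def messages_per_role(driver: Any, session_id: str) -> dict[str, int]:
    """Run the query through a neo4j.Driver and return {role: message count}."""
    # execute_query returns an EagerResult, which unpacks as (records, summary, keys).
    records, _summary, _keys = driver.execute_query(
        MESSAGES_PER_ROLE, session_id=session_id
    )
    return {record["role"]: record["messages"] for record in records}


# Usage (requires a running Neo4j instance and `pip install neo4j`):
# from neo4j import GraphDatabase
# with GraphDatabase.driver(NEO4J_URL, auth=(NEO4J_USR, NEO4J_PWD)) as driver:
#     print(messages_per_role(driver, "user-123"))
```

Because the query is plain Cypher, the same pattern extends to traces and tool calls by swapping in the corresponding labels from the table.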

Use memory.get_context(query, session_id=...) as the single entry point when building the prompt context for each new agent turn. It combines recent conversation history, relevant long-term preferences, and similar past reasoning traces into a single string.
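How that string reaches the model is up to the application. Here is a minimal sketch, assuming an OpenAI-style list of chat messages and treating the context as an opaque string. The `build_messages` helper and the system prompt wording are hypothetical.

```python
def build_messages(context: str, question: str) -> list[dict[str, str]]:
    """Wrap retrieved memory context and the user's question as chat messages."""
    system = (
        "You are a helpful assistant. Use the memory context below when it is "
        "relevant, and ignore it when it is not.\n\n"
        f"Memory context:\n{context.strip() or '(none)'}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Placeholder text stands in for the string returned by memory.get_context(...).
messages = build_messages(
    context="[food] Prefers vegetarian options",
    question="What restaurant should I recommend?",
)
print(messages[0]["content"])
```

Keeping the memory context in the system message and the question in the user message makes it easy to log exactly what the agent was shown on each turn.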
