Three memory types

AI agents that interact with knowledge graphs need more than a vector store. They need three distinct kinds of memory — short-term, long-term, and reasoning — each serving a different purpose and stored differently. This page explains what each type is, why it matters, and how create-context-graph implements all three.

Overview

Short-term memory

Conversation history for the current session, scoped by session_id. Ephemeral by design.

Long-term memory

The persistent knowledge graph — POLE+O entities, typed relationships, and structured properties that accumulate over time.

Reasoning memory

Decision traces that record the full causal chain from question to answer. The audit trail for every agent decision.

Short-term memory: what just happened

Short-term memory holds the immediate conversational context — messages exchanged in the current session, documents the user has shared, and any ephemeral state that matters right now but not next week. In a context graph application, short-term memory includes:

Conversation history — the back-and-forth between user and agent within a session, identified by a session_id
Document content — ingested documents (reports, notes, records) stored as messages the agent can reference during the conversation
Session metadata — timestamps, user identity, and other per-session context

Short-term memory is scoped to a session. When the session ends, this memory can be discarded or archived; it does not automatically become part of the persistent knowledge graph.

Implementation: The neo4j-agent-memory library’s short_term module stores messages with add_message(), associating each one with a session ID, role (user or assistant), content, and metadata. In the generated app, each chat session gets its own short-term memory space.

# Storing a message in short-term memory (from context_graph_client.py.j2)
await client.short_term.add_message(
    session_id=session_id,
    role="user",
    content=message,
)
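
To make the session scoping concrete, here is a minimal in-memory sketch. It is not the neo4j-agent-memory implementation: InMemoryShortTerm, get_messages(), and end_session() are hypothetical names used only for illustration. The sketch shows messages keyed by session_id, and that discarding one session leaves the others untouched.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str
    metadata: dict = field(default_factory=dict)


class InMemoryShortTerm:
    """Hypothetical stand-in for a short-term store; keeps messages per session."""

    def __init__(self):
        self._sessions = defaultdict(list)

    async def add_message(self, session_id, role, content, metadata=None):
        message = Message(role=role, content=content, metadata=metadata or {})
        self._sessions[session_id].append(message)
        return message

    async def get_messages(self, session_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._sessions.get(session_id, []))

    async def end_session(self, session_id):
        # Short-term memory is ephemeral: ending a session drops its messages.
        self._sessions.pop(session_id, None)


async def demo():
    store = InMemoryShortTerm()
    await store.add_message("s1", "user", "Who owns account 42?")
    await store.add_message("s1", "assistant", "Sarah Chen.")
    await store.add_message("s2", "user", "A message in another session")
    await store.end_session("s2")
    return await store.get_messages("s1"), await store.get_messages("s2")
```

Calling asyncio.run(demo()) returns the two messages of session s1 and an empty list for the discarded session s2.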

Long-term memory: what the system knows

Long-term memory is the persistent knowledge graph — the entities, relationships, and facts that represent the domain’s accumulated knowledge. This is the “context graph” that gives the project its name. Long-term memory stores:

Entities — People, organizations, locations, events, and domain-specific objects classified using the POLE+O model
Relationships — Typed, directed connections between entities (WORKS_FOR, OWNS_ACCOUNT, DIAGNOSED_WITH, OBSERVED_AT)
Properties — Structured attributes on entities and relationships (names, dates, amounts, coordinates, statuses)

Unlike a flat database or document store, the graph structure captures how things relate. An agent can traverse relationships to answer questions that span multiple hops:

“Which clients of Organization X have accounts with transactions above $100K in the last quarter?”

That question requires traversing Person → Organization, Person → Account, and Account → Transaction: three hops in a single Cypher query.

Implementation: The neo4j-agent-memory library’s long_term module provides add_entity() for creating typed entities with attributes. The POLE+O type system (Person, Organization, Location, Event, Object) gives every entity a category that the library uses to organize and query the graph. In generated projects, entity data comes from fixtures, SaaS connectors, or LLM-generated synthetic data.

# Creating a long-term entity (from ingest.py)
await client.long_term.add_entity(
    name=entity["name"],
    entity_type=entity["label"],
    pole_type=entity["pole_type"],
    properties=entity.get("properties", {}),
)
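
The multi-hop question above can be sketched over a toy in-memory edge list. This is only an illustration of the traversal, not how the generated project queries Neo4j. All data is made up, and the relationship names CLIENT_OF and HAS_TRANSACTION are assumptions (only OWNS_ACCOUNT appears in this page).

```python
from datetime import date

# Hypothetical toy graph: (source, relationship, target) triples.
EDGES = [
    ("Sarah Chen", "CLIENT_OF", "Organization X"),
    ("Raj Patel", "CLIENT_OF", "Organization X"),
    ("Ana Ruiz", "CLIENT_OF", "Organization Y"),
    ("Sarah Chen", "OWNS_ACCOUNT", "acct-1"),
    ("Raj Patel", "OWNS_ACCOUNT", "acct-2"),
    ("Ana Ruiz", "OWNS_ACCOUNT", "acct-3"),
    ("acct-1", "HAS_TRANSACTION", "tx-1"),
    ("acct-2", "HAS_TRANSACTION", "tx-2"),
    ("acct-3", "HAS_TRANSACTION", "tx-3"),
]

# Hypothetical transaction properties.
TRANSACTIONS = {
    "tx-1": {"amount": 250_000, "date": date(2024, 2, 10)},
    "tx-2": {"amount": 40_000, "date": date(2024, 3, 1)},
    "tx-3": {"amount": 900_000, "date": date(2024, 1, 15)},
}


def targets(source, relationship):
    """Follow a relationship forward from a node."""
    return sorted(d for s, r, d in EDGES if s == source and r == relationship)


def sources(target, relationship):
    """Follow a relationship backward into a node."""
    return sorted(s for s, r, d in EDGES if d == target and r == relationship)


def clients_with_large_transactions(org, min_amount, start, end):
    """Organization <- Person -> Account -> Transaction, filtered on amount and date."""

    def qualifies(tx_id):
        tx = TRANSACTIONS[tx_id]
        return tx["amount"] > min_amount and start <= tx["date"] <= end

    return [
        person
        for person in sources(org, "CLIENT_OF")
        if any(
            qualifies(tx_id)
            for account in targets(person, "OWNS_ACCOUNT")
            for tx_id in targets(account, "HAS_TRANSACTION")
        )
    ]
```

In a graph database the same three hops are expressed as one pattern match rather than nested loops, which is the point of keeping this data in a graph.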

Reasoning memory: how decisions were made

Reasoning memory records the agent’s decision-making process: what it was asked, what it thought, what tools it called, what it observed, and what conclusion it reached. This is the least common memory type in AI systems — and the most important for enterprise applications. Reasoning memory stores:

Decision traces — a complete sequence of steps recording a reasoning chain
Thought steps — the agent’s internal reasoning at each stage (“I need to check the client’s portfolio allocation”)
Action steps — the tools called and queries executed (query_portfolio(client_name='Sarah Chen'))
Observations — the results returned by each action (“Portfolio: 60% tech, 25% healthcare, 15% bonds”)
Outcomes — the final answer or recommendation, linked to the evidence that produced it
Provenance — causal links connecting conclusions to the specific data points and reasoning steps that led to them

Each trace is stored as a chain of nodes in Neo4j:

DecisionTrace → HAS_STEP → TraceStep[1] → TraceStep[2] → ...

Implementation: The neo4j-agent-memory library’s reasoning module provides start_trace(), add_step(), and complete_trace(). Each trace is linked to a session and preserves the full causal path from question to answer.

# Recording a reasoning trace (from ingest.py)
trace = await client.reasoning.start_trace(
    session_id=session_id,
    task=trace_def["task"],
)
for step in trace_def["steps"]:
    await client.reasoning.add_step(
        trace_id=trace.id,
        thought=step["thought"],
        action=step["action"],
        observation=step.get("observation", ""),
    )
await client.reasoning.complete_trace(
    trace_id=trace.id,
    outcome=trace_def["outcome"],
)
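
The shape of a trace can be sketched with plain dataclasses. This is an assumed in-memory model for illustration, not the library's storage schema. The class and method names are hypothetical, and the second step and the outcome are invented sample values.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceStep:
    thought: str
    action: str
    observation: str = ""


@dataclass
class DecisionTrace:
    session_id: str
    task: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: List[TraceStep] = field(default_factory=list)
    outcome: Optional[str] = None

    def add_step(self, thought, action, observation=""):
        # A completed trace is treated as immutable in this sketch.
        if self.outcome is not None:
            raise ValueError("trace is already complete")
        self.steps.append(TraceStep(thought, action, observation))

    def complete(self, outcome):
        self.outcome = outcome

    def chain(self):
        # Render the node chain in the notation used on this page.
        nodes = ["DecisionTrace", "HAS_STEP"]
        nodes += [f"TraceStep[{i}]" for i in range(1, len(self.steps) + 1)]
        return " → ".join(nodes)


def build_example_trace():
    trace = DecisionTrace(session_id="s1", task="Review portfolio allocation")
    trace.add_step(
        thought="I need to check the client's portfolio allocation",
        action="query_portfolio(client_name='Sarah Chen')",
        observation="Portfolio: 60% tech, 25% healthcare, 15% bonds",
    )
    # Hypothetical second step, added only to show a multi-step chain.
    trace.add_step(thought="Summarize the allocation", action="summarize()")
    trace.complete("Allocation summary returned to the user")
    return trace
```

Because each step keeps its thought, action, and observation together, the final outcome can always be walked back to the evidence that produced it.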

How this differs from RAG

Retrieval-Augmented Generation (RAG) systems typically use a single memory mechanism: a vector store. Documents are chunked, embedded, and stored. At query time, the most similar chunks are retrieved and stuffed into the LLM’s context window. This works for simple question-answering over documents. It has significant limitations when agents need to reason, aggregate, or explain:

Capability	Vector-only RAG	Context graph (three memory types)
Recall a specific fact	Good (if in a retrieved chunk)	Good (entity properties are directly queryable)
Traverse relationships	Poor (must be in the same chunk)	Native (graph traversal across any number of hops)
Maintain conversation state	Requires external session management	Built-in short-term memory per session
Explain how an answer was reached	Not stored	Full reasoning trace with provenance
Audit past decisions	Not possible	Query historical traces by task, outcome, or date
Aggregate across entities	Poor (scattered across chunks)	Native (Cypher aggregation queries)
Detect patterns and anomalies	Requires post-processing	Graph algorithms (GDS) on the knowledge graph

Generated projects include vector_client.py for semantic similarity search. The context graph approach adds structured knowledge and reasoning traces on top of vector retrieval — it does not replace it.

Why traceability matters for enterprise AI

In regulated industries — healthcare, financial services, legal, government — AI systems must be auditable. When an agent recommends a treatment plan, approves a transaction, or flags a compliance issue, stakeholders must be able to answer:

What data did the agent use? Reasoning traces link conclusions to specific entities and relationships in the knowledge graph.
What was the agent’s logic? Thought steps record the reasoning chain, not just the final answer.
Can we reproduce the decision? The trace captures the exact sequence of tool calls and observations.
What changed since the last decision? Because both the knowledge graph (long-term memory) and reasoning traces are stored in Neo4j, you can query how the graph evolved between two decisions.

Without reasoning memory, an agent is a black box that produces answers. With it, the agent produces answers and an auditable record of how it reached them.
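
As a rough illustration of what an audit query looks like, the sketch below filters stored traces by task, outcome, and date. The records, field names, and the find_traces() helper are all hypothetical. In a generated project the equivalent lookup would run against the trace nodes in Neo4j.

```python
from datetime import datetime

# Hypothetical trace records, flattened to dictionaries for the example.
TRACES = [
    {
        "task": "Approve wire transfer",
        "outcome": "approved",
        "completed_at": datetime(2024, 1, 5),
        "entities": ["acct-1"],
    },
    {
        "task": "Flag compliance issue",
        "outcome": "flagged",
        "completed_at": datetime(2024, 2, 20),
        "entities": ["acct-2", "Raj Patel"],
    },
    {
        "task": "Approve wire transfer",
        "outcome": "rejected",
        "completed_at": datetime(2024, 3, 12),
        "entities": ["acct-3"],
    },
]


def find_traces(traces, task=None, outcome=None, since=None, until=None):
    """Return traces matching every filter that was supplied."""
    results = []
    for trace in traces:
        if task is not None and trace["task"] != task:
            continue
        if outcome is not None and trace["outcome"] != outcome:
            continue
        if since is not None and trace["completed_at"] < since:
            continue
        if until is not None and trace["completed_at"] > until:
            continue
        results.append(trace)
    return results
```

An auditor asking "which wire transfers were decided after February 1, and on what data?" would call find_traces(TRACES, task="Approve wire transfer", since=datetime(2024, 2, 1)) and read the entities field of each result.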

The short-term and reasoning memory features require the neo4j-agent-memory package. If it is not installed, the ingestion pipeline falls back to direct Neo4j driver calls for entity and relationship creation only.
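
One common way to implement this kind of optional dependency is to probe for the package before choosing a code path. The sketch below is an assumption about the pattern, not the generated project's code: the import name neo4j_agent_memory and both helper functions are hypothetical.

```python
import importlib.util


def memory_backend_available(module_name="neo4j_agent_memory"):
    """Return True if the (assumed) module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def select_features(available):
    """Map package availability to the memory features described above."""
    if available:
        return {
            "backend": "neo4j-agent-memory",
            "long_term": True,
            "short_term": True,
            "reasoning": True,
        }
    # Fallback: direct driver calls, entities and relationships only.
    return {
        "backend": "neo4j-driver",
        "long_term": True,
        "short_term": False,
        "reasoning": False,
    }
```

Calling select_features(memory_backend_available()) at startup lets the rest of the pipeline branch on a plain dictionary instead of repeating import checks.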

How the ingestion pipeline uses all three

When you run create-context-graph with --demo-data --ingest, the ingestion pipeline (ingest.py) demonstrates all three memory types in sequence:

Schema

Applies Cypher constraints and indexes from the ontology — uniqueness constraints on entity IDs, name indexes for fast lookups, and infrastructure indexes for Document and DecisionTrace nodes.

Long-term memory

Ingests entities (Person, Organization, and all domain-specific types) via client.long_term.add_entity(), building the knowledge graph with typed nodes and directed relationships.

Short-term memory

Ingests documents via client.short_term.add_message(), storing them as session messages with metadata and :MENTIONS links back to the entities they reference.

Reasoning memory

Ingests decision traces via client.reasoning.start_trace(), add_step(), and complete_trace(), recording multi-step reasoning scenarios as traversable chains of TraceStep nodes.
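
The ordering of the four phases can be sketched with a stub client that records calls instead of touching Neo4j. The method names on long_term, short_term, and reasoning are taken from this page. RecordingClient, apply_schema(), run_ingestion(), the sample data, and the role value used for documents are assumptions made for the example.

```python
import asyncio
from types import SimpleNamespace


class RecordingClient:
    """Hypothetical stub: records the order of calls instead of writing to Neo4j."""

    def __init__(self):
        self.calls = []
        self.long_term = SimpleNamespace(add_entity=self._record("long_term.add_entity"))
        self.short_term = SimpleNamespace(add_message=self._record("short_term.add_message"))
        self.reasoning = SimpleNamespace(
            start_trace=self._record("reasoning.start_trace", SimpleNamespace(id="trace-1")),
            add_step=self._record("reasoning.add_step"),
            complete_trace=self._record("reasoning.complete_trace"),
        )

    def _record(self, name, result=None):
        async def call(**kwargs):
            self.calls.append(name)
            return result

        return call

    async def apply_schema(self, statements):  # hypothetical helper
        self.calls.append("schema")


async def run_ingestion(client, data):
    await client.apply_schema(data["schema"])  # 1. schema
    for entity in data["entities"]:  # 2. long-term memory
        await client.long_term.add_entity(
            name=entity["name"],
            entity_type=entity["label"],
            pole_type=entity["pole_type"],
            properties=entity.get("properties", {}),
        )
    for doc in data["documents"]:  # 3. short-term memory
        await client.short_term.add_message(
            session_id=doc["session_id"],
            role="user",  # assumed role for ingested documents
            content=doc["content"],
        )
    for trace_def in data["traces"]:  # 4. reasoning memory
        trace = await client.reasoning.start_trace(
            session_id=trace_def["session_id"], task=trace_def["task"]
        )
        for step in trace_def["steps"]:
            await client.reasoning.add_step(
                trace_id=trace.id,
                thought=step["thought"],
                action=step["action"],
                observation=step.get("observation", ""),
            )
        await client.reasoning.complete_trace(trace_id=trace.id, outcome=trace_def["outcome"])
    return client.calls


# Hypothetical sample input with one item per phase.
SAMPLE = {
    "schema": ["<constraints and indexes from the ontology>"],
    "entities": [{"name": "Sarah Chen", "label": "Person", "pole_type": "PERSON"}],
    "documents": [{"session_id": "docs", "content": "Quarterly report text"}],
    "traces": [
        {
            "session_id": "traces",
            "task": "Review portfolio",
            "steps": [{"thought": "Check allocation", "action": "query_portfolio()"}],
            "outcome": "Summary produced",
        }
    ],
}
```

Running asyncio.run(run_ingestion(RecordingClient(), SAMPLE)) returns the call log in phase order: schema first, then entities, documents, and traces.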

How the three types work together at runtime

In the generated application, all three memory types participate in every chat session:

Retrieve short-term memory

The backend calls get_conversation_history(session_id) to retrieve prior messages from neo4j-agent-memory’s MemoryClient before passing them to the LLM.

Store the new message

The incoming user message is stored via store_message(session_id, "user", message) before the agent begins reasoning.

Query long-term memory

The agent calls domain-specific tools that execute Cypher queries against the knowledge graph. The CypherResultCollector captures tool metadata (name, inputs, output preview) for visualization.

Store the response

The agent’s response is stored via store_message(session_id, "assistant", response).

Return enriched output

The response is returned with graph_data for the NVL visualization, tool_calls for inline tool call cards, and the session_id for conversation continuity.
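
The five steps above can be sketched as a single request handler. get_conversation_history() and store_message() are the names used on this page. StubMemory, stub_agent(), handle_chat(), and the shape of graph_data and tool_calls are assumptions made so the example runs on its own.

```python
import asyncio


class StubMemory:
    """Hypothetical in-memory replacement for the shared client."""

    def __init__(self):
        self._messages = {}

    async def get_conversation_history(self, session_id):
        return list(self._messages.get(session_id, []))

    async def store_message(self, session_id, role, content):
        self._messages.setdefault(session_id, []).append(
            {"role": role, "content": content}
        )


async def stub_agent(history, message):
    """Hypothetical agent: reports how much history it saw and fakes one tool call."""
    tool_calls = [
        {
            "name": "query_portfolio",
            "inputs": {"client_name": "Sarah Chen"},
            "output_preview": "Portfolio: 60% tech, 25% healthcare, 15% bonds",
        }
    ]
    graph_data = {"nodes": [], "relationships": []}  # assumed shape
    return f"seen {len(history)} prior messages", graph_data, tool_calls


async def handle_chat(memory, agent, session_id, message):
    # 1. Retrieve short-term memory for this session.
    history = await memory.get_conversation_history(session_id)
    # 2. Store the new user message before the agent reasons.
    await memory.store_message(session_id, "user", message)
    # 3. Let the agent query long-term memory through its tools.
    response, graph_data, tool_calls = await agent(history, message)
    # 4. Store the agent's response.
    await memory.store_message(session_id, "assistant", response)
    # 5. Return the enriched output.
    return {
        "response": response,
        "graph_data": graph_data,
        "tool_calls": tool_calls,
        "session_id": session_id,
    }
```

Two consecutive calls with the same session_id show the effect of steps 1, 2, and 4: the second call sees the user message and assistant reply stored by the first.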

This architecture means every interaction enriches the system: the knowledge graph grows, the conversation history accumulates, and the reasoning traces provide an ever-expanding audit trail.

The context_graph_client.py in every generated project initializes the MemoryClient at startup with a graceful fallback if neo4j-agent-memory is not installed. All 8 supported agent frameworks call get_conversation_history() and store_message() — the memory integration is centralized in the shared client, not duplicated across framework templates.

Next steps

Why context graphs?

Domain ontologies
