Persistent hybrid memory for AI agents with Mem0

Most AI agents suffer from session amnesia: every conversation starts from a blank slate. Mem0 solves this with a self-improving memory layer that automatically extracts key facts from conversations, stores them in vector and graph databases, resolves contradictions, and retrieves the right context in future sessions. Instead of reinventing deduplication, conflict resolution, and semantic search, you get a battle-tested system that learns from each interaction.

Vector memory

Stores extracted facts as embeddings in Qdrant (or another supported vector store) for semantic similarity retrieval.
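
To make "semantic similarity retrieval" concrete, here is a minimal sketch of ranking stored facts by cosine similarity. The three-dimensional vectors and the fact strings are made up for illustration; this is not Mem0's or Qdrant's implementation, only the underlying idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings"; real ones have thousands of dimensions.
stored_facts = {
    "Interested in transformer architectures": [0.9, 0.1, 0.0],
    "Prefers papers with code": [0.1, 0.9, 0.1],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for an embedded query about BERT

# Rank facts from most to least similar to the query vector
ranked = sorted(
    stored_facts,
    key=lambda fact: cosine_similarity(stored_facts[fact], query_vector),
    reverse=True,
)
print(ranked[0])
```

The fact whose vector points in nearly the same direction as the query ranks first, even though the two share no keywords.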

Graph memory

Maps entity relationships in Neo4j (or another graph database) so the agent can answer questions like “how did Paper A influence Paper B?”
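
The kind of question a graph answers can be sketched without a database. The example below is a toy, assuming a hand-written edge list and a hypothetical find_path helper; in the real setup Neo4j stores the edges and runs the traversal.

```python
from collections import deque

# Hypothetical edges: (source, relationship, destination)
edges = [
    ("Attention Is All You Need", "influenced", "BERT"),
    ("BERT", "influenced", "RoBERTa"),
    ("Attention Is All You Need", "influenced", "GPT"),
]

def find_path(edges, start, goal):
    """Breadth-first search returning the list of nodes from start to goal, or None."""
    neighbours = {}
    for source, _relationship, destination in edges:
        neighbours.setdefault(source, []).append(destination)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

path = find_path(edges, "Attention Is All You Need", "RoBERTa")
print(" -> ".join(path))
```

A vector search can tell you that two papers are about similar things; a traversal like this tells you how one is connected to the other.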

Automatic extraction

Mem0 uses an LLM to pull key facts out of raw conversation text, so you don’t need to define extraction rules manually.

Conflict resolution

When a user contradicts an earlier statement, Mem0 updates the stored fact rather than creating a duplicate.

What you’ll build

A Personal AI Research Assistant that:

Maintains intelligent memory that automatically extracts and stores research interests
Maps knowledge relationships between papers, authors, and concepts using graph storage
Adapts to user preferences through self-improving memory
Provides contextual assistance using hybrid memory retrieval
Learns and evolves with each research conversation

Prerequisites

OpenAI API key

Used for LLM reasoning (GPT-4o-mini) and text embeddings.

Qdrant Cloud

Vector store for semantic memory search. A free tier is available at cloud.qdrant.io.

Neo4j AuraDB

Graph database for Phase 2 relationship mapping. A free tier is available at neo4j.com/aura.

Install dependencies

pip install mem0ai openai python-dotenv neo4j

import os
import getpass
from mem0 import Memory
from openai import OpenAI

Configure API keys

def setup_api_keys():
    if "OPENAI_API_KEY" not in os.environ:
        os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

    if "QDRANT_URL" not in os.environ:
        os.environ["QDRANT_URL"] = input("Enter your Qdrant Cloud URL: ")
    if "QDRANT_API_KEY" not in os.environ:
        os.environ["QDRANT_API_KEY"] = getpass.getpass("Enter your Qdrant API key: ")

    return True

use_qdrant_cloud = setup_api_keys()
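
The install step includes python-dotenv, but setup_api_keys() only prompts interactively. For scripts or CI runs, one option is to load a .env file and then check which variables are still missing. This is a sketch: REQUIRED_KEYS and missing_keys are illustrative names, not part of Mem0.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

try:
    # load_dotenv() reads a .env file into os.environ if one exists.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # python-dotenv not installed; rely on the existing environment

print("Missing keys:", missing_keys())
```

If the printed list is empty, setup_api_keys() will not prompt for anything.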

Phase 1: vector memory

Configure the memory system

config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0.1,
            "max_tokens": 2000,
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",
            "embedding_dims": 3072,
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "url": os.environ["QDRANT_URL"],
            "api_key": os.environ["QDRANT_API_KEY"],
            "collection_name": "research_assistant_vectors",
            "embedding_model_dims": 3072,  # Must match the embedder above
        }
    },
    "version": "v1.1",
}

try:
    memory = Memory.from_config(config)
    print("Memory system initialised successfully!")
except Exception as e:
    print(f"Error initialising memory: {e}")
    raise  # Later steps need `memory`, so stop here instead of continuing

embedding_model_dims in the vector store config must match the actual output dimensions of the embedder you configure. text-embedding-3-large produces 3072-dimensional vectors.
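
A dimension mismatch usually surfaces only when the first vector is written, so it can help to check the config dictionary before calling Memory.from_config(). The helper below is a sketch; check_embedding_dims and KNOWN_DIMS are hypothetical names, and the table lists only the default output sizes of three OpenAI models.

```python
# Default output sizes of OpenAI embedding models.
KNOWN_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_embedding_dims(config):
    """Raise ValueError if embedder and vector store dimensions disagree."""
    embedder = config["embedder"]["config"]
    store = config["vector_store"]["config"]
    model = embedder["model"]
    # Prefer the explicit setting; fall back to the known default for the model
    expected = embedder.get("embedding_dims", KNOWN_DIMS.get(model))
    actual = store.get("embedding_model_dims")
    if expected is None:
        raise ValueError(f"Unknown output size for {model!r}; set embedding_dims explicitly")
    if actual != expected:
        raise ValueError(
            f"vector_store expects {actual} dims but {model!r} is configured for {expected}"
        )
    return expected

sample = {
    "embedder": {"config": {"model": "text-embedding-3-large", "embedding_dims": 3072}},
    "vector_store": {"config": {"embedding_model_dims": 3072}},
}
print(check_embedding_dims(sample))
```

Calling check_embedding_dims(config) on the configuration above should return 3072; switching the embedder to text-embedding-3-small without updating the vector store would raise.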

Core Mem0 operations

Mem0 exposes three primary operations you’ll use in most agents:

memory.add()

Extracts key facts from text or a conversation and stores them, handling deduplication and conflict resolution automatically.

memory.search()

Finds semantically similar stored memories for a given query string, ranked by relevance.

memory.get_all()

Returns all memories for a specific user_id, useful for inspecting or exporting what the system has learned.
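
A minimal sketch of the three calls together, assuming the keyword arguments used elsewhere in this guide. demo_core_operations is a hypothetical helper and the sample sentence is invented; the exact return shape depends on the Mem0 version.

```python
def demo_core_operations(memory, user_id="demo_user"):
    """Run add, search, and get_all once against a Mem0-style memory object."""
    # add() accepts raw text plus a user_id that scopes the stored facts
    memory.add("I'm researching retrieval-augmented generation.", user_id=user_id)

    # search() returns memories ranked by similarity to the query
    hits = memory.search("What is this user working on?", user_id=user_id)

    # get_all() returns everything stored for this user
    everything = memory.get_all(user_id=user_id)
    return hits, everything
```

With the memory object created in the configuration step, demo_core_operations(memory) returns the raw search and get_all responses so you can inspect their structure.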

Build the research assistant

class PersonalResearchAssistant:

    def __init__(self, memory_instance):
        self.client = OpenAI()
        self.memory = memory_instance
        print("Research Assistant initialised with Mem0 memory!")

    def ask(self, question, user_id):
        # Retrieve relevant memories before answering
        previous_memories = self.search_memories(question, user_id=user_id)

        system_message = (
            "You are a personal AI Research Assistant. Help users with research "
            "questions, remember their interests, and provide contextual recommendations."
        )

        # Add retrieved memories to the system prompt; the question stays the user turn
        if previous_memories:
            memory_context = "\n".join(f"- {m}" for m in previous_memories)
            system_message += f"\n\nKnown facts about this user:\n{memory_context}"

        try:
            response = self.client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": question},
                ],
                temperature=0.1,
                max_tokens=2000,
            )
            answer = response.choices[0].message.content

            # Store the question so context accumulates over time
            self.memory.add(question, user_id=user_id, metadata={"category": "research"})
            return answer

        except Exception as e:
            return f"Encountered an error: {e}"

    def get_memories(self, user_id):
        try:
            memories = self.memory.get_all(user_id=user_id)
            if isinstance(memories, dict) and "results" in memories:
                return [m["memory"] for m in memories["results"]]
            elif isinstance(memories, list):
                return [m["memory"] for m in memories]
            return []
        except Exception as e:
            print(f"Error retrieving memories: {e}")
            return []

    def search_memories(self, query, user_id):
        try:
            memories = self.memory.search(query, user_id=user_id)
            if isinstance(memories, dict) and "results" in memories:
                return [m["memory"] for m in memories["results"]]
            elif isinstance(memories, list):
                return [m["memory"] for m in memories]
            return []
        except Exception as e:
            print(f"Error searching memories: {e}")
            return []

assistant = PersonalResearchAssistant(memory)
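
get_memories() and search_memories() repeat the same response handling because Mem0 may return either a dictionary with a "results" key or a plain list. One way to factor that out is a small helper; extract_memory_texts is a hypothetical name, and the sample inputs are invented.

```python
def extract_memory_texts(raw):
    """Return the "memory" strings from a Mem0 response in either supported shape."""
    if isinstance(raw, dict):
        items = raw.get("results", [])
    elif isinstance(raw, list):
        items = raw
    else:
        items = []
    # Skip entries that are not dicts or have no "memory" field
    return [item["memory"] for item in items if isinstance(item, dict) and "memory" in item]

print(extract_memory_texts({"results": [{"memory": "Likes transformers"}]}))
print(extract_memory_texts([{"memory": "Prefers code"}, {"id": "no-text"}]))
```

Both methods could then reduce to a single return extract_memory_texts(...) inside their try blocks.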

Watch memory evolve across conversations

Run these interactions in sequence to observe how the memory layer builds context over time.

First interaction — knowledge extraction

response1 = assistant.ask(
    "I'm interested in transformer architectures for natural language processing. "
    "Can you help me find recent papers on this topic?",
    user_id="researcher",
)
print(f"Assistant: {response1}")

Mem0 extracts the research interest (“transformer architectures for NLP”) and stores it.

Build context — preference capture

response2 = assistant.ask(
    "I prefer papers that include practical implementation details and code examples. "
    "Theoretical papers without code are less useful for my work.",
    user_id="researcher",
)
print(f"Assistant: {response2}")

Mem0 stores the style preference as a separate memory under the same user_id, so later searches can return it alongside the topic interest.

Semantic search — beyond keywords

response3 = assistant.ask(
    "What about BERT and GPT models? Are they related to my research interests?",
    user_id="researcher",
)
print(f"Assistant: {response3}")

The assistant retrieves the transformer-interest memory via semantic similarity, even though “BERT” and “GPT” were not mentioned in the first message.

Conflict resolution — preference update

response4 = assistant.ask(
    "Actually, I also need to understand the theoretical foundations of attention "
    "mechanisms. Can you recommend some foundational theory papers?",
    user_id="researcher",
)
print(f"Assistant: {response4}")

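This message qualifies the earlier preference rather than reversing it. Mem0 should reconcile the two, for example by revising the stored preference to cover both practical and theoretical papers, instead of keeping two contradictory memories.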
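
To see whether a memory was actually updated rather than added, you can look at its change history. The sketch below assumes memory.history() returns a list of dictionaries with event, old_memory, and new_memory fields; those field names are assumptions and may differ between Mem0 versions. summarize_history is a hypothetical helper.

```python
def summarize_history(memory, user_id):
    """Return (memory_id, event, old_text, new_text) tuples for a user's memories."""
    raw = memory.get_all(user_id=user_id)
    items = raw.get("results", []) if isinstance(raw, dict) else raw
    changes = []
    for item in items:
        # history() lists the recorded changes for one memory ID
        for entry in memory.history(item["id"]):
            changes.append((
                item["id"],
                entry.get("event"),        # assumed values such as "ADD" or "UPDATE"
                entry.get("old_memory"),   # assumed field name
                entry.get("new_memory"),   # assumed field name
            ))
    return changes
```

Calling summarize_history(memory, "researcher") after the four interactions shows which stored facts were rewritten along the way.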

Inspect what was learned

def analyze_extracted_memories():
    all_memories = assistant.get_memories(user_id="researcher")

    if all_memories:
        print(f"Total memories extracted: {len(all_memories)}")
        for i, memory in enumerate(all_memories, 1):
            print(f"\n{i}. {memory}")

        # Test semantic search capability
        test_queries = [
            "neural networks",        # Should connect to transformers
            "code implementations",   # Should find practical preferences
            "attention mechanisms",   # Should connect to transformer interest
            "deep learning papers",   # Should find research interests
        ]
        for query in test_queries:
            related = assistant.search_memories(query, user_id="researcher")
            print(f"\nQuery: '{query}' — {len(related)} related memories found")
            if related:
                print(f"  Top match: {related[0][:100]}...")

    return all_memories

memories = analyze_extracted_memories()

Phase 2: graph memory

Graph memory adds explicit entity-relationship mapping on top of the vector layer. Use it when your application needs to answer structural questions like “who collaborated with whom?” or “how did Paper A influence Paper B?”

Configure graph storage

def setup_graph_capabilities():
    neo4j_uri = input("Neo4j URI (e.g. neo4j+s://xxx.databases.neo4j.io): ")
    neo4j_username = input("Username (usually 'neo4j'): ")
    neo4j_password = getpass.getpass("Password: ")

    return True, {
        "url": neo4j_uri,
        "username": neo4j_username,
        "password": neo4j_password,
    }

has_graph, graph_config = setup_graph_capabilities()

Create the hybrid configuration

enhanced_config = {
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini",
            "temperature": 0.1,
            "max_tokens": 2000,
        }
    },
    "embedder": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",
            "embedding_dims": 3072,
        }
    },
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "url": os.environ["QDRANT_URL"],
            "api_key": os.environ["QDRANT_API_KEY"],
            "collection_name": "research_assistant_hybrid",
            "embedding_model_dims": 3072,
        }
    },
    "graph_store": {
        "provider": "neo4j",
        "config": graph_config,
    },
    "version": "v1.1",
}

try:
    enhanced_memory = Memory.from_config(enhanced_config)
    enhanced_assistant = PersonalResearchAssistant(enhanced_memory)
    print("Hybrid memory system initialised successfully!")
except Exception as e:
    print(f"Error initialising hybrid memory: {e}")
    enhanced_assistant = assistant  # Fall back to vector-only

Graph-enhanced research examples

Research lineage

graph_response1 = enhanced_assistant.ask(
    "I'm studying the lineage of transformer papers. The original "
    "'Attention Is All You Need' by Vaswani et al. led to BERT by Devlin et al., "
    "and then to many other models. Can you help me map these research connections "
    "and suggest related work?",
    user_id="graph_user",
)
print(f"Hybrid Assistant: {graph_response1}")

With graph memory enabled, Mem0 extracts the papers and authors named here and the relationships between them, such as a led_to edge from “Attention Is All You Need” to BERT, so future queries can traverse the research lineage. The LLM chooses the relationship labels, so the exact names may differ.

Collaboration network

graph_response2 = enhanced_assistant.ask(
    "What other researchers have worked on transformer architectures? "
    "I want to understand the collaboration network and research groups in this field.",
    user_id="graph_user",
)
print(f"Hybrid Assistant: {graph_response2}")

When a conversation states who worked with whom, Neo4j can store collaboration edges between author nodes (for example, collaborated_with), enabling graph traversal queries that vector search alone cannot answer.
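
With a graph store configured, search responses can carry extracted relationships next to the vector results. The sketch below assumes they arrive under a "relations" key as dictionaries with source, relationship, and destination fields; those key names are assumptions to verify against your Mem0 version. format_relations and the sample data are hypothetical.

```python
def format_relations(raw):
    """Turn a search response's "relations" list into readable triples."""
    relations = raw.get("relations", []) if isinstance(raw, dict) else []
    if not isinstance(relations, list):
        return []
    lines = []
    for rel in relations:
        source = rel.get("source", "?")
        relationship = rel.get("relationship", "?")
        destination = rel.get("destination", "?")  # key name is an assumption
        lines.append(f"{source} -[{relationship}]-> {destination}")
    return lines

# Hypothetical response shaped like a graph-enabled search result
sample = {
    "results": [],
    "relations": [
        {"source": "vaswani", "relationship": "authored", "destination": "attention_is_all_you_need"},
    ],
}
print(format_relations(sample))
```

Passing enhanced_memory.search(query, user_id="graph_user") to this helper gives a quick view of which edges the graph layer returned.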

Deployment options

Self-hosted

Full control over your stack. Bring your own Qdrant and Neo4j instances. Ideal for learning, prototyping, and custom compliance requirements.

Mem0 managed platform

Managed service with a free tier (10K memories). Zero infrastructure overhead, built-in analytics, and enterprise graph memory. Ships faster and scales without ops.

Mem0’s internal benchmarks report 26% better accuracy, 91% faster responses, and 90% fewer tokens compared to naive context-stuffing approaches.

Next steps

Combine memory.add() calls with structured extraction prompts to improve the quality of stored facts.
Use memory.update() and memory.delete() to maintain hygiene in long-running production deployments.
Explore the Mem0 platform dashboard to visualise the knowledge graph your agent is building.
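
As a sketch of the memory-hygiene step, the helper below deletes stored memories that match a caller-supplied rule. prune_memories and the should_delete callback are hypothetical; the code assumes each item returned by get_all() carries "id" and "memory" fields.

```python
def prune_memories(memory, user_id, should_delete):
    """Delete every memory for `user_id` whose text satisfies `should_delete`."""
    raw = memory.get_all(user_id=user_id)
    items = raw.get("results", []) if isinstance(raw, dict) else raw
    deleted_ids = []
    for item in items:
        if should_delete(item["memory"]):
            memory.delete(memory_id=item["id"])
            deleted_ids.append(item["id"])
    return deleted_ids
```

For example, prune_memories(memory, "researcher", lambda text: len(text) < 10) would remove very short, likely uninformative entries and return their IDs.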
