ChromaDB Memory and RAG Runbook Retrieval in Sentinel
How Sentinel uses ChromaDB for two memory layers: RAG-based runbook retrieval for investigation, and episodic memory for finding similar past incidents.
Sentinel uses ChromaDB as a dual-purpose memory system. The first layer holds curated runbooks — human-written procedures for handling known incident types — and is queried during investigation to ground the agent’s reasoning in operational best practices. The second layer is episodic memory: an automatically growing collection of resolved incidents that lets the agent ask “has something like this happened before, and what worked?”. Both layers are domain-separated into dedicated collections, keeping Docker context out of PostgreSQL queries and vice versa.

    # services/agents/memory/runbooks.py
    from services.agents.memory.chroma_client import get_or_create_collection


    def query_runbooks(collection_name: str, query: str, k: int = 3) -> list[str]:
        """
        Returns the k most relevant runbook documents for a query.
        Returns only text; LLMs do not need runbook metadata.
        """
        collection = get_or_create_collection(collection_name)
        results = collection.query(query_texts=[query], n_results=k)
        # Chroma returns one result list per query text; we sent exactly one.
        return results.get("documents", [[]])[0] or []

Agents call this via the DomainAgent base class helper:

    def recall_runbooks(self, query: str, k: int = 3) -> list[str]:
        from services.agents.memory.runbooks import query_runbooks
        return query_runbooks(self.runbooks_collection, query, k=k)

The helper defaults to k=3, so unless a caller passes a different k it retrieves the three most relevant runbook chunks, which are included in the agent’s LLM context alongside the incident details and tool call results.
Runbook documents are loaded from Markdown files using seed_*.py scripts (one per domain). Each script reads the Markdown files, splits them into chunks, and upserts them into the appropriate ChromaDB collection. Seeding is idempotent — running the seed script again updates existing documents rather than creating duplicates.
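The seeding flow described above can be sketched roughly as follows. This is a minimal illustration, not Sentinel's actual seed script: `chunk_markdown`, `chunk_id`, `seed_runbooks`, the paragraph-based splitting, and the 1,000-character limit are all assumptions made for the example. The only ChromaDB call it relies on is `collection.upsert(ids=..., documents=..., metadatas=...)`. Deriving each ID from the file name and chunk position is one way to make a re-run overwrite existing entries instead of duplicating them.

```python
from pathlib import Path


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split Markdown into chunks of roughly max_chars, breaking on blank lines."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def chunk_id(source: str, index: int) -> str:
    """Stable ID: the same file and chunk position always map to the same ID."""
    return f"{source}::chunk-{index}"


def seed_runbooks(collection, runbook_dir: Path) -> int:
    """Upsert every chunk of every *.md file in runbook_dir; returns the chunk count.

    `collection` is any object exposing Chroma's upsert(ids, documents, metadatas).
    """
    total = 0
    for path in sorted(runbook_dir.glob("*.md")):
        chunks = chunk_markdown(path.read_text(encoding="utf-8"))
        if not chunks:
            continue
        collection.upsert(
            ids=[chunk_id(path.name, i) for i in range(len(chunks))],
            documents=chunks,
            metadatas=[{"source": path.name, "chunk": i} for i in range(len(chunks))],
        )
        total += len(chunks)
    return total
```

One caveat of position-based IDs: if a runbook shrinks, its old trailing chunks stay in the collection until they are deleted explicitly.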
Re-run the seed scripts whenever you add or update runbook Markdown files. ChromaDB uses embeddings generated at seed time, so new content is not automatically indexed. After adding a new runbook for, say, Kubernetes PodDisruptionBudget violations, run python scripts/seed_kubernetes.py to make it retrievable.
After each investigation, the supervisor calls agent.remember_incident(ctx, result), which delegates to:

    # services/agents/memory/incidents.py
    from datetime import datetime, timezone

    from services.agents.memory.chroma_client import get_or_create_collection


    def store_incident(
        collection_name: str,
        ctx: "IncidentContext",
        result: "InvestigationResult",
    ) -> None:
        """
        Saves an investigated incident to the domain's episodic memory.
        Idempotent: upserts by incident_id, so re-running does not duplicate.
        """
        collection = get_or_create_collection(collection_name)
        tools_used = [tc.name for tc in result.tool_calls]
        metadata = {
            "incident_id": ctx.incident_id,
            "title": ctx.title,
            "target": ctx.target,
            "severity": ctx.severity,
            "incident_type": ctx.incident_type or "unknown",
            "tools_used": ",".join(tools_used),  # Chroma does not accept lists in metadata
            "stored_at": datetime.now(tz=timezone.utc).isoformat(),
        }
        doc = _build_document(ctx, result)  # title + type + target + severity + analysis[:1500]
        collection.upsert(ids=[ctx.incident_id], documents=[doc], metadatas=[metadata])

The document embeds the title, type, target, severity, and a 1,500-character excerpt of the agent’s analysis. This combination gives ChromaDB strong semantic signals to match against future incidents.
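The body of `_build_document` is not shown in this page, so the following is only a guess at its shape, based on the fields listed above. The two dataclasses are stand-ins that carry just the attributes used here, and the `analysis` attribute on the result object, the labelled line layout, and the name `build_document` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ANALYSIS_CHARS = 1500  # matches the analysis[:1500] excerpt mentioned above


@dataclass
class IncidentContext:  # stand-in with only the fields used here
    incident_id: str
    title: str
    target: str
    severity: str
    incident_type: Optional[str] = None


@dataclass
class InvestigationResult:  # stand-in; the `analysis` attribute is an assumption
    analysis: str


def build_document(ctx: IncidentContext, result: InvestigationResult) -> str:
    """Compose the text that gets embedded for similarity search."""
    return "\n".join([
        f"Title: {ctx.title}",
        f"Type: {ctx.incident_type or 'unknown'}",
        f"Target: {ctx.target}",
        f"Severity: {ctx.severity}",
        # Truncate so one long analysis does not dominate the embedding input.
        f"Analysis: {result.analysis[:MAX_ANALYSIS_CHARS]}",
    ])
```

Putting the short, high-signal fields first keeps them in the embedded text even if the model behind the collection truncates long inputs.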
During a new investigation, agents query the episodic collection before running tools:

    # services/agents/memory/incidents.py
    def query_incidents(collection_name: str, query: str, k: int = 3) -> list[dict]:
        """
        Returns the k most similar past incidents.
        Filters out results with distance > 1.5 (too dissimilar to be useful).
        Each result: { id, document, metadata, distance }
        """
        collection = get_or_create_collection(collection_name)
        results = collection.query(
            query_texts=[query],
            n_results=k,
            include=["documents", "metadatas", "distances"],
        )
        out = []
        ids = results.get("ids", [[]])[0]
        docs = results.get("documents", [[]])[0]
        metas = results.get("metadatas", [[]])[0]
        dists = results.get("distances", [[]])[0]
        # Distance filter: only return semantically close matches.
        # The scale of the threshold depends on the collection's distance metric.
        for i, d, m, dist in zip(ids, docs, metas, dists):
            if dist is not None and dist > 1.5:
                continue
            out.append({"id": i, "document": d, "metadata": m or {}, "distance": dist})
        return out

Agents call this via the base class helper, which also defaults to k=3:

    def recall_similar_incidents(self, query: str, k: int = 3) -> list[dict]:
        from services.agents.memory.incidents import query_incidents
        return query_incidents(self.incidents_collection, query, k=k)

The returned dicts are passed to the agent’s LLM context as similar_past_incidents and are also surfaced in the InvestigationResult so the supervisor can log how many similar incidents were found.
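How those dicts are rendered into the prompt is not specified here. As a rough sketch, assuming the `{ id, document, metadata, distance }` shape returned by `query_incidents`, a formatter could look like the following; `format_similar_incidents` and its `max_chars` parameter are hypothetical names introduced for illustration.

```python
def format_similar_incidents(incidents: list[dict], max_chars: int = 400) -> str:
    """Render query_incidents() results as a compact text block for a prompt."""
    if not incidents:
        return "No similar past incidents found."
    # Closest matches first; a missing distance sorts last.
    ranked = sorted(
        incidents,
        key=lambda inc: inc["distance"] if inc.get("distance") is not None else float("inf"),
    )
    lines = []
    for inc in ranked:
        meta = inc.get("metadata") or {}
        # tools_used is stored as a comma-joined string, so split it back out.
        tools = [t for t in meta.get("tools_used", "").split(",") if t]
        dist = inc.get("distance")
        dist_text = f"{dist:.2f}" if dist is not None else "n/a"
        lines.append(
            f"- [{inc['id']}] {meta.get('title', 'untitled')} "
            f"(distance {dist_text}, tools: {', '.join(tools) or 'none'})\n"
            f"  {(inc.get('document') or '')[:max_chars]}"
        )
    return "\n".join(lines)
```

Capping each document excerpt keeps several past incidents from crowding the current incident's details out of the context window.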
Both memory layers are keyed to the same domain name as the agent. The domain is determined by the supervisor before routing, using ctx.labels:

    # Routing logic (simplified from supervisor + DomainAgent.matches()):
    if ctx.labels.get("source_type") == "database":
        domain = "postgres"
    elif ctx.labels.get("container_runtime") == "podman":
        domain = "podman"
    elif ctx.labels.get("container_runtime") == "kubernetes":
        domain = "kubernetes"
    else:
        domain = "docker"  # default for container_runtime=docker or unset

    # Resulting collection names:
    runbooks_collection = f"runbooks-{domain}"    # e.g. "runbooks-kubernetes"
    incidents_collection = f"incidents-{domain}"  # e.g. "incidents-kubernetes"

This separation means a PostgreSQL incident never retrieves Docker runbooks, and a Kubernetes CrashLoopBackOff incident does not surface Podman episodic memories. Each agent only sees the context relevant to its domain.
All collection access goes through a single shared client defined in chroma_client.py:

    # services/agents/memory/chroma_client.py
    import os
    from urllib.parse import urlparse

    _client = None  # process-level singleton, created on first use


    def get_client():
        """HTTP connection to ChromaDB (process-level singleton)."""
        global _client
        if _client is not None:
            return _client
        import chromadb  # lazy import: keeps chromadb out of module load time
        chroma_url = os.getenv("CHROMA_HOST", "http://localhost:8001")
        parsed = urlparse(chroma_url)
        _client = chromadb.HttpClient(host=parsed.hostname, port=parsed.port or 8001)
        return _client


    def get_or_create_collection(name: str):
        """Returns the collection, creating it if it does not exist."""
        return get_client().get_or_create_collection(name)

Key characteristics:
Singleton — one HttpClient per process, reused across all memory calls
HTTP mode — connects to a running ChromaDB server (configured via CHROMA_HOST env var)
Lazy import — chromadb is not imported at module load time, avoiding Pydantic v1 conflicts with LangGraph
Get-or-create — collections are created on first access; no manual provisioning required beyond seeding
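Because `get_client()` passes `CHROMA_HOST` through `urlparse`, the value needs to be a full URL: for a bare `localhost:8001`, `urlparse` finds no network location and `hostname` comes back as `None`. A more forgiving parser might look like the sketch below; `parse_chroma_url` is a hypothetical helper, not part of Sentinel.

```python
from urllib.parse import urlparse

DEFAULT_PORT = 8001  # same fallback port as get_client() above


def parse_chroma_url(raw: str) -> tuple[str, int]:
    """Return (host, port) from a CHROMA_HOST value, tolerating a missing scheme."""
    # Without "//", urlparse does not treat "localhost:8001" as a network
    # location, so hostname would be None. Prepend a scheme in that case.
    if "://" not in raw:
        raw = f"http://{raw}"
    parsed = urlparse(raw)
    if not parsed.hostname:
        raise ValueError(f"Cannot determine ChromaDB host from {raw!r}")
    return parsed.hostname, parsed.port or DEFAULT_PORT
```

The returned pair could then be passed as `chromadb.HttpClient(host=host, port=port)`, and an unparseable value fails fast with a clear error instead of producing a client with `host=None`.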
If ChromaDB is slow or unreachable, the memory write is abandoned after 8 seconds. The incident triage result is already persisted in Supabase at this point — the memory write failure is logged as a warning but does not affect the incident status or the engineer’s view.
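The mechanism behind that 8-second cutoff is not shown here. One way to get this behavior, offered purely as a sketch, is to run the write in a worker thread and stop waiting once the budget is spent. `write_memory_best_effort`, the logger name, and the choice of a thread pool over, say, an asyncio timeout are all assumptions.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable

logger = logging.getLogger("sentinel.memory")  # hypothetical logger name

MEMORY_WRITE_TIMEOUT_S = 8.0  # the 8-second budget described above


def write_memory_best_effort(
    write: Callable[[], None],
    timeout_s: float = MEMORY_WRITE_TIMEOUT_S,
) -> bool:
    """Run `write` in a worker thread; return True only if it finished in time."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(write)
    try:
        future.result(timeout=timeout_s)
        return True
    except FutureTimeout:
        logger.warning("Memory write abandoned after %.1fs", timeout_s)
        return False
    except Exception as exc:  # e.g. connection refused
        logger.warning("Memory write failed: %s", exc)
        return False
    finally:
        # Do not block on a stuck write; the worker is left to finish on its own.
        executor.shutdown(wait=False)
```

A caller could wrap the write as `write_memory_best_effort(lambda: agent.remember_incident(ctx, result))`. A Python thread cannot be forcibly cancelled, so "abandoned" in this sketch means the caller stops waiting; the underlying request may still complete or fail later in the background.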
The split between runbooks and episodic memory is intentional:
| | Runbook RAG | Episodic Memory |
| --- | --- | --- |
| Content | Canonical procedures (how to handle OOM, deadlock, etc.) | Specific past incidents (what actually happened on api-gateway on 2024-03-15) |
| Author | Human SREs | Sentinel (auto-generated) |
| Growth | Manual (seed scripts) | Automatic (every investigation) |
| Query goal | “What’s the procedure for this?” | “Has this exact pattern occurred before?” |
| Retrieval value | Grounds reasoning in best practices | Surfaces proven solutions for recurring issues |
Runbooks answer procedural questions. Episodic memory answers pattern-recognition questions. Providing both to the investigating agent produces richer, more contextual analyses than either layer alone.