Sentinel SoftServe is an agentic AI co-pilot for DevOps and SRE teams. It connects to your observability stack (Prometheus, Alertmanager, and Loki) and automatically triages incidents using a LangGraph multi-agent pipeline. The agent classifies each incident, investigates with runtime-specific tools, proposes a safe corrective action, and waits for engineer approval before executing anything. Once the incident is resolved, it generates a post-mortem automatically.
Quickstart
Run Sentinel locally with Docker Compose in under 10 minutes.
Architecture
Understand the multi-agent pipeline and system components.
Supported Runtimes
Docker, Podman, Kubernetes, and PostgreSQL agents with real tool calls.
API Reference
Full REST API for incidents, actions, alerts, and health checks.
How Sentinel Works
When an alert fires, Sentinel handles the full incident lifecycle, from detection to post-mortem, while keeping engineers in control of every remediation action.
Alert fires
Prometheus detects an anomaly and fires an alert. Alertmanager routes it to Sentinel’s /api/alerts webhook, which creates an incident record in Supabase and fetches logs from Loki.
Agent pipeline runs
The LangGraph supervisor classifies the incident type, routes it to the correct specialist agent (Docker, Podman, Kubernetes, or PostgreSQL), and runs an investigation using read-only tool calls and RAG-retrieved runbooks.
Engineer reviews and approves
The agent proposes a safe, whitelisted remediation command. The on-call engineer reviews the full reasoning trace in the dashboard and approves, rejects, or postpones the action.
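The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Sentinel's implementation: `Incident`, `route_incident`, `incidents_from_webhook`, and `decide` are hypothetical names, the `runtime` alert label is invented for the example, and a simple lookup stands in for the LangGraph supervisor's classification. Only the outer payload shape (an `alerts` list whose items carry `status` and `labels`) follows the standard Alertmanager webhook format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    POSTPONED = "postponed"


@dataclass
class Incident:
    alert_name: str
    labels: dict
    runtime: str
    proposed_action: Optional[str] = None
    status: Status = Status.PENDING_APPROVAL


# The four specialist agents named in the text.
SPECIALISTS = ("docker", "podman", "kubernetes", "postgresql")


def route_incident(labels: dict) -> str:
    """Pick a specialist from a hypothetical `runtime` label (stand-in for the LLM classifier)."""
    runtime = labels.get("runtime", "")
    return runtime if runtime in SPECIALISTS else "unclassified"


def incidents_from_webhook(payload: dict) -> list:
    """Turn firing alerts from an Alertmanager-style payload into incident records."""
    incidents = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # assumption: resolved notifications do not open new incidents
        labels = alert.get("labels", {})
        incidents.append(
            Incident(
                alert_name=labels.get("alertname", "unknown"),
                labels=labels,
                runtime=route_incident(labels),
            )
        )
    return incidents


DECISIONS = {
    "approve": Status.APPROVED,
    "reject": Status.REJECTED,
    "postpone": Status.POSTPONED,
}


def decide(incident: Incident, decision: str) -> bool:
    """Record the engineer's decision; return True only if execution is allowed."""
    if incident.proposed_action is None:
        raise ValueError("no action has been proposed yet")
    incident.status = DECISIONS[decision]
    return incident.status is Status.APPROVED


payload = {
    "alerts": [
        {"status": "firing", "labels": {"alertname": "ContainerDown", "runtime": "docker"}},
        {"status": "resolved", "labels": {"alertname": "OldAlert", "runtime": "docker"}},
    ]
}
incident = incidents_from_webhook(payload)[0]
incident.proposed_action = "docker restart api"
may_execute = decide(incident, "approve")
```

The point of the sketch is the default: an incident starts in `PENDING_APPROVAL`, and nothing downstream should execute unless `decide` returns `True`.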
Supported Runtimes
Docker
Inspect containers, fetch logs, and restart with docker restart.
Podman
Full rootless Podman support via the Docker-compatible SDK.
Kubernetes
Inspect pod status, events, logs, and deployments, and restart workloads with kubectl rollout restart.
PostgreSQL
Query pg_stat_activity, cancel backends, and terminate connections safely.
Key Features
Human-in-the-Loop Approval
Every remediation action is gated by engineer approval. No command runs automatically.
Multi-Layer Guardrails
Deterministic rules block prompt injection, enforce action whitelists, and scope agent responses to DevOps topics.
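A deterministic action whitelist of the kind described here could look roughly like the following. This is a sketch under assumptions: `ALLOWED_PREFIXES`, `FORBIDDEN_CHARS`, and `is_whitelisted` are hypothetical names, and the whitelist entries are simply the two restart commands mentioned elsewhere on this page, not Sentinel's actual rule set.

```python
import shlex

# Hypothetical whitelist: command prefixes that a proposed action must start with.
ALLOWED_PREFIXES = [
    ("docker", "restart"),
    ("kubectl", "rollout", "restart"),
]

# Shell metacharacters that could chain a second command onto an approved one.
FORBIDDEN_CHARS = set(";|&`$><\n")


def is_whitelisted(command: str) -> bool:
    """Return True only if the command starts with an allowed prefix and has no shell metacharacters."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(tuple(tokens[: len(prefix)]) == prefix for prefix in ALLOWED_PREFIXES)
```

Because the check is rule-based rather than model-based, the same command always gets the same verdict, regardless of what the alert text or logs say.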
RAG Runbook Retrieval
ChromaDB stores domain-specific runbooks per runtime. Agents retrieve the most relevant procedures before investigating.
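The retrieval step can be illustrated without a vector database. The sketch below swaps ChromaDB's embedding search for a naive token-overlap score so that it runs with the standard library alone; `RUNBOOKS`, `tokenize`, and `retrieve` are hypothetical names and the runbook texts are invented examples. What it keeps from the description is the shape of the lookup: filter by runtime first, then rank by relevance to the incident.

```python
import re

# Hypothetical runbook snippets, grouped per runtime as the text describes.
RUNBOOKS = {
    "docker": [
        "Container restart loop: inspect exit code and recent logs before restarting.",
        "High memory usage: check container memory limits and recent deploys.",
    ],
    "postgresql": [
        "Connection saturation: query pg_stat_activity for idle in transaction sessions.",
    ],
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(runtime: str, query: str, k: int = 1) -> list:
    """Return the k runbooks for this runtime that share the most tokens with the query."""
    query_tokens = tokenize(query)
    candidates = RUNBOOKS.get(runtime, [])
    ranked = sorted(
        candidates,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


top = retrieve("docker", "container keeps restarting, exit code 137")
```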
Automated Post-Mortems
LLM-generated post-mortems include a timeline, MTTR, root cause, and remediation summary, and can be edited in the dashboard.
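MTTR is conventionally the mean of per-incident resolution times. As a worked example, assuming each incident records a fired and a resolved timestamp in ISO 8601 form with an explicit UTC offset (the function names and sample timestamps below are hypothetical):

```python
from datetime import datetime, timedelta


def time_to_resolve(fired_at: str, resolved_at: str) -> timedelta:
    """Duration between the alert firing and the incident being resolved."""
    return datetime.fromisoformat(resolved_at) - datetime.fromisoformat(fired_at)


def mttr(incidents: list) -> timedelta:
    """Mean time to resolve over a list of (fired_at, resolved_at) pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined for an empty incident list")
    durations = [time_to_resolve(fired, resolved) for fired, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


sample = [
    ("2024-05-01T10:00:00+00:00", "2024-05-01T10:12:00+00:00"),  # 12 minutes
    ("2024-05-02T14:30:00+00:00", "2024-05-02T14:50:00+00:00"),  # 20 minutes
]
print(mttr(sample))  # prints 0:16:00
```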