Sentinel SoftServe is an agentic AI co-pilot designed for DevOps and SRE engineers who need to move fast during production incidents. It watches your infrastructure continuously, automatically detects crashes, resource exhaustion, and service degradations, then orchestrates a full triage pipeline, from log collection and root-cause analysis to proposing a safe corrective action, without requiring you to dig through dashboards manually. Every AI decision passes through a human-in-the-loop approval gate before any remediation command is executed on your infrastructure. The project is an academic-industry collaboration between Universidad EAFIT and SoftServe, deployed live at sentinel-softserve-1.onrender.com.
The problem Sentinel solves
Modern containerised workloads generate thousands of metrics and log lines per minute. When something goes wrong at 2 a.m., engineers waste critical minutes correlating Prometheus alerts with Loki logs, reading runbooks, and deciding whether to restart a container or roll back a deployment. Sentinel eliminates that toil by:
- Automatically ingesting alerts from Alertmanager the moment Prometheus fires a rule.
- Fetching and analysing logs from Loki against a ChromaDB runbook knowledge base.
- Classifying the incident type and routing it to the right specialist agent (Docker, Podman, Kubernetes, or PostgreSQL).
- Proposing one safe, whitelisted command for human approval — never executing anything autonomously.
- Generating a post-mortem and writing the incident into episodic memory so future triage gets smarter.
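The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Sentinel's actual implementation: the `runtime` alert label, the routing table, the command whitelist, and the `Proposal`, `route`, `propose`, and `execute` names are all hypothetical. Only the `alerts`, `status`, and `labels` fields come from the standard Alertmanager webhook payload.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: value of an assumed `runtime` label -> agent name.
AGENT_BY_RUNTIME = {
    "docker": "DockerAgent",
    "podman": "PodmanAgent",
    "kubernetes": "KubernetesAgent",
    "postgresql": "PostgresAgent",
}

# Hypothetical whitelist: only commands starting with these prefixes may be proposed.
ALLOWED_COMMAND_PREFIXES = (
    "docker restart ",
    "podman restart ",
    "kubectl rollout restart ",
)


@dataclass
class Proposal:
    agent: str
    command: str
    approved: bool = False  # flipped to True only by a human reviewer


def firing_alerts(payload: dict) -> list[dict]:
    """Return the alerts in an Alertmanager webhook payload that are firing."""
    return [a for a in payload.get("alerts", []) if a.get("status") == "firing"]


def route(alert: dict) -> Optional[str]:
    """Pick a specialist agent from the alert's (assumed) `runtime` label."""
    runtime = alert.get("labels", {}).get("runtime", "")
    return AGENT_BY_RUNTIME.get(runtime.lower())


def propose(agent: str, command: str) -> Proposal:
    """Create a proposal, rejecting any command outside the whitelist."""
    if not command.startswith(ALLOWED_COMMAND_PREFIXES):
        raise ValueError(f"command not whitelisted: {command!r}")
    return Proposal(agent=agent, command=command)


def execute(proposal: Proposal, runner) -> str:
    """Run the command only after a human has approved the proposal."""
    if not proposal.approved:
        raise PermissionError("proposal has not been approved by a human")
    return runner(proposal.command)
```

In this sketch the approval gate lives in `execute`: an unapproved `Proposal` raises `PermissionError` no matter which agent produced it, so nothing runs until a reviewer sets `approved`. The `runner` argument is a stand-in for whatever actually executes the command.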
Tech stack
Sentinel is built from purpose-chosen components across every layer of the stack.
| Layer | Technology |
|---|---|
| Frontend | React 19 + Vite 7 + Tailwind CSS v4 + shadcn/ui |
| Backend | FastAPI + Uvicorn |
| AI Orchestration | LangGraph + LangChain |
| LLM | OpenAI gpt-4o-mini |
| Knowledge Base | ChromaDB (runbooks RAG + episodic memory) |
| Auth & DB | Supabase (email/password, JWT, Realtime) |
| Agent Observability | LangFuse v2 (self-hosted) |
| Incident Detection | cAdvisor + Prometheus + Alertmanager |
| Logs | Loki + Promtail |
| Dashboards | Grafana |
Supported runtimes
Sentinel ships a dedicated specialist agent for each supported runtime. Each agent carries its own tool palette and ChromaDB runbook collection so investigations stay tightly scoped.
| Runtime | Agent | Tools |
|---|---|---|
| Docker | DockerAgent | docker_inspect, docker_logs, docker_stats, docker_ps |
| Podman | PodmanAgent | podman_inspect, podman_logs, podman_stats, podman_ps |
| Kubernetes | KubernetesAgent | get_pod_status, describe_pod, get_pod_logs, get_pod_events, get_deployment_status, list_failing_pods |
| PostgreSQL | PostgresAgent | pg_stat_activity, pg_stat_database, pg_stat_replication, pg_locks |
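One way to picture the per-agent tool palettes is as a lookup table that rejects any tool call outside the agent's own set. The sketch below copies the agent and tool names from the table above. The registry shape and the `is_tool_allowed` and `owning_agents` helpers are assumptions for illustration, not Sentinel's actual code.

```python
# Agent and tool names come from the table above; the registry itself is hypothetical.
TOOL_PALETTES: dict[str, frozenset[str]] = {
    "DockerAgent": frozenset(
        {"docker_inspect", "docker_logs", "docker_stats", "docker_ps"}
    ),
    "PodmanAgent": frozenset(
        {"podman_inspect", "podman_logs", "podman_stats", "podman_ps"}
    ),
    "KubernetesAgent": frozenset(
        {
            "get_pod_status",
            "describe_pod",
            "get_pod_logs",
            "get_pod_events",
            "get_deployment_status",
            "list_failing_pods",
        }
    ),
    "PostgresAgent": frozenset(
        {"pg_stat_activity", "pg_stat_database", "pg_stat_replication", "pg_locks"}
    ),
}


def is_tool_allowed(agent: str, tool: str) -> bool:
    """True only if `tool` belongs to the palette of `agent`."""
    return tool in TOOL_PALETTES.get(agent, frozenset())


def owning_agents(tool: str) -> list[str]:
    """List the agents whose palette contains `tool`, sorted by name."""
    return sorted(a for a, tools in TOOL_PALETTES.items() if tool in tools)
```

With palettes that do not overlap, as in the table, every tool maps to exactly one agent, which is what keeps an investigation scoped to a single runtime.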
Where to go next
- Quickstart: Run Sentinel locally in under 10 minutes with Docker Compose.
- Architecture: Understand the LangGraph agent pipeline and observability stack.
- Supported Runtimes: Deep-dive into each specialist agent and its tool set.
- API Reference: Explore the FastAPI endpoints that power the dashboard and webhooks.
Prerequisites before you begin:
- Docker Desktop installed and running
- Node.js 20+
- Python 3.9+
- A Supabase project with a URL, service-role key, anon key, and JWT secret
- An OpenAI API key (gpt-4o-mini access required)
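A small preflight script can confirm the Python version and credentials before you start. The sketch below is illustrative only: the `SUPABASE_*` variable names are hypothetical placeholders, so use whatever names the project's own environment file defines. `OPENAI_API_KEY` is the name the OpenAI SDK reads by default. The script does not check Docker Desktop or Node.js.

```python
import os
import sys

# Hypothetical variable names; replace them with the ones the project actually uses.
REQUIRED_ENV_VARS = (
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
    "SUPABASE_ANON_KEY",
    "SUPABASE_JWT_SECRET",
    "OPENAI_API_KEY",
)


def missing_settings(env=None, python_version=None) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if python_version is None else python_version
    problems = []
    if version < (3, 9):
        problems.append(f"Python 3.9+ required, found {version[0]}.{version[1]}")
    # Treat unset and empty variables the same way.
    problems.extend(
        f"missing environment variable: {name}"
        for name in REQUIRED_ENV_VARS
        if not env.get(name)
    )
    return problems


if __name__ == "__main__":
    issues = missing_settings()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

Run as a script, it prints one line per problem and exits with a non-zero status if anything is missing, which makes it easy to call from a shell or CI step.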