Sentinel SoftServe: AI Copilot for DevOps Incident Triage

Sentinel SoftServe is an agentic AI copilot for DevOps and SRE teams. It connects to your observability stack (Prometheus, Alertmanager, and Loki) and automatically triages incidents using a LangGraph multi-agent pipeline. The agent classifies each incident, investigates with runtime-specific tools, proposes a safe corrective action, and waits for engineer approval before executing anything. Once the incident is resolved, Sentinel generates a post-mortem automatically.

Quickstart

Run Sentinel locally with Docker Compose in under 10 minutes.

Architecture

Understand the multi-agent pipeline and system components.

Supported Runtimes

Docker, Podman, Kubernetes, and PostgreSQL agents with real tool calls.

API Reference

Full REST API for incidents, actions, alerts, and health checks.

How Sentinel Works

When an alert fires, Sentinel handles the full incident lifecycle — from detection to post-mortem — while keeping engineers in control of every remediation action.

Alert fires

Prometheus detects an anomaly and fires an alert. Alertmanager routes it to Sentinel’s /api/alerts webhook, which creates an incident record in Supabase and fetches logs from Loki.
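As a rough illustration of this step, the sketch below turns an Alertmanager webhook payload into incident records. The payload fields (alerts, status, labels, annotations, startsAt, fingerprint) follow Alertmanager's standard webhook format; the function name and the shape of the incident dict are assumptions made for this example, not Sentinel's actual schema, and the Supabase and Loki calls are left out.

```python
def incidents_from_webhook(payload: dict) -> list[dict]:
    """Build one incident dict per firing alert (hypothetical incident shape)."""
    incidents = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved notifications do not open new incidents
        labels = alert.get("labels", {})
        incidents.append({
            "fingerprint": alert.get("fingerprint"),
            "alertname": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "unknown"),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "started_at": alert.get("startsAt"),
            "status": "open",  # assumed initial state
        })
    return incidents


# Made-up payload with one firing and one resolved alert.
sample_payload = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "ContainerDown", "severity": "critical"},
            "annotations": {"summary": "web container exited"},
            "startsAt": "2024-01-01T10:00:00+00:00",
            "fingerprint": "abc123",
        },
        {
            "status": "resolved",
            "labels": {"alertname": "HighLatency", "severity": "warning"},
            "annotations": {},
            "startsAt": "2024-01-01T09:00:00+00:00",
            "fingerprint": "def456",
        },
    ],
}

incidents = incidents_from_webhook(sample_payload)
```

In this sketch only the firing alert produces an incident; a real handler would presumably also persist the record and attach the fetched logs.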

Agent pipeline runs

The LangGraph supervisor classifies the incident type, routes it to the correct specialist agent (Docker, Podman, Kubernetes, or PostgreSQL), and runs an investigation using read-only tool calls and RAG-retrieved runbooks.
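The routing idea can be sketched without LangGraph. The page does not say how the supervisor classifies incidents, so the label-matching rules below are a stand-in chosen for illustration, and every name here (classify_incident, route, the label keys) is hypothetical.

```python
from typing import Callable


def classify_incident(labels: dict[str, str]) -> str:
    """Guess the runtime from alert labels (illustrative rules only)."""
    if "pod" in labels or "namespace" in labels:
        return "kubernetes"
    if "datname" in labels or labels.get("job", "").startswith("postgres"):
        return "postgresql"
    runtime = labels.get("runtime", "")
    if runtime in ("docker", "podman"):
        return runtime
    return "unclassified"


def route(labels: dict[str, str],
          specialists: dict[str, Callable[[dict], str]]) -> str:
    """Dispatch to the matching specialist, or escalate if none matches."""
    kind = classify_incident(labels)
    handler = specialists.get(kind)
    if handler is None:
        return f"escalate to human: no specialist for '{kind}'"
    return handler(labels)


def make_specialist(name: str) -> Callable[[dict], str]:
    """Placeholder specialist that only reports what it would investigate."""
    def investigate(labels: dict) -> str:
        return f"{name} agent investigating {labels.get('alertname', 'alert')}"
    return investigate


specialists = {
    name: make_specialist(name)
    for name in ("docker", "podman", "kubernetes", "postgresql")
}

result = route({"alertname": "PodCrashLooping", "pod": "api-1"}, specialists)
```

Falling back to escalation when no specialist matches keeps an unrecognized incident from being handled by the wrong agent.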

Engineer reviews and approves

The agent proposes a safe, whitelisted remediation command. The on-call engineer reviews the full reasoning trace in the dashboard and approves, rejects, or postpones the action.
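One way to picture the approval gate is a small state machine in which a proposed action can run only after an explicit approval. This is a minimal sketch under assumed names (Decision, ProposedAction); it is not Sentinel's data model.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    POSTPONED = "postponed"


@dataclass
class ProposedAction:
    command: str
    decision: Decision = Decision.PENDING

    def decide(self, decision: Decision) -> None:
        """Record the engineer's decision; approved/rejected are final."""
        if decision is Decision.PENDING:
            raise ValueError("a decision must be approve, reject, or postpone")
        if self.decision in (Decision.APPROVED, Decision.REJECTED):
            raise ValueError(f"action already {self.decision.value}")
        self.decision = decision

    def may_execute(self) -> bool:
        """Only an explicitly approved action may run."""
        return self.decision is Decision.APPROVED


action = ProposedAction("docker restart web")
action.decide(Decision.POSTPONED)   # engineer defers the decision
action.decide(Decision.APPROVED)    # later approves it
```

A postponed action stays undecided and can be revisited, while approval and rejection are treated as final in this sketch.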

Action executes and post-mortem is generated

Upon approval, the backend executes the command, verifies recovery, and — once the incident is resolved — automatically generates a structured post-mortem with MTTR, timeline, and root cause analysis.
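To make the MTTR figure concrete: for a single incident it is the time between the alert starting and the incident being resolved, and across incidents it is the mean of those durations. The sketch below assumes ISO 8601 timestamps and hypothetical field names (started_at, resolved_at); how Sentinel actually stores its timeline is not described here.

```python
from datetime import datetime, timedelta


def time_to_resolve(started_at: str, resolved_at: str) -> timedelta:
    """Duration between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(resolved_at)
    if end < start:
        raise ValueError("resolved_at is earlier than started_at")
    return end - start


def mean_time_to_resolve(incidents: list[dict]) -> timedelta:
    """Average resolution time over a non-empty list of incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    durations = [
        time_to_resolve(i["started_at"], i["resolved_at"]) for i in incidents
    ]
    return sum(durations, timedelta()) / len(durations)


# Made-up incidents: one resolved in 10 minutes, one in 20 minutes.
resolved = [
    {"started_at": "2024-01-01T10:00:00+00:00",
     "resolved_at": "2024-01-01T10:10:00+00:00"},
    {"started_at": "2024-01-02T08:00:00+00:00",
     "resolved_at": "2024-01-02T08:20:00+00:00"},
]

mttr = mean_time_to_resolve(resolved)  # timedelta of 15 minutes
```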

Supported Runtimes

Docker

Inspect containers, fetch logs, and restart containers with docker restart.

Podman

Full rootless Podman support via the Docker-compatible SDK.

Kubernetes

Pod status, events, logs, deployments, and restarts via kubectl rollout restart.

PostgreSQL

Query pg_stat_activity, cancel backends, and terminate connections safely.
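For the PostgreSQL runtime, the sketch below shows the kind of statements such tooling might issue. pg_stat_activity, pg_cancel_backend, and pg_terminate_backend are standard PostgreSQL features; the helper name, the %s placeholder style (as used by psycopg), and the cancel-before-terminate escalation are assumptions for illustration. No database connection is opened here.

```python
# Read-only query listing active statements running longer than a threshold.
LONG_RUNNING_SQL = """
SELECT pid, usename, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > %s::interval
ORDER BY runtime DESC
"""


def backend_action(pid: int, force: bool = False) -> tuple[str, tuple]:
    """Return (sql, params) that cancels a query or, if forced, ends the session."""
    if not isinstance(pid, int) or pid <= 0:
        raise ValueError("pid must be a positive integer")
    # pg_cancel_backend stops the current query but keeps the connection;
    # pg_terminate_backend closes the whole connection.
    func = "pg_terminate_backend" if force else "pg_cancel_backend"
    return f"SELECT {func}(%s)", (pid,)


cancel_sql, cancel_params = backend_action(4242)
terminate_sql, terminate_params = backend_action(4242, force=True)
```

Trying a cancel first is the gentler option, since it interrupts only the running statement and leaves the client's session open.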

Key Features

Human-in-the-Loop Approval

Every remediation action is gated by engineer approval. No command runs automatically.

Multi-Layer Guardrails

Deterministic rules block prompt injection, enforce action whitelists, and scope agent responses to DevOps topics.

RAG Runbook Retrieval

ChromaDB stores domain-specific runbooks per runtime. Agents retrieve the most relevant procedures before investigating.

Automated Post-Mortems

LLM-generated post-mortems include timeline, MTTR, root cause, and remediation summary — editable in the dashboard.
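The action whitelist mentioned under Multi-Layer Guardrails can be pictured as a deterministic check that runs before a command is ever shown for approval. The sketch below is an assumption-laden example: the allowed prefixes are drawn from commands named on this page plus podman restart, and the rejected character set and function name are hypothetical, not Sentinel's actual rules.

```python
import shlex

# Hypothetical whitelist: each entry is a command prefix that must be
# followed by at least one argument (the restart target).
ALLOWED_PREFIXES = [
    ("docker", "restart"),
    ("podman", "restart"),
    ("kubectl", "rollout", "restart"),
]

# Shell metacharacters that could chain or redirect commands.
FORBIDDEN_CHARS = set(";|&$`><\n")


def is_whitelisted(command: str) -> bool:
    """Return True only for a single whitelisted command with a target."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return any(
        tuple(tokens[: len(prefix)]) == prefix and len(tokens) > len(prefix)
        for prefix in ALLOWED_PREFIXES
    )


checks = {
    cmd: is_whitelisted(cmd)
    for cmd in (
        "docker restart web",
        "kubectl rollout restart deployment/api",
        "docker restart web; rm -rf /",
        "docker rm web",
    )
}
```

Because the check is plain string matching rather than a model judgment, the same command always gets the same verdict.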

Get Started

Deployment

Core Concepts

Supported Runtimes

Using the Dashboard
