
The DockerAgent is Sentinel’s default runtime specialist for container incidents. It uses a bounded ReAct loop (at most four tool invocations) to gather live evidence from the Docker daemon, cross-reference the runbooks-docker ChromaDB collection, and recall similar past incidents before producing a structured markdown analysis. When the investigation concludes, the Supervisor proposes a safe, allowlisted action for human approval.
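A bounded ReAct loop alternates between two things. First, a model step either requests a tool or returns an answer. Second, any tool observation is appended to the context. The loop stops once the tool budget is spent.

The following is a minimal sketch of that control flow, not Sentinel's implementation. `decide` stands in for the LLM call. `ToolCall`, `run_bounded_react`, and `MAX_TOOL_CALLS` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical types for illustration; Sentinel's real classes are not shown on this page.
@dataclass
class ToolCall:
    name: str
    argument: str


Step = Union[ToolCall, str]  # either a tool request or the final markdown analysis

MAX_TOOL_CALLS = 4  # mirrors the "at most four tool invocations" limit


def run_bounded_react(
    decide: Callable[[list[str]], Step],
    tools: dict[str, Callable[[str], str]],
    context: list[str],
) -> tuple[str, int]:
    """Run the loop and return (final analysis, number of tool calls made)."""
    calls = 0
    while True:
        step = decide(context)
        if isinstance(step, str):
            # The model chose to answer: stop reasoning.
            return step, calls
        if calls >= MAX_TOOL_CALLS:
            # Budget exhausted: ask once more for an answer from what is already in context.
            context.append("Tool budget exhausted. Produce the final analysis now.")
            final = decide(context)
            return (final if isinstance(final, str) else "Analysis incomplete."), calls
        tool = tools.get(step.name)
        observation = tool(step.argument) if tool else f"Unknown tool '{step.name}'."
        context.append(f"{step.name}({step.argument}) -> {observation}")
        calls += 1
```

Forcing one last `decide` call after the budget runs out is only one design choice. An implementation could instead return whatever partial notes it has gathered.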

Runtime Detection

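The DockerAgent decides whether to claim an alert by applying these checks in order. The first check that applies wins:
  1. If source_type is database, the alert is not claimed.
  2. If the label container_runtime=docker is set, the alert is claimed.
  3. If container_runtime names another runtime (podman, kubernetes, or containerd), the alert is not claimed.
  4. If the target starts with postgres/ or mysql/, the alert is not claimed.
  5. Otherwise, the alert is claimed if it has a non-empty target.
Label values and targets are compared case-insensitively. In other words, Docker is the fallback runtime for all container alerts that do not match a more specific agent. The postgres/ and mysql/ exclusion applies only on this fallback path: an alert explicitly labeled container_runtime=docker is claimed regardless of its target.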
```python
# services/agents/docker/agent.py: DockerAgent.matches() (excerpt of a method)
def matches(self, ctx: IncidentContext) -> bool:
    # Database alerts are never claimed, whatever their runtime label says.
    source = (ctx.labels.get("source_type") or "").lower()
    if source == "database":
        return False

    # An explicit runtime label decides the match.
    runtime = (ctx.labels.get("container_runtime") or "").lower()
    if runtime == "docker":
        return True
    if runtime in {"podman", "kubernetes", "containerd"}:
        return False

    # Fallback path: skip database-style targets.
    target = (ctx.target or "").lower()
    if target.startswith("postgres/") or target.startswith("mysql/"):
        return False

    # Claim any remaining alert that names a target.
    return bool(ctx.target)
```
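To try these rules outside the codebase, the sketch below restates them as a free function. It runs over a hypothetical minimal `IncidentContext` dataclass that models only the two fields `matches()` reads: `target` and `labels`. The real class is defined elsewhere in Sentinel and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for Sentinel's IncidentContext: only the fields
# matches() reads are modeled.
@dataclass
class IncidentContext:
    target: Optional[str] = None
    labels: dict[str, str] = field(default_factory=dict)


OTHER_RUNTIMES = {"podman", "kubernetes", "containerd"}


def docker_agent_matches(ctx: IncidentContext) -> bool:
    """Same checks as DockerAgent.matches(), written as a free function."""
    source = (ctx.labels.get("source_type") or "").lower()
    if source == "database":
        return False
    runtime = (ctx.labels.get("container_runtime") or "").lower()
    if runtime == "docker":
        return True
    if runtime in OTHER_RUNTIMES:
        return False
    target = (ctx.target or "").lower()
    if target.startswith(("postgres/", "mysql/")):
        return False
    return bool(ctx.target)
```

For example, `docker_agent_matches(IncidentContext("postgres/main", {"container_runtime": "docker"}))` returns `True`. The same target without the label returns `False`.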

Prerequisites

When the backend talks to the local Docker daemon (the default), the Docker socket must be mounted into the backend container at /var/run/docker.sock. The default docker-compose.yml already includes this bind mount:
```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:rw
```
If the socket is not accessible, all four tools return a descriptive error message instead of raising an exception. The agent then continues reasoning with the Loki logs already present in its context window.
The backend connects via the DOCKER_HOST environment variable (default: unix:///var/run/docker.sock). You can override this to point at a remote Docker daemon over TCP if needed.
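This page does not say whether the backend checks the socket up front. As an illustration only, the sketch below resolves DOCKER_HOST with the documented default. For unix:// hosts, it also checks that the path exists and is a socket.

Both helper names are hypothetical. The Docker CLI and the Docker SDK for Python read DOCKER_HOST themselves, so a real client would not need `resolve_docker_host`.

```python
import os
import stat

DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"


def resolve_docker_host(env=None) -> str:
    """Return DOCKER_HOST from the environment, or the documented default."""
    env = os.environ if env is None else env
    return env.get("DOCKER_HOST") or DEFAULT_DOCKER_HOST


def local_socket_available(host: str) -> bool:
    """For unix:// hosts, check that the path exists and is a socket.

    TCP hosts are not probed; True means "try connecting".
    """
    if not host.startswith("unix://"):
        return True
    path = host[len("unix://"):]
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file or unreadable parent directory.
        return False
```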

Tools

All four tools are read-only. They never modify container state. The DockerAgent calls them only when the runbooks and Loki logs already in context are insufficient to produce a confident diagnosis.

docker_inspect

Returns a JSON summary of a container’s current state: status, exit_code, restart_count, memory/CPU limits, oom_killed flag, health check result, restart_policy, and timestamps.
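`docker inspect` prints a JSON array with one object per container. The fields listed above live under `State`, `HostConfig`, and the top-level `RestartCount`. The following sketch shows one way to reduce that object to such a summary. The function name and output keys are assumptions, not the tool's actual schema.

```python
def summarize_inspect(raw: dict) -> dict:
    """Reduce one element of `docker inspect` output to the fields listed above."""
    state = raw.get("State", {})
    host = raw.get("HostConfig", {})
    health = state.get("Health") or {}  # absent when no healthcheck is defined
    return {
        "status": state.get("Status"),
        "exit_code": state.get("ExitCode"),
        "restart_count": raw.get("RestartCount", 0),
        "memory_limit_bytes": host.get("Memory", 0),  # 0 means no memory limit
        "nano_cpus": host.get("NanoCpus", 0),  # 0 when --cpus was not set
        "oom_killed": state.get("OOMKilled", False),
        "health": health.get("Status"),  # None when no healthcheck is defined
        "restart_policy": (host.get("RestartPolicy") or {}).get("Name"),
        "started_at": state.get("StartedAt"),
        "finished_at": state.get("FinishedAt"),
    }
```

For the demo on this page, the input would be `json.loads(output)[0]`, where `output` is the text printed by `docker inspect demo-crash`.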

docker_logs

Fetches the last N log lines directly from the Docker daemon. These can include lines that Loki has not indexed yet. Hard-capped at 200 lines.

docker_stats

Point-in-time resource snapshot: CPU percentage, memory usage vs. limit, memory percentage, and current PID count.
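Docker's stats endpoint reports cumulative CPU counters rather than a percentage. A percentage therefore has to be derived from the two samples it returns, `cpu_stats` and `precpu_stats`.

To the best of our knowledge, the sketch below follows the calculation the docker CLI uses for `docker stats`. This page does not say whether Sentinel's docker_stats tool computes it the same way. `summarize_stats` is a hypothetical name.

```python
def summarize_stats(s: dict) -> dict:
    """Compute CPU %, memory usage/limit, memory % and PIDs from one stats sample."""
    cpu, pre = s["cpu_stats"], s["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    cpu_percent = 0.0
    if system_delta > 0 and cpu_delta > 0:
        cpu_percent = (cpu_delta / system_delta) * online * 100.0

    mem = s.get("memory_stats", {})
    usage = mem.get("usage", 0)
    extra = mem.get("stats", {})
    # Exclude reclaimable file cache: total_inactive_file (cgroup v1)
    # or inactive_file (cgroup v2), when it is smaller than usage.
    cache = extra.get("total_inactive_file", extra.get("inactive_file", 0))
    used = usage - cache if cache < usage else usage
    limit = mem.get("limit", 0)
    mem_percent = used / limit * 100.0 if limit else 0.0

    return {
        "cpu_percent": round(cpu_percent, 2),
        "memory_usage_bytes": used,
        "memory_limit_bytes": limit,
        "memory_percent": round(mem_percent, 2),
        "pids": s.get("pids_stats", {}).get("current", 0),
    }
```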

docker_ps

Lists all containers (running and stopped) with their name, short ID, status, and image. Useful for spotting related containers or recent crashes.
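Collected through the CLI, `docker ps -a --format '{{json .}}'` prints one JSON object per container, with keys such as `ID`, `Names`, `Status`, and `Image`. This page does not say whether docker_ps uses the CLI or an API client.

The hypothetical parser below keeps the four fields this tool reports. It also picks out exited containers, which is the "recent crashes" signal mentioned above.

```python
import json


def parse_ps_lines(output: str) -> list[dict]:
    """Parse `docker ps -a --format '{{json .}}'` output, one JSON object per line."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        rows.append({
            "name": item["Names"],
            "id": item["ID"][:12],  # short ID
            "status": item["Status"],
            "image": item["Image"],
        })
    return rows


def exited_containers(rows: list[dict]) -> list[str]:
    """Names of stopped containers, e.g. status 'Exited (1) 3 seconds ago'."""
    return [r["name"] for r in rows if r["status"].startswith("Exited")]
```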

Tool Parameters

| Tool | Parameter | Type | Default | Description |
|---|---|---|---|---|
| docker_inspect | container | string | | Container name or ID prefix |
| docker_logs | container | string | | Container name or ID prefix |
| docker_logs | tail | integer | 50 | Number of lines to return (max 200) |
| docker_stats | container | string | | Container name or ID prefix |
| docker_ps | (none) | | | Lists all containers |
If the Docker socket is unavailable (for example, the backend is running outside of Docker or without the bind mount), every tool returns a graceful fallback message such as:
```text
Tool 'docker_inspect' not available: the backend cannot access the Docker
daemon (socket not mounted). Reason with the Loki logs you already have.
```
The agent then produces its analysis using only the Loki logs and runbook content already in its context.
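One way to get this behavior is to wrap each tool so that connection failures become a return value instead of an exception. The decorator below is a sketch under that assumption.

It catches `OSError`, which covers a missing or unreadable socket file and refused connections. A real client library may raise its own exception types, which would need to be added. `graceful_tool` and the placeholder `docker_inspect` body are hypothetical.

```python
import functools

FALLBACK_TEMPLATE = (
    "Tool '{name}' not available: the backend cannot access the Docker "
    "daemon (socket not mounted). Reason with the Loki logs you already have."
)


def graceful_tool(name):
    """Hypothetical decorator: convert daemon-access failures into a fallback message."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except OSError:
                # FileNotFoundError (no socket), PermissionError (socket not readable)
                # and ConnectionRefusedError are all OSError subclasses.
                return FALLBACK_TEMPLATE.format(name=name)
        return wrapper
    return decorate


@graceful_tool("docker_inspect")
def docker_inspect(container):
    # Placeholder body simulating a missing socket; a real tool would query the daemon.
    raise FileNotFoundError("/var/run/docker.sock")
```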

Action Proposals

After investigation, the Supervisor’s _build_proposed_action function selects a safe remediation command based on the classified incident_type. All proposals require explicit human approval in the dashboard before execution.
A docker restart <container> command is proposed for incident types that indicate the container process has stopped or is cycling:
| Incident type | Proposed action |
|---|---|
| app_crash | docker restart <container> |
| oom | docker restart <container> |
| restart_loop | docker restart <container> |
| dependency_failure | docker restart <container> |
| config_error | docker restart <container> |
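The mapping in the table is simple enough to express as a lookup. This sketch is a hypothetical mirror of it, and `build_proposed_action` is not the Supervisor's real signature. Returning `None` for other incident types is an assumption, since this page does not say what happens for them.

```python
from typing import Optional

# Incident types for which this page documents a restart proposal.
RESTART_TYPES = frozenset(
    {"app_crash", "oom", "restart_loop", "dependency_failure", "config_error"}
)


def build_proposed_action(incident_type: str, container: str) -> Optional[str]:
    """Return the proposed command, or None when no action is proposed (assumed)."""
    if incident_type in RESTART_TYPES:
        return f"docker restart {container}"
    return None
```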
The action executor in routers/actions.py validates the command against a strict allowlist — only docker restart <name> and docker logs <name> are permitted, with the container name checked against ^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$.
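A validator with the same rules might look like the sketch below. It is not the code in routers/actions.py, and the names are hypothetical.

It uses `re.fullmatch` because in Python `$` also matches just before a trailing newline. It returns an argument list so the command can be run with `subprocess.run(argv)` without a shell.

```python
import re
import shlex
from typing import Optional

CONTAINER_NAME = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}")
ALLOWED_SUBCOMMANDS = {"restart", "logs"}


def validate_action(command: str) -> Optional[list[str]]:
    """Return argv if the command is allowlisted, else None."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return None
    # Exactly: docker <restart|logs> <name>; anything extra is rejected.
    if len(argv) != 3 or argv[0] != "docker" or argv[1] not in ALLOWED_SUBCOMMANDS:
        return None
    if not CONTAINER_NAME.fullmatch(argv[2]):
        return None
    return argv
```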

Simulating an Incident

Use the following steps to create a container that starts, prints log output, then exits with code 1. The Supervisor should classify the resulting alert as app_crash.

Step 1: Launch the crashing container

```shell
docker run -d --name demo-crash alpine sh -c "
  echo '[INFO] Starting service on port 8080'
  sleep 5
  echo '[FATAL] Could not recover connection. Shutting down.'
  exit 1
"
```
Step 2: Send the alert to Sentinel

```shell
curl -s -X POST http://localhost:8000/api/alerts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "status": "firing",
    "alerts": [{
      "status": "firing",
      "labels": {
        "alertname": "ContainerExitedUnexpectedly",
        "severity": "high",
        "name": "demo-crash",
        "container_runtime": "docker",
        "source_type": "container"
      },
      "annotations": {
        "summary": "demo-crash exited with code 1",
        "description": "Container demo-crash terminated unexpectedly after startup."
      },
      "startsAt": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
    }]
  }'
```
Step 3: Watch the agent investigate

In a typical run, the DockerAgent will:
  1. Call docker_inspect demo-crash to confirm exit_code=1, status=exited, and restart_count.
  2. Call docker_logs demo-crash to retrieve the [FATAL] line.
  3. Query the runbooks-docker ChromaDB collection for runbooks matching app_crash.
  4. Check episodic memory for similar past incidents.
  5. Produce a structured analysis and propose docker restart demo-crash.
The incident will appear in the dashboard at http://localhost:5173 with status Awaiting Approval.
Step 4: Approve and verify

Approve the proposed docker restart demo-crash action in the dashboard. Sentinel executes it via subprocess, records the result, and moves the incident to Verifying before closing it as Resolved or Failed.
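The alert in step 2 can also be sent from Python, which avoids the nested shell quoting around `startsAt`. The sketch below uses only the standard library. The function names are hypothetical, and the token is assumed to be the same bearer token used in the curl example.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert_payload(container: str) -> dict:
    """Same payload as the curl example in step 2, parameterized by container name."""
    starts_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {
                "alertname": "ContainerExitedUnexpectedly",
                "severity": "high",
                "name": container,
                "container_runtime": "docker",
                "source_type": "container",
            },
            "annotations": {
                "summary": f"{container} exited with code 1",
                "description": f"Container {container} terminated unexpectedly after startup.",
            },
            "startsAt": starts_at,
        }],
    }


def send_alert(payload: dict, token: str, url: str = "http://localhost:8000/api/alerts") -> int:
    """POST the payload with a bearer token and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With `os` imported, `send_alert(build_alert_payload("demo-crash"), os.environ["TOKEN"])` reproduces the curl call.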

Investigation Flow

```text
Alert received
      │
      ▼
Supervisor classifies incident_type (gpt-4o-mini, JSON mode)
      │
      ▼
DockerAgent.investigate(ctx)
      ├── recall_runbooks("app_crash ...", k=3)   → runbooks-docker collection
      └── recall_similar_incidents("...", k=3)    → incidents-docker collection
      │
      ▼
ReAct loop (max 4 iterations)
      ├── docker_inspect <container>
      ├── docker_logs <container>
      └── (docker_stats / docker_ps if needed)
      │
      ▼
Final analysis (markdown)
      │
      ▼
Supervisor._build_proposed_action()  →  "docker restart demo-crash"
      │
      ▼
Status: awaiting_approval  →  human approves  →  verifying  →  resolved or failed
```
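The status progression at the end of the flow can be modeled as a small state machine. The sketch below encodes only the transitions described on this page: awaiting_approval to verifying, then resolved or failed. The `failed` string value is an assumption. So is the omission of any path for a rejected proposal, which is not covered here.

```python
from enum import Enum


class Status(str, Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    VERIFYING = "verifying"
    RESOLVED = "resolved"
    FAILED = "failed"  # assumed value; the dashboard shows "Failed"


# Transitions described on this page; rejection paths are not modeled.
TRANSITIONS = {
    Status.AWAITING_APPROVAL: {Status.VERIFYING},
    Status.VERIFYING: {Status.RESOLVED, Status.FAILED},
    Status.RESOLVED: set(),
    Status.FAILED: set(),
}


def advance(current: Status, nxt: Status) -> Status:
    """Move to the next status, refusing transitions the flow does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {nxt.value}")
    return nxt
```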
