The DockerAgent is Sentinel’s default runtime specialist for container incidents. It uses a bounded ReAct loop (up to four tool invocations) to gather live evidence from the Docker daemon, cross-reference the `runbooks-docker` ChromaDB collection, and recall similar past incidents before producing a structured markdown analysis. When the investigation concludes, the Supervisor proposes a safe, whitelisted action for human approval.
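The bounded loop can be pictured with a small sketch. This is not Sentinel's implementation: `run_bounded_loop`, the `decide` callback, and the tuple-shaped decisions are hypothetical names chosen for illustration. Only the cap of four tool invocations comes from the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_TOOL_CALLS = 4  # the cap stated in the overview

# Hypothetical decision shape: ("tool", tool_name, argument) or ("final", text, None).
Decision = Tuple[str, str, Optional[str]]


def run_bounded_loop(
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[Optional[str]], str]],
    context: List[str],
) -> str:
    """Alternate between reasoning and tool calls, stopping after MAX_TOOL_CALLS."""
    evidence = list(context)
    for _ in range(MAX_TOOL_CALLS):
        kind, value, arg = decide(evidence)
        if kind == "final":
            return value
        # Record the observation so the next reasoning step can see it.
        evidence.append(f"{value}({arg}) -> {tools[value](arg)}")
    # Budget exhausted: ask once more for a final answer without calling any tool.
    kind, value, _ = decide(evidence + ["tool budget exhausted; answer now"])
    return value if kind == "final" else "analysis incomplete: tool budget exhausted"
```

The point of the hard cap is that a model that keeps requesting tools still terminates with whatever evidence it has collected.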
Runtime Detection
The DockerAgent claims an alert when any of the following conditions is true (evaluated in order):
- The alert label `container_runtime=docker` is explicitly set.
- No other runtime label (`podman`, `kubernetes`, `containerd`) is present, `source_type` is not `database`, and a non-empty `target` exists.
Targets prefixed with `postgres/` or `mysql/` are always excluded.
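The detection rules above can be sketched as a single predicate. The function name `docker_agent_claims` is hypothetical, and two details are assumptions: that the other runtimes appear as values of the `container_runtime` label, and that the `postgres/` and `mysql/` exclusion applies to the `target` prefix.

```python
from typing import Mapping, Optional

OTHER_RUNTIMES = {"podman", "kubernetes", "containerd"}
# Assumption: the exclusion is matched against the start of `target`.
EXCLUDED_TARGET_PREFIXES = ("postgres/", "mysql/")


def docker_agent_claims(
    labels: Mapping[str, str],
    source_type: Optional[str],
    target: Optional[str],
) -> bool:
    """Return True when the DockerAgent should take the alert."""
    target = target or ""
    # Database targets are excluded before any other rule is considered.
    if target.startswith(EXCLUDED_TARGET_PREFIXES):
        return False
    runtime = labels.get("container_runtime")
    # Rule 1: an explicit docker label wins.
    if runtime == "docker":
        return True
    # Rule 2: fall back to Docker only when no other runtime is named.
    if runtime in OTHER_RUNTIMES:
        return False
    return source_type != "database" and bool(target)
```

For example, `docker_agent_claims({}, "app", "web")` is true under these assumptions, while the same call with `source_type="database"` is not.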
Prerequisites
The Docker socket must be mounted into the backend container at `/var/run/docker.sock`. The default `docker-compose.yml` already includes this bind mount.
If the socket is not accessible, all four tools return a descriptive error message rather than raising an exception, and the agent continues reasoning with the Loki logs already present in its context window.
The daemon endpoint is read from the `DOCKER_HOST` environment variable (default: `unix:///var/run/docker.sock`). You can override this to point at a remote Docker daemon over TCP if needed.
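A minimal sketch of how the endpoint could be resolved and the bind mount checked before any tool runs. The helper names `resolve_docker_host` and `unix_socket_missing` are hypothetical; only the `DOCKER_HOST` variable and its default value come from the text above.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"


def resolve_docker_host(env: Mapping[str, str] = os.environ) -> str:
    """Use DOCKER_HOST when it is set and non-empty, else the default socket."""
    return env.get("DOCKER_HOST") or DEFAULT_DOCKER_HOST


def unix_socket_missing(docker_host: str) -> bool:
    """True only for unix:// endpoints whose socket path does not exist.

    TCP endpoints are not probed here, so they always return False.
    """
    parsed = urlparse(docker_host)
    return parsed.scheme == "unix" and not os.path.exists(parsed.path)
```

A check like `unix_socket_missing(resolve_docker_host())` is a cheap way to tell a missing bind mount apart from a daemon that is present but unresponsive.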
Tools
All four tools are read-only. They never modify container state. The DockerAgent calls them only when the runbooks and Loki logs already in context are insufficient to produce a confident diagnosis.
docker_inspect
Returns a JSON summary of a container’s current state: `status`, `exit_code`, `restart_count`, memory/CPU limits, `oom_killed` flag, health check result, `restart_policy`, and timestamps.
docker_logs
Fetches the last N log lines directly from the Docker daemon, which can be more recent than what Loki has indexed. Hard-capped at 200 lines.
docker_stats
Point-in-time resource snapshot: CPU percentage, memory usage vs. limit, memory percentage, and current PID count.
docker_ps
Lists all containers (running and stopped) with their name, short ID, status, and image. Useful for spotting related containers or recent crashes.
Tool Parameters
| Tool | Parameter | Type | Default | Description |
|---|---|---|---|---|
| `docker_inspect` | `container` | string | none | Container name or ID prefix |
| `docker_logs` | `container` | string | none | Container name or ID prefix |
| `docker_logs` | `tail` | integer | 50 | Number of lines to return (max 200) |
| `docker_stats` | `container` | string | none | Container name or ID prefix |
| `docker_ps` | (none) | n/a | n/a | Lists all containers |
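The `tail` default and cap from the table can be expressed as a small helper. This is an illustrative sketch: `clamp_tail` is a hypothetical name, and the lower bound of 1 is an assumption, since the table only states the default (50) and the maximum (200).

```python
from typing import Optional

DEFAULT_TAIL = 50   # default from the parameter table
MAX_TAIL = 200      # hard cap from the parameter table


def clamp_tail(tail: Optional[int] = None) -> int:
    """Apply the default when no value is given and keep the result within 1..200."""
    if tail is None:
        return DEFAULT_TAIL
    # Lower bound of 1 is an assumption made for this sketch.
    return max(1, min(int(tail), MAX_TAIL))
```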
If the Docker socket is unavailable (e.g. the backend is running outside of Docker or without the bind mount), every tool returns a graceful fallback message instead of raising an error. The agent then produces its analysis using only the Loki logs and runbook content already in its context.
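One common way to get this behaviour is a decorator that converts exceptions into plain strings. The sketch below is an assumption about the pattern, not Sentinel's code: `graceful` is a hypothetical name, the message wording is invented for illustration, and the `docker_ps` body is a stand-in that simulates a missing socket.

```python
import functools
from typing import Callable


def graceful(tool: Callable[..., str]) -> Callable[..., str]:
    """Wrap a tool so that any failure comes back as text instead of an exception."""

    @functools.wraps(tool)
    def wrapper(*args, **kwargs) -> str:
        try:
            return tool(*args, **kwargs)
        except Exception as exc:  # broad on purpose: the agent must keep reasoning
            # Hypothetical wording; the real fallback text is not reproduced here.
            return f"[{tool.__name__}] Docker unavailable: {exc}"

    return wrapper


@graceful
def docker_ps() -> str:
    # Stand-in body that simulates an unreachable daemon.
    raise ConnectionError("cannot reach /var/run/docker.sock")
```

Because the wrapper always returns a string, the calling loop can treat the failure as one more observation and continue with the logs and runbooks it already has.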
Action Proposals
After investigation, the Supervisor’s `_build_proposed_action` function selects a safe remediation command based on the classified `incident_type`. All proposals require explicit human approval in the dashboard before execution.
Proposals come in two forms: a restart proposal and a logs proposal.
Restart proposal: a `docker restart <container>` command is proposed for incident types that indicate the container process has stopped or is cycling:
| Incident type | Proposed action |
|---|---|
| `app_crash` | `docker restart <container>` |
| `oom` | `docker restart <container>` |
| `restart_loop` | `docker restart <container>` |
| `dependency_failure` | `docker restart <container>` |
| `config_error` | `docker restart <container>` |
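The restart table can be read as a simple lookup. The sketch below is not the body of `_build_proposed_action`; `build_restart_proposal` is a hypothetical helper that covers only the five incident types listed, and returning `None` for anything else is an assumption of this sketch.

```python
from typing import Optional

# Incident types taken from the table above.
RESTART_INCIDENT_TYPES = {
    "app_crash",
    "oom",
    "restart_loop",
    "dependency_failure",
    "config_error",
}


def build_restart_proposal(incident_type: str, container: str) -> Optional[str]:
    """Return a restart command for the listed incident types, else None."""
    if incident_type in RESTART_INCIDENT_TYPES:
        return f"docker restart {container}"
    # Other incident types are outside the scope of this sketch.
    return None
```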
`routers/actions.py` validates the command against a strict allowlist: only `docker restart <name>` and `docker logs <name>` are permitted, with the container name checked against `^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$`.
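An allowlist check of this kind might look like the following. The regular expression and the two permitted subcommands come from the text; the function name `is_allowed_command` and the strict three-token split are assumptions made for this sketch, not the contents of `routers/actions.py`.

```python
import re

# Pattern quoted from the text: 1 to 128 characters, starting with an alphanumeric.
CONTAINER_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,127}$")
ALLOWED_SUBCOMMANDS = {"restart", "logs"}


def is_allowed_command(command: str) -> bool:
    """Accept only `docker restart <name>` or `docker logs <name>`."""
    parts = command.split(" ")
    # Exactly three tokens: anything with extra flags or chained commands fails.
    if len(parts) != 3:
        return False
    binary, subcommand, name = parts
    return (
        binary == "docker"
        and subcommand in ALLOWED_SUBCOMMANDS
        and CONTAINER_NAME_RE.fullmatch(name) is not None
    )
```

Requiring an exact token count and a full match on the name means inputs such as `docker restart web; rm -rf /` or a name starting with `-` are rejected before anything reaches a shell.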
Simulating an Incident
To exercise the flow end to end, create a container that starts, prints log output, then exits with code 1, which triggers an `app_crash` classification.
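One way to produce such a container, sketched in Python. This is not the original snippet: the `alpine` image and the log wording are assumptions, and `build_demo_crash_command` is a hypothetical helper. The container name `demo-crash`, the `[FATAL]` marker, and exit code 1 match the scenario described here.

```python
from typing import List


def build_demo_crash_command(name: str = "demo-crash", image: str = "alpine") -> List[str]:
    """Build a `docker run` invocation whose container logs two lines and exits with 1."""
    # Assumed log wording; only the [FATAL] marker is taken from the walkthrough.
    script = 'echo "service starting"; echo "[FATAL] simulated failure"; exit 1'
    # -d detaches, so the CLI returns immediately while the container runs and exits.
    return ["docker", "run", "-d", "--name", name, image, "sh", "-c", script]
```

Passing the returned list to `subprocess.run(...)` on a host with the Docker CLI installed starts the container; because the command is a list rather than a shell string, no extra quoting is needed.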
Watch the agent investigate
The DockerAgent will:
- Call `docker_inspect demo-crash` to confirm `exit_code=1`, `status=exited`, and `restart_count`.
- Call `docker_logs demo-crash` to retrieve the `[FATAL]` line.
- Query the `runbooks-docker` ChromaDB collection for runbooks matching `app_crash`.
- Check episodic memory for similar past incidents.
- Produce a structured analysis and propose `docker restart demo-crash`.
The incident then appears in the dashboard at http://localhost:5173 with status Awaiting Approval.