Incidents API: CRUD, Export, Runbooks, and Metrics Endpoints

The Incidents API is the core of Sentinel SoftServe. It exposes endpoints to list, create, and inspect incidents, manage their lifecycle status, generate and save post-mortems, export full incident bundles, query relevant runbooks from ChromaDB, find similar past incidents via episodic memory, and fetch live Prometheus metrics. All endpoints require a valid Supabase JWT Bearer token.

Incident Status Lifecycle

Incidents move through the following statuses as the LangGraph agent pipeline and human operators interact with them:

| Status | Description |
| --- | --- |
| `detected` | Incident created; not yet analyzed |
| `investigating` | Agent is gathering logs and metrics |
| `analyzed` | Agent has produced a diagnosis and proposed action |
| `awaiting_approval` | Proposed action is pending human review |
| `executing_solution` | Action is actively being executed |
| `verifying` | Post-execution verification in progress |
| `resolved` | Incident fully resolved |
| `failed` | Execution failed or action was rejected |

resolved and failed are terminal statuses. The active filter alias matches all non-terminal statuses.
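
The terminal/non-terminal split and the `active` alias can be mirrored on the client side. The following is a minimal sketch, assuming the status names from the table above; the helper names `is_terminal` and `expand_status_filter` are illustrative and not part of the API.

```python
# Status values copied from the lifecycle table; helper names are hypothetical.
ALL_STATUSES = (
    "detected",
    "investigating",
    "analyzed",
    "awaiting_approval",
    "executing_solution",
    "verifying",
    "resolved",
    "failed",
)
TERMINAL_STATUSES = frozenset({"resolved", "failed"})


def is_terminal(status: str) -> bool:
    """Return True for statuses documented as terminal."""
    return status in TERMINAL_STATUSES


def expand_status_filter(value: str) -> list[str]:
    """Expand the `active` alias into the concrete statuses it covers."""
    if value == "active":
        return [s for s in ALL_STATUSES if not is_terminal(s)]
    if value not in ALL_STATUSES:
        raise ValueError(f"unknown status: {value!r}")
    return [value]
```

For example, `expand_status_filter("active")` returns the six non-terminal statuses in lifecycle order.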

GET /api/incidents

List all incidents with optional filtering and pagination. Auth required: Yes

Query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `source_type` | string | | Filter by incident source. One of `container`, `database`, or `manual`. |
| `container_runtime` | string | | Filter by container runtime. One of `docker`, `podman`, or `kubernetes`. |
| `severity` | string | | Filter by severity level. One of `critical`, `high`, `medium`, or `low`. |
| `status` | string | | Filter by status. Pass an exact `StatusType` value, or the special alias `active` to return all incidents not in `resolved` or `failed`. |
| `page` | integer | `1` | Page number (1-based). |
| `limit` | integer | `20` | Number of incidents per page. |

Response

| Field | Type | Description |
| --- | --- | --- |
| `data` | array | Array of incident objects for the current page. |
| `total` | integer | Total number of incidents matching the applied filters. |
| `page` | integer | Current page number. |
| `limit` | integer | Page size used for this response. |
| `pages` | integer | Total number of pages (`ceil(total / limit)`). |

Example request

curl -H "Authorization: Bearer $TOKEN" \
  "https://sentinel-softserve.onrender.com/api/incidents?status=active&severity=critical&page=1&limit=10"

Example response

{
  "data": [
    {
      "id": "3f7a1b2c-...",
      "title": "ContainerDown: my-service",
      "target": "my-service",
      "severity": "critical",
      "status": "awaiting_approval",
      "source_type": "container",
      "container_runtime": "docker",
      "proposed_action": "docker restart my-service",
      "created_at": "2024-06-01T12:00:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "limit": 10,
  "pages": 1
}
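
The query string and the `pages` field can be reproduced client-side. This is a rough sketch using only the Python standard library; `build_list_url` and `page_count` are hypothetical helper names, and the base URL is taken from the curl example above.

```python
import math
from urllib.parse import urlencode

# Base URL as used in the curl examples on this page.
BASE_URL = "https://sentinel-softserve.onrender.com"


def build_list_url(page: int = 1, limit: int = 20, **filters: str) -> str:
    """Build the GET /api/incidents URL, skipping filters that are unset."""
    params = {k: v for k, v in filters.items() if v is not None}
    params["page"] = page
    params["limit"] = limit
    return f"{BASE_URL}/api/incidents?{urlencode(params)}"


def page_count(total: int, limit: int) -> int:
    """Mirror the documented `pages` value: ceil(total / limit)."""
    return math.ceil(total / limit)
```

Calling `build_list_url(page=1, limit=10, status="active", severity="critical")` yields the same URL as the curl request above, and `page_count(41, 20)` gives 3.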

POST /api/incidents

Create a manual incident and immediately trigger the LangGraph analysis pipeline as a background task. Auth required: Yes
Status code: 201 Created

Request body

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `title` | string | Yes | | Short description of the incident. Maximum 200 characters. |
| `target` | string | Yes | | The affected resource: a container name, database name, or service identifier. Maximum 100 characters. |
| `severity` | string | Yes | | Severity level: `critical`, `high`, `medium`, or `low`. |
| `source_type` | string | No | `manual` | Origin of the incident: `container`, `database`, or `manual`. |
| `description` | string | No | | Optional free-text description or initial log snippet. Maximum 5000 characters. Stored as the incident’s initial `logs` field. |

When source_type is container, container_runtime defaults to docker. For database and manual sources, container_runtime is set to null.

Example request

curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "API gateway returning 503",
    "target": "api-gateway",
    "severity": "high",
    "source_type": "container",
    "description": "Health check endpoint returning 503 since 14:32 UTC"
  }' \
  https://sentinel-softserve.onrender.com/api/incidents

Example response (201)

{
  "id": "a1b2c3d4-e5f6-...",
  "title": "API gateway returning 503",
  "target": "api-gateway",
  "severity": "high",
  "status": "detected",
  "source_type": "container",
  "container_runtime": "docker",
  "logs": "Health check endpoint returning 503 since 14:32 UTC",
  "created_at": "2024-06-01T14:35:00Z"
}
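
Checking the documented field limits before sending avoids a round trip for obviously invalid input. The sketch below is a client-side approximation only; `build_incident_payload` and `expected_container_runtime` are hypothetical names, and the server's actual validation may differ (for example in how it counts characters).

```python
from typing import Optional

SEVERITIES = {"critical", "high", "medium", "low"}
SOURCE_TYPES = {"container", "database", "manual"}


def build_incident_payload(
    title: str,
    target: str,
    severity: str,
    source_type: str = "manual",
    description: Optional[str] = None,
) -> dict:
    """Apply the documented limits and return the JSON body as a dict."""
    if len(title) > 200:
        raise ValueError("title exceeds 200 characters")
    if len(target) > 100:
        raise ValueError("target exceeds 100 characters")
    if severity not in SEVERITIES:
        raise ValueError(f"invalid severity: {severity!r}")
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"invalid source_type: {source_type!r}")
    payload = {
        "title": title,
        "target": target,
        "severity": severity,
        "source_type": source_type,
    }
    if description is not None:
        if len(description) > 5000:
            raise ValueError("description exceeds 5000 characters")
        payload["description"] = description
    return payload


def expected_container_runtime(source_type: str) -> Optional[str]:
    """Documented default: docker for container sources, null otherwise."""
    return "docker" if source_type == "container" else None
```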

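GET /api/incidents/{incident_id}

Retrieve a single incident by its UUID. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |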

Returns 404 Not Found with {"detail": "Incidente no encontrado"} (Spanish for "Incident not found") if the ID does not exist.

Example request

curl -H "Authorization: Bearer $TOKEN" \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-e5f6-7890-abcd-ef1234567890

Example response

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "title": "API gateway returning 503",
  "target": "api-gateway",
  "severity": "high",
  "status": "analyzed",
  "source_type": "container",
  "container_runtime": "docker",
  "incident_type": "container_crash",
  "agent_reasoning": "Container exited with OOM error...",
  "proposed_action": "docker restart api-gateway",
  "action_result": null,
  "action_error": null,
  "executed_at": null,
  "resolved_at": null,
  "created_at": "2024-06-01T14:35:00Z",
  "updated_at": "2024-06-01T14:36:10Z"
}

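PATCH /api/incidents/{incident_id}/status

Update the status of an incident. If the new status is resolved, a post-mortem generation job is queued as a background task. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `status` | string | Yes | The new status. Must be a valid `StatusType`: `detected`, `investigating`, `analyzed`, `awaiting_approval`, `executing_solution`, `verifying`, `resolved`, or `failed`. |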

Setting status to resolved also writes the current UTC timestamp to resolved_at and schedules post-mortem generation in the background.

Example request

curl -X PATCH \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"status": "resolved"}' \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../status

Example response

[
  {
    "id": "a1b2c3d4-...",
    "status": "resolved",
    "resolved_at": "2024-06-01T15:00:00Z"
  }
]
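
The same PATCH call can be prepared from Python with the standard library. This sketch only builds the request object and does not send it; `build_status_request` and `first_row` are hypothetical helpers, and `first_row` assumes the response is a JSON array as in the sample above.

```python
import json
import urllib.request

# Base URL as used in the curl examples on this page.
BASE_URL = "https://sentinel-softserve.onrender.com"


def build_status_request(incident_id: str, status: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH /status request."""
    body = json.dumps({"status": status}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/incidents/{incident_id}/status",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def first_row(response_body: str) -> dict:
    """Return the first element of the JSON array, or {} if it is empty."""
    rows = json.loads(response_body)
    return rows[0] if rows else {}
```

To actually send the request, pass the object to `urllib.request.urlopen` and feed the decoded body to `first_row`.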

GET /api/incidents/{incident_id}/post-mortem

Retrieve the post-mortem report for an incident. If the incident is resolved but no post-mortem has been saved yet, one is generated on demand synchronously. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Response

| Field | Type | Description |
| --- | --- | --- |
| `incident_id` | string | UUID of the incident. |
| `status` | string | Current incident status. |
| `content` | string | Markdown-formatted post-mortem content. Empty string if not yet available. |
| `updated_at` | string | ISO 8601 timestamp of the last update, or `null`. |
| `generated` | boolean | `true` if the post-mortem was generated on demand during this request rather than loaded from a previously saved value. |

Example request

curl -H "Authorization: Bearer $TOKEN" \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../post-mortem

Example response

{
  "incident_id": "a1b2c3d4-...",
  "status": "resolved",
  "content": "## Post-Mortem: API gateway returning 503\n\n**Summary**: ...",
  "updated_at": "2024-06-01T15:02:00Z",
  "generated": false
}

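PUT /api/incidents/{incident_id}/post-mortem

Save or overwrite the post-mortem content for an incident. Useful for human-edited post-mortems after review. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `content` | string | Yes | Markdown string. Minimum 1 character, maximum 100,000 characters. |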

Example request

curl -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"content": "## Post-Mortem\n\nEdited by on-call engineer..."}' \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../post-mortem

Example response

{
  "incident_id": "a1b2c3d4-...",
  "saved": true
}
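
Because the body has a documented length range, a client can reject out-of-range content before issuing the PUT. A minimal sketch, assuming the limit counts Python string characters (the server may count differently); `post_mortem_body` is a hypothetical helper name.

```python
import json

# Documented upper bound for the `content` field.
MAX_POST_MORTEM_CHARS = 100_000


def post_mortem_body(content: str) -> bytes:
    """Encode the PUT body after checking the 1 to 100,000 character range."""
    if not 1 <= len(content) <= MAX_POST_MORTEM_CHARS:
        raise ValueError(
            f"content must be 1 to {MAX_POST_MORTEM_CHARS} characters, got {len(content)}"
        )
    return json.dumps({"content": content}).encode("utf-8")
```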

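GET /api/incidents/{incident_id}/export

Export a complete incident bundle for archiving or sharing. Supports JSON and Markdown formats. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Query parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `format` | string | `json` | Output format: `json` or `markdown`. Any other value returns 400 Bad Request. |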

The JSON export payload includes the following keys. Dotted names such as `evidence.logs` are nested under the top-level `evidence` object:

| Key | Description |
| --- | --- |
| `metadata` | Core incident fields (title, severity, status, timestamps) |
| `evidence.logs` | Raw log content captured during investigation |
| `evidence.metrics_snapshot` | Prometheus metrics snapshot taken at incident time |
| `evidence.agent_reasoning` | LangGraph agent’s step-by-step analysis |
| `timeline` | Ordered list of status transitions with timestamps |
| `decisions` | Proposed and executed actions with outcomes |
| `actions` | Detailed action execution records |
| `post_mortem` | Post-mortem Markdown content (if available) |

Example request (JSON)

curl -H "Authorization: Bearer $TOKEN" \
  "https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../export?format=json"

Example request (Markdown)

curl -H "Authorization: Bearer $TOKEN" \
  "https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../export?format=markdown" \
  -o incident-report.md

Example response (JSON excerpt)

{
  "metadata": {
    "id": "a1b2c3d4-...",
    "title": "API gateway returning 503",
    "severity": "high",
    "status": "resolved",
    "created_at": "2024-06-01T14:35:00Z",
    "resolved_at": "2024-06-01T15:00:00Z"
  },
  "evidence": {
    "logs": "...",
    "metrics_snapshot": {...},
    "agent_reasoning": "Container exited with OOM error..."
  },
  "timeline": [...],
  "decisions": {...},
  "actions": [...],
  "post_mortem": "## Post-Mortem: ..."
}
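
A client consuming the JSON bundle typically needs to guard the `format` value and read nested keys defensively, since several sections may be empty. The sketch below assumes the key layout shown in the excerpt; `export_url` and `summarize_bundle` are hypothetical helper names.

```python
from urllib.parse import quote

# Base URL as used in the curl examples on this page.
BASE_URL = "https://sentinel-softserve.onrender.com"


def export_url(incident_id: str, fmt: str = "json") -> str:
    """Build the export URL, rejecting formats the API would answer with 400."""
    if fmt not in ("json", "markdown"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/api/incidents/{quote(incident_id, safe='')}/export?format={fmt}"


def summarize_bundle(bundle: dict) -> dict:
    """Pull a few headline facts out of a JSON export bundle."""
    evidence = bundle.get("evidence") or {}
    metadata = bundle.get("metadata") or {}
    return {
        "title": metadata.get("title"),
        "has_logs": bool(evidence.get("logs")),
        "timeline_events": len(bundle.get("timeline") or []),
        "has_post_mortem": bool(bundle.get("post_mortem")),
    }
```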

GET /api/incidents/{incident_id}/runbooks

Retrieve up to 5 relevant runbooks from the ChromaDB vector store for this incident. Runbooks are matched using semantic similarity against the incident’s incident_type and title. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

An optional custom query string (type string) overrides the default search query (`{incident_type} {title}`). This is useful for targeted runbook lookups.

The runbook collection queried is determined by the incident’s source_type and container_runtime:

| Source | Runtime | ChromaDB Collection |
| --- | --- | --- |
| `container` | `docker` | `runbooks-docker` |
| `container` | `podman` | `runbooks-podman` |
| `container` | `kubernetes` | `runbooks-kubernetes` |
| `database` | (none) | `runbooks-postgres` |

Response

| Field | Type | Description |
| --- | --- | --- |
| `title` | string | Runbook title extracted from the `RUNBOOK: <title>` line in the document. |
| `content` | string | Full runbook text content. |

Example request

curl -H "Authorization: Bearer $TOKEN" \
  "https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../runbooks"

Example response

[
  {
    "title": "Container OOM Recovery",
    "content": "RUNBOOK: Container OOM Recovery\n\n1. Identify the container...\n2. Check memory limits..."
  },
  {
    "title": "Docker Restart Procedure",
    "content": "RUNBOOK: Docker Restart Procedure\n\n..."
  }
]
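
The collection routing and default query described above can be expressed as a small lookup. This is an illustrative sketch, not the server's implementation: `runbook_collection` and `default_runbook_query` are hypothetical names, and returning `None` for unlisted combinations (such as `manual`) is an assumption, since the table does not cover them.

```python
from typing import Optional

# (source_type, container_runtime) -> collection, copied from the table above.
RUNBOOK_COLLECTIONS = {
    ("container", "docker"): "runbooks-docker",
    ("container", "podman"): "runbooks-podman",
    ("container", "kubernetes"): "runbooks-kubernetes",
    ("database", None): "runbooks-postgres",
}


def runbook_collection(source_type: str, container_runtime: Optional[str] = None) -> Optional[str]:
    """Look up the collection; the runtime only matters for container sources."""
    runtime = container_runtime if source_type == "container" else None
    # Assumption: combinations missing from the table map to None.
    return RUNBOOK_COLLECTIONS.get((source_type, runtime))


def default_runbook_query(incident: dict) -> str:
    """Build the documented default query: '{incident_type} {title}'."""
    incident_type = incident.get("incident_type") or ""
    title = incident.get("title") or ""
    return f"{incident_type} {title}".strip()
```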

GET /api/incidents/{incident_id}/similar

Find up to 5 similar past incidents using ChromaDB episodic memory. The query is built from the incident’s incident_type and title. The current incident is excluded from results. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Returns an array of incident objects. The collection queried mirrors the runbook collection selection logic (incidents-docker, incidents-postgres, etc.).

Example request

curl -H "Authorization: Bearer $TOKEN" \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../similar

Example response

[
  {
    "id": "9f8e7d6c-...",
    "title": "API gateway OOM crash",
    "severity": "high",
    "status": "resolved",
    "source_type": "container",
    "container_runtime": "docker",
    "created_at": "2024-05-15T09:22:00Z"
  }
]
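
Two behaviours described above, the mirrored collection naming and the exclusion of the current incident, can be sketched as follows. Only `incidents-docker` and `incidents-postgres` are named in this page; deriving the other names by swapping the `runbooks-` prefix is an assumption, and both function names are hypothetical.

```python
def incident_collection(runbook_collection_name: str) -> str:
    """Derive the episodic-memory collection by swapping the name prefix.

    Assumption: every runbooks-* collection has an incidents-* counterpart.
    """
    return "incidents-" + runbook_collection_name.removeprefix("runbooks-")


def exclude_current(results: list[dict], incident_id: str, limit: int = 5) -> list[dict]:
    """Drop the incident being inspected and cap the list at `limit` items."""
    others = [r for r in results if r.get("id") != incident_id]
    return others[:limit]
```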

GET /api/incidents/{incident_id}/metrics

Fetch the current live Prometheus metrics for the incident’s target resource. If Prometheus is unreachable but a metrics_snapshot was captured at incident creation time, the snapshot is returned as a fallback. Auth required: Yes

Path parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `incident_id` | string | Yes | UUID of the incident. |

Metric source depends on source_type:

| `source_type` | Data source | Query |
| --- | --- | --- |
| `database` | Prometheus → PostgreSQL exporter | Queries using `datname` extracted from `target` (`postgres/<datname>`) |
| `container` | Prometheus → cAdvisor | Queries using the container name as target |

Returns 503 Service Unavailable if Prometheus is unreachable and no metrics_snapshot fallback exists.

Example request

curl -H "Authorization: Bearer $TOKEN" \
  https://sentinel-softserve.onrender.com/api/incidents/a1b2c3d4-.../metrics

Example response (container)

{
  "cpu_usage_percent": 94.2,
  "memory_usage_bytes": 536870912,
  "memory_limit_bytes": 536870912,
  "network_rx_bytes": 1024000,
  "network_tx_bytes": 512000
}

Example response (503 – no fallback)

{
  "detail": "Métricas no disponibles: Connection refused"
}
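
Two small pieces of logic follow from the description above: extracting `datname` from a `postgres/<datname>` target, and deriving a memory utilization ratio from the container response. The sketch below assumes the field names shown in the sample response; `datname_from_target` and `memory_utilization` are hypothetical helpers.

```python
from typing import Optional


def datname_from_target(target: str) -> str:
    """Extract <datname> from a target of the form 'postgres/<datname>'."""
    prefix, sep, datname = target.partition("/")
    if prefix != "postgres" or not sep or not datname:
        raise ValueError(f"not a postgres target: {target!r}")
    return datname


def memory_utilization(metrics: dict) -> Optional[float]:
    """Return usage / limit as a ratio, or None when either value is unusable."""
    usage = metrics.get("memory_usage_bytes")
    limit = metrics.get("memory_limit_bytes")
    if usage is None or not limit:
        return None
    return usage / limit
```

With the sample container response above, `memory_utilization` returns 1.0, meaning the container is at its memory limit.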
