Understanding OpsMind's Service Status and Trend System

Every health check that OpsMind executes produces two classification values: a ServiceStatus, which describes the current condition of the monitored service in isolation, and a TrendStatus, which compares that condition with the previous check result. Together, these two signals show operators at a glance whether a service is healthy, degrading, or recovering, and they determine whether the platform escalates to AI-powered incident analysis. This page documents each possible value, the exact logic used to derive them, and how they map to criticality levels and alerting.

ServiceStatus

ServiceStatus describes the outcome of a single HTTP check. It is computed by analyzer.analyzeStatus immediately after checker.check returns.

UP

The service returned an HTTP 2xx response in 1500 ms or less. Both availability and performance are healthy.

DEGRADED

The service returned an HTTP 2xx response, but responseTime > 1500 ms. The service is reachable but performing below the acceptable threshold.

DOWN

The service did not respond or returned a non-2xx response (online === false). This includes network timeouts, refused connections, and HTTP error responses.

PENDING

No check data exists yet for this monitor. Assigned as the default lastStatus when a monitor is first created, and used as the previousState baseline on the very first execution.

The status is derived from these conditions in priority order:

if (!currentCheck.online) {
  currentState = "DOWN";
} else if (currentCheck.responseTime > 1500) {
  currentState = "DEGRADED";
} else {
  currentState = "UP";
}
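The comparison is strict, so a response that takes exactly 1500 ms still counts as UP. The sketch below wraps the conditions above in a standalone function to show that boundary. The function name classifyStatus and the constant DEGRADED_THRESHOLD_MS are illustrative and are not taken from analyzer.js.

```javascript
// Illustrative wrapper around the conditions above; the names are hypothetical.
const DEGRADED_THRESHOLD_MS = 1500;

function classifyStatus(currentCheck) {
  // Availability is checked first: an offline service is DOWN regardless of timing.
  if (!currentCheck.online) return "DOWN";
  // Strictly greater than the threshold means DEGRADED.
  if (currentCheck.responseTime > DEGRADED_THRESHOLD_MS) return "DEGRADED";
  return "UP";
}

console.log(classifyStatus({ online: true, responseTime: 1500 })); // UP
console.log(classifyStatus({ online: true, responseTime: 1501 })); // DEGRADED
console.log(classifyStatus({ online: false, responseTime: 0 }));   // DOWN
```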

TrendStatus

TrendStatus describes how the service’s health has changed relative to its last recorded state. It is computed by comparing currentState against previousState (the state field of the most recent Log record, or "PENDING" if no record exists).

STABLE

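No significant health change. The service remains in the same UP or DEGRADED condition as on the previous check. STABLE is also assigned when the very first check on a monitor returns UP or DEGRADED (previous state PENDING).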

DROP_DETECTED

A worsening transition was detected. The service moved from UP → DEGRADED, UP → DOWN, or DEGRADED → DOWN. This is the trigger for AI incident analysis.

RECOVERED

A recovery transition was detected. The service moved from DOWN → UP, DOWN → DEGRADED, or DEGRADED → UP.

OFFLINE

The service was already DOWN and remains DOWN. Also assigned when the very first check on a monitor immediately returns DOWN (previous state PENDING → current state DOWN).

State Transition Table

The table below documents all 12 transitions derived from the logic in analyzer.js. Every combination of previous and current state maps to exactly one trend value.

Previous State	Current State	Trend
`PENDING`	`UP`	`STABLE`
`PENDING`	`DEGRADED`	`STABLE`
`PENDING`	`DOWN`	`OFFLINE`
`UP`	`UP`	`STABLE`
`UP`	`DEGRADED`	`DROP_DETECTED`
`UP`	`DOWN`	`DROP_DETECTED`
`DEGRADED`	`UP`	`RECOVERED`
`DEGRADED`	`DEGRADED`	`STABLE`
`DEGRADED`	`DOWN`	`DROP_DETECTED`
`DOWN`	`UP`	`RECOVERED`
`DOWN`	`DEGRADED`	`RECOVERED`
`DOWN`	`DOWN`	`OFFLINE`
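One way to reproduce the table is to rank the three live states by severity and compare ranks, with PENDING and DOWN handled as special cases. This is a minimal sketch that matches the table row for row. It is not the actual code in analyzer.js, and the names deriveTrend and SEVERITY are assumptions made for illustration.

```javascript
// Severity rank per live state. PENDING has no rank and is handled separately.
const SEVERITY = { UP: 0, DEGRADED: 1, DOWN: 2 };

// Hypothetical helper: maps (previousState, currentState) to a trend value.
function deriveTrend(previousState, currentState) {
  // DOWN after DOWN, or DOWN on the very first check, is OFFLINE.
  if (currentState === "DOWN" &&
      (previousState === "DOWN" || previousState === "PENDING")) {
    return "OFFLINE";
  }
  // Any other first check has no earlier state to compare against.
  if (previousState === "PENDING") return "STABLE";

  const delta = SEVERITY[currentState] - SEVERITY[previousState];
  if (delta > 0) return "DROP_DETECTED"; // moved to a worse state
  if (delta < 0) return "RECOVERED";     // moved to a better state
  return "STABLE";                       // same state as before
}

console.log(deriveTrend("UP", "DEGRADED")); // DROP_DETECTED
console.log(deriveTrend("DOWN", "UP"));     // RECOVERED
console.log(deriveTrend("PENDING", "DOWN")); // OFFLINE
```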

DROP_DETECTED is the sole trigger for AI incident analysis. Whenever this trend value is recorded, historyService.executeMonitorCheck immediately calls aiServices.analyzeIncident and persists an AIInsight record with Gemini’s diagnosis. No other trend value initiates an AI call.
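The gating rule can be sketched as a small function that returns early for every trend except DROP_DETECTED. This is an illustration under assumptions, not the body of historyService.executeMonitorCheck: the function name handleTrend, the trend field on the analysis result, and the injected analyzeIncident and saveInsight collaborators are all hypothetical stand-ins so that the example runs without a database or an AI provider.

```javascript
// Hypothetical orchestration sketch; collaborators are injected as plain functions.
async function handleTrend(analysisResult, { analyzeIncident, saveInsight }) {
  // Only a worsening transition escalates to the AI layer.
  if (analysisResult.trend !== "DROP_DETECTED") {
    return null;
  }
  const diagnosis = await analyzeIncident(analysisResult);
  return saveInsight({ diagnosis });
}

// Stubs standing in for the AI call and the AIInsight store.
const stubs = {
  analyzeIncident: async (result) => `stub diagnosis for state ${result.state}`,
  saveInsight: async (insight) => ({ id: 1, ...insight }),
};

handleTrend({ state: "DOWN", trend: "DROP_DETECTED" }, stubs).then(console.log);
handleTrend({ state: "DOWN", trend: "OFFLINE" }, stubs).then(console.log);
```

The first call resolves to a stored insight object; the second resolves to null because OFFLINE never reaches the AI call.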

CriticalityLevel

When an AIInsight is created following a DROP_DETECTED event, its criticality field is set based on the current state at the time of detection:

Current State at Drop	Criticality Assigned
`DOWN`	`CRITICAL`
`DEGRADED`	`HIGH`

The CriticalityLevel enum also defines LOW and MEDIUM values in the schema, leaving room for future expansion — for example, manual operator overrides or graduated alerting tiers. As of the current implementation they are not assigned automatically.

criticality: analysisResult.state === "DOWN" ? "CRITICAL" : "HIGH"

Use the criticality field on AIInsight records to filter and prioritize incidents in your alerting pipeline. A CRITICAL insight means the service dropped to DOWN and is either unreachable or returning error responses; a HIGH insight means it dropped to DEGRADED, so it is still reachable but responding in more than 1500 ms.
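As a rough illustration of that filtering, the sketch below ranks the four CriticalityLevel values and returns the insights at or above a chosen level, most severe first. Only the criticality field comes from this page. The function name prioritizeInsights, the id field, and the in-memory array are assumptions, and a real pipeline would more likely query the AIInsight store directly.

```javascript
// Lower rank means more severe. All four enum values are ranked,
// although only CRITICAL and HIGH are assigned automatically today.
const CRITICALITY_RANK = { CRITICAL: 0, HIGH: 1, MEDIUM: 2, LOW: 3 };

// Hypothetical helper: keeps insights at or above minimumLevel, most severe first.
function prioritizeInsights(insights, minimumLevel = "HIGH") {
  const cutoff = CRITICALITY_RANK[minimumLevel];
  return insights
    .filter((insight) => CRITICALITY_RANK[insight.criticality] <= cutoff)
    .sort((a, b) => CRITICALITY_RANK[a.criticality] - CRITICALITY_RANK[b.criticality]);
}

const insights = [
  { id: "a", criticality: "HIGH" },
  { id: "b", criticality: "CRITICAL" },
  { id: "c", criticality: "HIGH" },
];

console.log(prioritizeInsights(insights).map((i) => i.id));             // [ 'b', 'a', 'c' ]
console.log(prioritizeInsights(insights, "CRITICAL").map((i) => i.id)); // [ 'b' ]
```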

HTTP Status Code Mappings

analyzer.analyzeStatus resolves a human-readable description for the most common HTTP status codes encountered during monitoring. These descriptions are included in the analysis result and used as errorDetails context when the AI layer is invoked.

HTTP Status Code	Message	Details
`200`	OK - Service Operational	Successful response. The service responded correctly.
`404`	Resource Not Found	Error 404: The requested resource could not be found.
`500`	Internal Server Error	Error 500: The target server encountered an error or failed.
`0`	No Response	The service did not respond or there is a connection timeout.

Status code 0 is returned by checker.js when axios throws a network-level error with no HTTP response — for example, a refused connection or a 5-second timeout expiry. It is not a real HTTP code; it is a sentinel value indicating total unreachability.

Any status code not in this table resolves to "Unknown code" with a detail string of "The code was received <status>", ensuring that unusual responses are still logged without crashing the analysis pipeline.
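The lookup with its fallback can be sketched as a Map keyed by status code. The strings are copied from the table and the fallback text above; the names STATUS_DESCRIPTIONS and describeStatus and the { message, details } object shape are assumptions about how analyzer.js might structure the result.

```javascript
// Known codes, with the message and details strings from the table above.
const STATUS_DESCRIPTIONS = new Map([
  [200, { message: "OK - Service Operational", details: "Successful response. The service responded correctly." }],
  [404, { message: "Resource Not Found", details: "Error 404: The requested resource could not be found." }],
  [500, { message: "Internal Server Error", details: "Error 500: The target server encountered an error or failed." }],
  // 0 is the sentinel for "no HTTP response at all", not a real HTTP code.
  [0, { message: "No Response", details: "The service did not respond or there is a connection timeout." }],
]);

// Hypothetical helper: never throws, so unusual codes are still logged.
function describeStatus(status) {
  return STATUS_DESCRIPTIONS.get(status) ?? {
    message: "Unknown code",
    details: `The code was received ${status}`,
  };
}

console.log(describeStatus(404).message); // Resource Not Found
console.log(describeStatus(418).details); // The code was received 418
```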
