When OpsMind detects that a monitored service has dropped or degraded (a DROP_DETECTED trend), it does more than record the event. It immediately forwards the incident details to Google Gemini, framing the request as a Senior SRE diagnosing a live production alert. Gemini returns a structured JSON object containing a probable root cause and a concrete first remediation action. That response is then persisted as an AIInsight record linked to the affected monitor, giving operators an instant, AI-generated starting point for triage rather than a blank incident ticket.
Trigger Condition
AI analysis is initiated inside historyService.executeMonitorCheck immediately after a Log record is persisted. The check is a single condition: the analyzer result must report a DROP_DETECTED trend.
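A minimal sketch of this guard, together with the errorDetails priority order described below, could look like this. It is not the actual historyService code: the helper names and the object shapes (a checker result with an `error` field, an analyzer result with `trend` and `details` fields) are hypothetical.

```javascript
// Hypothetical helpers: object shapes and field names are assumptions,
// not the actual historyService / checker.js / analyzer.js structures.
const FALLBACK_ERROR_DETAILS = "Timeout or without response";

// Gemini is only consulted when the analyzer reports a drop.
function shouldRequestAIAnalysis(analysis) {
  return analysis?.trend === "DROP_DETECTED";
}

// Priority order: raw checker.js error, then analyzer.js details, then the
// static fallback. `||` (rather than `??`) also skips empty strings.
function resolveErrorDetails(checkResult, analysis) {
  return checkResult?.error || analysis?.details || FALLBACK_ERROR_DETAILS;
}
```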
The errorDetails string passed to Gemini is resolved in priority order: first the raw error message returned by checker.js (e.g., "connect ECONNREFUSED 93.184.216.34:80"), then the human-readable details string from analyzer.js (e.g., "Error 500: The target server encountered an error or failed."), and finally the static fallback "Timeout or without response" when neither is available.
Gemini Model and Configuration
aiServices.analyzeIncident calls Gemini through the @google/genai SDK with two output settings that matter for parsing:
Setting responseMimeType to "application/json" makes Gemini return a bare JSON string with no Markdown fencing, so JSON.parse(response.text) can be called directly. The strict responseSchema ensures that both causa_probable and accion_recomendada are present in every response.
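A generation config along these lines would produce that behavior. This is a sketch, not the project's code: the schema is written as a plain object using the string type names "OBJECT" and "STRING" (the values behind the SDK's `Type` enum), the property descriptions and the `required` list are assumptions, and the model name is left out.

```javascript
// Sketch of a config for structured JSON output. With the SDK you would
// normally write Type.OBJECT / Type.STRING imported from "@google/genai".
function buildIncidentGenerationConfig() {
  return {
    // Raw JSON, no Markdown fences, so response.text can go straight to JSON.parse.
    responseMimeType: "application/json",
    responseSchema: {
      type: "OBJECT",
      properties: {
        causa_probable: {
          type: "STRING",
          description: "1-2 line technical explanation of the most likely cause",
        },
        accion_recomendada: {
          type: "STRING",
          description: "Exact command, log file, or service to inspect first",
        },
      },
      // Both fields must appear in every response.
      required: ["causa_probable", "accion_recomendada"],
    },
  };
}
```

With the SDK, this object would be passed as the `config` field of `ai.models.generateContent({ model, contents, config })`.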
Response Schema
The structured output from Gemini always contains exactly two fields:

| Field | Type | Description |
|---|---|---|
| causa_probable | string | A 1–2 line technical explanation of why the error most likely occurred. Stored as the analysis field on AIInsight. |
| accion_recomendada | string | The exact command, log file, or service the on-call engineer should inspect first. Stored as the suggestion field on AIInsight. |
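Even with a schema in place, a defensive parse step is cheap. The sketch below uses a hypothetical helper name: it parses response.text, checks that both fields are non-empty strings, and maps them onto the AIInsight analysis and suggestion fields listed in the table.

```javascript
// Hypothetical helper: validate Gemini's JSON text and map it to AIInsight fields.
function parseIncidentInsight(responseText) {
  const parsed = JSON.parse(responseText); // throws SyntaxError on malformed JSON

  if (parsed === null || typeof parsed !== "object") {
    throw new Error("Gemini response is not a JSON object");
  }
  for (const key of ["causa_probable", "accion_recomendada"]) {
    if (typeof parsed[key] !== "string" || parsed[key].trim() === "") {
      throw new Error(`Gemini response is missing "${key}"`);
    }
  }

  // causa_probable -> analysis, accion_recomendada -> suggestion
  return { analysis: parsed.causa_probable, suggestion: parsed.accion_recomendada };
}
```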
Prompt Template
The prompt sent to Gemini frames the request as a real incident alert received by a Senior Site Reliability Engineer. Its placeholders (monitorName, url, and errorDetails) are injected at call time from the Monitor record and the analyzer result.
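The wording below is purely illustrative and is not the project's actual template; only the Senior SRE framing and the three injected values come from the description above. The helper name `buildIncidentPrompt` is hypothetical.

```javascript
// Illustrative prompt builder; the wording is NOT OpsMind's real template.
function buildIncidentPrompt({ monitorName, url, errorDetails }) {
  return [
    "You are a Senior Site Reliability Engineer and have just received a production alert.",
    `Service: ${monitorName}`,
    `URL: ${url}`,
    `Error: ${errorDetails}`,
    "Give the most probable root cause (1-2 lines) and the first command, log file, or service to inspect.",
  ].join("\n");
}
```

Building the prompt from a single object argument keeps the call site explicit about which Monitor and analyzer values feed each placeholder.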
Storing the AIInsight
After a successful Gemini response, historyService persists the result using prisma.aIInsight.create. The criticality value is determined by the current ServiceStatus at the time of the drop: CRITICAL when the service is fully DOWN, and HIGH when it is DEGRADED.
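The status-to-criticality rule and the payload for the create call can be sketched as follows. The helper names and the `monitorId` relation field are assumptions; the section only says the record is linked to the affected monitor. The returned object is what would be passed as `data` to `prisma.aIInsight.create({ data })`.

```javascript
// Hypothetical mapping from ServiceStatus to AIInsight criticality.
function criticalityFor(status) {
  if (status === "DOWN") return "CRITICAL";
  if (status === "DEGRADED") return "HIGH";
  // Only DOWN and DEGRADED are defined for a drop; anything else is treated as a bug.
  throw new Error(`No criticality defined for status ${status}`);
}

// Builds the `data` argument for prisma.aIInsight.create({ data }).
// `monitorId` is an assumed relation field name.
function buildInsightData(monitorId, status, insight) {
  return {
    monitorId,
    analysis: insight.analysis, // from causa_probable
    suggestion: insight.suggestion, // from accion_recomendada
    criticality: criticalityFor(status),
  };
}
```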
Sample AIInsight Record
Retry Strategy
The Gemini API can return transient errors under high load. analyzeIncident implements an automatic retry loop to handle them:
Detect Retryable Error
After a failed API call, the error is serialized to a string and inspected for the codes 429, 503, RESOURCE_EXHAUSTED, or UNAVAILABLE. Any of these signals a temporary capacity issue on the Gemini side.
Wait 10 Seconds
If the error is retryable and retries remain, the function waits 10 000 ms before attempting again.
Retry Up to 3 Times
The default retries parameter is 3. Each recursive call decrements this counter. If all three retries are exhausted, the function falls through to the fallback.
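The three steps combine into a small recursive wrapper. This is a sketch under assumptions: `callGemini` stands in for the actual SDK call, `fallback` for whatever analyzeIncident returns once retries run out (not specified here), and `delayMs` is a parameter only so the example can run without waiting 10 seconds. Sending non-retryable errors straight to the fallback is also an assumption.

```javascript
const RETRYABLE_MARKERS = ["429", "503", "RESOURCE_EXHAUSTED", "UNAVAILABLE"];

// Serialize the error and look for any capacity-related marker.
function isRetryableError(err) {
  let text = String(err); // includes the message for Error instances
  try {
    text += " " + JSON.stringify(err); // adds enumerable fields such as a status code
  } catch {
    // circular or unserializable error: keep String(err) only
  }
  return RETRYABLE_MARKERS.some((marker) => text.includes(marker));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One initial attempt plus up to `retries` retries, each after `delayMs`.
async function analyzeWithRetry(callGemini, { retries = 3, delayMs = 10000, fallback = null } = {}) {
  try {
    return await callGemini();
  } catch (err) {
    if (retries > 0 && isRetryableError(err)) {
      await sleep(delayMs);
      // Recurse with one fewer retry remaining.
      return analyzeWithRetry(callGemini, { retries: retries - 1, delayMs, fallback });
    }
    // Non-retryable error, or retries exhausted.
    return fallback;
  }
}
```

A plain substring check is simple but coarse: "429" or "503" appearing anywhere in the serialized error (for example inside a URL) also counts as retryable.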