AI-Powered Incident Analysis with Google Gemini in OpsMind

When OpsMind detects that a monitored service has dropped or degraded (a DROP_DETECTED trend), it does more than record the event. It immediately forwards the incident details to Google Gemini with a prompt that casts the model as a Senior SRE diagnosing a live production alert. Gemini returns a structured JSON object containing a probable root cause and a concrete first remediation action. That response is then persisted as an AIInsight record linked to the affected monitor, giving operators an AI-generated starting point for triage rather than a blank incident ticket.

Trigger Condition

AI analysis is initiated inside historyService.executeMonitorCheck immediately after a Log record is persisted. The check is a single condition:

if (analysisResult.trend === "DROP_DETECTED") {
  const errorDetails =
    analysisResult.error ||
    analysisResult.details ||
    "Timeout or without response";

  const aiDiagnosis = await analyzeIncident(
    monitor.name,
    monitor.url,
    errorDetails,
  );

  await prisma.aIInsight.create({
    data: {
      monitorId: monitor.id,
      analysis: aiDiagnosis.causa_probable,
      suggestion: aiDiagnosis.accion_recomendada,
      criticality: analysisResult.state === "DOWN" ? "CRITICAL" : "HIGH",
    },
  });
}

The errorDetails string passed to Gemini is resolved in priority order: first the raw error message returned by checker.js (e.g., "connect ECONNREFUSED 93.184.216.34:80"), then the human-readable details string from analyzer.js (e.g., "Error 500: The target server encountered an error or failed."), and finally the static fallback "Timeout or without response" when neither is available.
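The priority order described above can be sketched as a small standalone helper. The name resolveErrorDetails and the constant are hypothetical; the source inlines the same expression inside executeMonitorCheck.

```javascript
// Hypothetical helper: mirrors the inline `||` chain from executeMonitorCheck.
const FALLBACK_ERROR_DETAILS = "Timeout or without response";

function resolveErrorDetails(analysisResult) {
  // `||` skips every falsy value, so an empty string falls through
  // to the next candidate just like undefined or null would.
  return (
    analysisResult.error ||
    analysisResult.details ||
    FALLBACK_ERROR_DETAILS
  );
}
```

One consequence of using `||` rather than `??` is that an empty error string from the checker is treated as missing, which is usually what a prompt wants.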

Gemini Model and Configuration

aiServices.analyzeIncident uses the @google/genai SDK with the following configuration:

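// Setup assumed from the SDK's documented usage; the project's actual import style may differ.
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({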
  model: "gemini-2.5-flash-lite",
  contents: prompt,
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        causa_probable: {
          type: Type.STRING,
          description:
            "Explicación técnica de 1 o 2 líneas sobre por qué ocurre este error.",
        },
        accion_recomendada: {
          type: Type.STRING,
          description:
            "El comando, log o servicio exacto que el equipo debe revisar primero para solucionarlo.",
        },
      },
      required: ["causa_probable", "accion_recomendada"],
    },
  },
});

Setting responseMimeType: "application/json" instructs Gemini to return raw JSON text with no Markdown fencing, so response.text can normally be passed straight to JSON.parse. The responseSchema marks both causa_probable and accion_recomendada as required, so a well-formed response contains both fields. A response that is blocked or truncated can still produce empty or incomplete text, in which case JSON.parse throws.
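If the response text is ever empty or cut off, JSON.parse throws. The following is a minimal defensive sketch, not the project's code: parseDiagnosis is a hypothetical helper that returns null instead of throwing, so the caller can decide how to react.

```javascript
// Hypothetical helper: validates the text returned by generateContent
// before it is stored. Returns null for anything unusable.
function parseDiagnosis(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null; // empty, truncated, or non-JSON output
  }
  if (parsed === null || typeof parsed !== "object") {
    return null;
  }
  const { causa_probable, accion_recomendada } = parsed;
  if (
    typeof causa_probable !== "string" ||
    typeof accion_recomendada !== "string"
  ) {
    return null; // a required field is missing or has the wrong type
  }
  return { causa_probable, accion_recomendada };
}
```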

Response Schema

The structured output from Gemini always contains exactly two fields:

Field	Type	Description
`causa_probable`	`string`	A 1–2 line technical explanation of why the error most likely occurred. Stored as the `analysis` field on `AIInsight`.
`accion_recomendada`	`string`	The exact command, log file, or service the on-call engineer should inspect first. Stored as the `suggestion` field on `AIInsight`.

Prompt Template

The prompt sent to Gemini frames the request as a real incident alert received by a Senior Site Reliability Engineer:

Eres un Ingeniero Site Reliability (SRE) Senior diagnosticando una alerta de monitoreo.
Un servicio crítico de nuestra infraestructura acaba de reportar una caída o degradación.

Detalles del Incidente:
- Nombre del Servicio: ${monitorName}
- URL: ${url}
- Detalles del Error / Excepción: ${errorDetails}

Analiza la posible causa de este error y estructura tu diagnóstico de forma técnica y precisa.

The three variables — monitorName, url, and errorDetails — are injected at call time from the Monitor record and the analyzer result.
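As an illustration of the injection step, the template can be wrapped in a function that takes the three values and returns the final prompt string. The function name buildIncidentPrompt is hypothetical; the template text is the one shown above.

```javascript
// Hypothetical helper name; the template body is copied from the prompt above.
function buildIncidentPrompt(monitorName, url, errorDetails) {
  return `Eres un Ingeniero Site Reliability (SRE) Senior diagnosticando una alerta de monitoreo.
Un servicio crítico de nuestra infraestructura acaba de reportar una caída o degradación.

Detalles del Incidente:
- Nombre del Servicio: ${monitorName}
- URL: ${url}
- Detalles del Error / Excepción: ${errorDetails}

Analiza la posible causa de este error y estructura tu diagnóstico de forma técnica y precisa.`;
}
```

In English, the prompt reads roughly: "You are a Senior Site Reliability Engineer (SRE) diagnosing a monitoring alert. A critical service in our infrastructure has just reported an outage or degradation. Incident details: service name, URL, error or exception details. Analyze the possible cause of this error and structure your diagnosis in a technical and precise way."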

Storing the AIInsight

After a successful Gemini response, historyService persists the result using prisma.aIInsight.create:

await prisma.aIInsight.create({
  data: {
    monitorId: monitor.id,
    analysis: aiDiagnosis.causa_probable,
    suggestion: aiDiagnosis.accion_recomendada,
    criticality: analysisResult.state === "DOWN" ? "CRITICAL" : "HIGH",
  },
});

The criticality value is derived from the ServiceStatus at the time of the drop: CRITICAL when the state is DOWN, and HIGH for any other state. In practice that other state is DEGRADED, but the ternary does not check for it explicitly.
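The mapping is small enough to isolate in a one-line function, which makes its behavior easy to check. The name criticalityFor is hypothetical; the expression is the ternary from the create call.

```javascript
// Hypothetical helper: same ternary as in the prisma.aIInsight.create call.
function criticalityFor(state) {
  // Only an exact "DOWN" yields CRITICAL; every other value yields HIGH.
  return state === "DOWN" ? "CRITICAL" : "HIGH";
}
```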

Sample AIInsight Record

{
  "id": 42,
  "monitorId": 7,
  "analysis": "The service at the target URL is returning an HTTP 500, indicating an unhandled exception or crash in the application process. This is consistent with a recent deployment introducing a runtime error or an exhausted database connection pool.",
  "suggestion": "Check the application container logs immediately: `docker logs <container_name> --tail 100`. Look for stack traces or OOM kill events. Also verify the database connection pool status via your Prisma metrics endpoint.",
  "criticality": "CRITICAL",
  "createdAt": "2025-01-15T03:42:11.000Z"
}

Retry Strategy

The Gemini API can return transient errors under high load. analyzeIncident handles these by retrying automatically, calling itself recursively with a decremented retry counter:

Detect Retryable Error

After a failed API call, the error is serialized to a string and inspected for the codes 429, 503, RESOURCE_EXHAUSTED, or UNAVAILABLE. Any of these signals a temporary capacity issue on the Gemini side.
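A minimal sketch of this detection step follows. The function name isRetryableError and the way the error is turned into text are assumptions; the source only states that the error is serialized to a string and searched for the four markers.

```javascript
const RETRYABLE_MARKERS = ["429", "503", "RESOURCE_EXHAUSTED", "UNAVAILABLE"];

// Hypothetical helper: returns true when the error text contains a retryable marker.
function isRetryableError(error) {
  let json = "";
  try {
    // JSON.stringify returns undefined for some inputs, hence the `?? ""`.
    json = JSON.stringify(error) ?? "";
  } catch {
    // Circular structures cannot be serialized; rely on the string form only.
  }
  // Combine both forms: Error objects keep their message in String(error),
  // while plain objects expose their fields only through JSON.
  const text = `${String(error)} ${json}`;
  return RETRYABLE_MARKERS.some((marker) => text.includes(marker));
}
```

Substring matching is simple but loose: any error text that happens to contain "429" or "503" (a port number, for instance) is also treated as retryable.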

Wait 10 Seconds

If the error is retryable and retries remain, the function waits 10 000 ms before attempting again.

await delay(10000);
return analyzeIncident(monitorName, url, errorDetails, retries - 1);

Retry Up to 3 Times

The default retries parameter is 3. Each recursive call decrements this counter. Once all three retries are exhausted, the function falls through to the fallback.

Return Fallback on Failure

If the error is not retryable, or all retries are exhausted, the function returns a static fallback object instead of throwing:

return {
  causa_probable: "Análisis de IA no disponible temporalmente.",
  accion_recomendada: "Revisar los logs del contenedor manualmente.",
};
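The four steps above can be put together in one sketch. This is not the project's analyzeIncident: withRetryAndFallback, callModel, and the options object are hypothetical names, the delay helper is an assumed implementation, and the `retries > 0` guard is an assumption that yields one initial attempt plus up to three retries. The model call, the retry test, and the wait are injected so the flow can be exercised without network access or real ten-second pauses.

```javascript
const FALLBACK_DIAGNOSIS = {
  causa_probable: "Análisis de IA no disponible temporalmente.",
  accion_recomendada: "Revisar los logs del contenedor manualmente.",
};

// Assumed implementation of the delay helper: a promise-based sleep.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: `callModel` stands in for the Gemini request.
async function withRetryAndFallback(
  callModel,
  { retries = 3, waitMs = 10000, isRetryable, wait = delay } = {}
) {
  try {
    return await callModel();
  } catch (error) {
    // Assumption: retry only while the counter is positive and the error is transient.
    if (retries > 0 && isRetryable(error)) {
      await wait(waitMs);
      return withRetryAndFallback(callModel, {
        retries: retries - 1,
        waitMs,
        isRetryable,
        wait,
      });
    }
    // Non-retryable error or retries exhausted: return a copy of the fallback.
    return { ...FALLBACK_DIAGNOSIS };
  }
}
```

With the defaults, a persistently overloaded API delays the caller by roughly 30 seconds before the fallback is returned, which matters if the monitor check awaits the result.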

AI analysis requires a valid GEMINI_API_KEY environment variable. Without it, the GoogleGenAI client will throw on initialization and the function will immediately enter the error path. Because the fallback object is returned, no exception is raised, but the stored AIInsight carries no real diagnosis: its analysis and suggestion fields contain the static fallback strings. Always confirm GEMINI_API_KEY is set in your production environment before relying on AI insights for incident response.
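One way to surface a missing key early, rather than discovering it through fallback text in stored insights, is a startup guard. This is a sketch only: the function name assertGeminiKey and the error message are hypothetical and not part of OpsMind.

```javascript
// Hypothetical startup guard; call once when the service boots.
function assertGeminiKey(env = process.env) {
  const key = env.GEMINI_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(
      "GEMINI_API_KEY is not set; AI insights would contain only fallback text."
    );
  }
  return key;
}
```

Failing fast at boot trades availability for visibility, so a deployment that should keep monitoring without AI insights might log a warning here instead of throwing.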
