When OpsMind detects that a monitored service has dropped or degraded (a DROP_DETECTED trend), it does more than record the event. It immediately forwards the incident details to Google Gemini with a prompt that casts the model as a Senior SRE diagnosing a live production alert. Gemini returns a structured JSON object containing a probable root cause and a concrete first remediation action. That response is then persisted as an AIInsight record linked to the affected monitor, giving operators an AI-generated starting point for triage rather than a blank incident ticket.

Trigger Condition

AI analysis is initiated inside historyService.executeMonitorCheck immediately after a Log record is persisted. The check is a single condition:
if (analysisResult.trend === "DROP_DETECTED") {
  const errorDetails =
    analysisResult.error ||
    analysisResult.details ||
    "Timeout or without response";

  const aiDiagnosis = await analyzeIncident(
    monitor.name,
    monitor.url,
    errorDetails,
  );

  await prisma.aIInsight.create({
    data: {
      monitorId: monitor.id,
      analysis: aiDiagnosis.causa_probable,
      suggestion: aiDiagnosis.accion_recomendada,
      criticality: analysisResult.state === "DOWN" ? "CRITICAL" : "HIGH",
    },
  });
}
The errorDetails string passed to Gemini is resolved in priority order: first the raw error message returned by checker.js (e.g., "connect ECONNREFUSED 93.184.216.34:80"), then the human-readable details string from analyzer.js (e.g., "Error 500: The target server encountered an error or failed."), and finally the static fallback "Timeout or without response" when neither is available.
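The priority order relies on JavaScript's `||` operator, which skips every falsy value, including an empty string. The sketch below isolates that chain in a hypothetical helper, `resolveErrorDetails` (not a function in OpsMind), to show how each case resolves.

```javascript
// Hypothetical helper: mirrors the `||` chain used in historyService.
// The name is an assumption made for illustration only.
function resolveErrorDetails(analysisResult) {
  return (
    analysisResult.error ||   // raw error from the checker, if any
    analysisResult.details || // human-readable details from the analyzer
    "Timeout or without response" // static fallback
  );
}

// The raw error wins when both values are present.
console.log(resolveErrorDetails({
  error: "connect ECONNREFUSED 93.184.216.34:80",
  details: "Error 500: The target server encountered an error or failed.",
}));

// An empty error string is falsy, so the details string is used instead.
console.log(resolveErrorDetails({ error: "", details: "Error 500" }));

// With neither value available, the static fallback is returned.
console.log(resolveErrorDetails({}));
```

Because `||` treats `""`, `null`, and `undefined` alike, an empty error message never reaches Gemini; the next candidate in the chain is used.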

Gemini Model and Configuration

aiServices.analyzeIncident uses the @google/genai SDK with the following configuration:
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-lite",
  contents: prompt,
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        causa_probable: {
          type: Type.STRING,
          description:
            "Explicación técnica de 1 o 2 líneas sobre por qué ocurre este error.",
        },
        accion_recomendada: {
          type: Type.STRING,
          description:
            "El comando, log o servicio exacto que el equipo debe revisar primero para solucionarlo.",
        },
      },
      required: ["causa_probable", "accion_recomendada"],
    },
  },
});
Setting responseMimeType: "application/json" instructs Gemini to return a plain JSON string with no Markdown fencing, so response.text can be passed to JSON.parse directly. The responseSchema marks both causa_probable and accion_recomendada as required, so both fields are expected in every response.
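Even with a required schema, a caller may want to confirm the shape of the result before writing it to the database. The following is a minimal sketch, assuming a hypothetical `parseDiagnosis` helper that receives the raw `response.text` string; it is not part of aiServices as documented here.

```javascript
// Hypothetical helper: parses and validates the JSON text returned by Gemini.
// Throws if the text is not valid JSON or a required field is missing or empty.
function parseDiagnosis(rawText) {
  const parsed = JSON.parse(rawText); // throws SyntaxError on malformed JSON

  if (parsed === null || typeof parsed !== "object") {
    throw new Error("Expected a JSON object from Gemini");
  }

  for (const field of ["causa_probable", "accion_recomendada"]) {
    if (typeof parsed[field] !== "string" || parsed[field].trim() === "") {
      throw new Error(`Missing or empty field: ${field}`);
    }
  }

  // Return only the two documented fields, dropping anything unexpected.
  return {
    causa_probable: parsed.causa_probable,
    accion_recomendada: parsed.accion_recomendada,
  };
}

// Usage with a sample payload shaped like the documented schema.
const sample = '{"causa_probable":"Upstream timeout","accion_recomendada":"Check nginx logs"}';
console.log(parseDiagnosis(sample));
```

Throwing on a malformed payload lets the caller route the failure through the same error path as a failed API call.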

Response Schema

The structured output from Gemini always contains exactly two fields:
| Field | Type | Description |
| --- | --- | --- |
| causa_probable | string | A 1–2 line technical explanation of why the error most likely occurred. Stored as the analysis field on AIInsight. |
| accion_recomendada | string | The exact command, log file, or service the on-call engineer should inspect first. Stored as the suggestion field on AIInsight. |

Prompt Template

The prompt sent to Gemini frames the request as a real incident alert received by a Senior Site Reliability Engineer:
Eres un Ingeniero Site Reliability (SRE) Senior diagnosticando una alerta de monitoreo.
Un servicio crítico de nuestra infraestructura acaba de reportar una caída o degradación.

Detalles del Incidente:
- Nombre del Servicio: ${monitorName}
- URL: ${url}
- Detalles del Error / Excepción: ${errorDetails}

Analiza la posible causa de este error y estructura tu diagnóstico de forma técnica y precisa.
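The template is written in Spanish. In English, it tells the model that it is a Senior Site Reliability Engineer diagnosing a monitoring alert for a critical service that has just reported an outage or degradation, lists the service name, URL, and error details, and asks for a technical, precise diagnosis. The three variables (monitorName, url, and errorDetails) are interpolated at call time: monitorName and url come from the Monitor record, and errorDetails from the analyzer result.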
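As a rough illustration of how the template might be assembled, the sketch below wraps it in a hypothetical `buildPrompt` function using a template literal. The function name is an assumption; only the template text and the three variable names come from the source.

```javascript
// Hypothetical helper: interpolates the three incident values into the
// Spanish prompt template. The template text is kept verbatim.
function buildPrompt(monitorName, url, errorDetails) {
  return `Eres un Ingeniero Site Reliability (SRE) Senior diagnosticando una alerta de monitoreo.
Un servicio crítico de nuestra infraestructura acaba de reportar una caída o degradación.

Detalles del Incidente:
- Nombre del Servicio: ${monitorName}
- URL: ${url}
- Detalles del Error / Excepción: ${errorDetails}

Analiza la posible causa de este error y estructura tu diagnóstico de forma técnica y precisa.`;
}

// Usage with made-up example values.
console.log(
  buildPrompt(
    "Checkout API",
    "https://example.com/health",
    "connect ECONNREFUSED 93.184.216.34:80",
  ),
);
```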

Storing the AIInsight

After a successful Gemini response, historyService persists the result using prisma.aIInsight.create:
await prisma.aIInsight.create({
  data: {
    monitorId: monitor.id,
    analysis: aiDiagnosis.causa_probable,
    suggestion: aiDiagnosis.accion_recomendada,
    criticality: analysisResult.state === "DOWN" ? "CRITICAL" : "HIGH",
  },
});
The criticality value is derived from the current ServiceStatus (analysisResult.state) at the time of the drop: CRITICAL when the service is fully DOWN, and HIGH for any other state, which in practice means DEGRADED.

Sample AIInsight Record

{
  "id": 42,
  "monitorId": 7,
  "analysis": "The service at the target URL is returning an HTTP 500, indicating an unhandled exception or crash in the application process. This is consistent with a recent deployment introducing a runtime error or an exhausted database connection pool.",
  "suggestion": "Check the application container logs immediately: `docker logs <container_name> --tail 100`. Look for stack traces or OOM kill events. Also verify the database connection pool status via your Prisma metrics endpoint.",
  "criticality": "CRITICAL",
  "createdAt": "2025-01-15T03:42:11.000Z"
}

Retry Strategy

The Gemini API can return transient errors under high load. analyzeIncident implements an automatic retry loop to handle these gracefully:
1. Detect Retryable Error

After a failed API call, the error is serialized to a string and inspected for the codes 429, 503, RESOURCE_EXHAUSTED, or UNAVAILABLE. Any of these signals a temporary capacity issue on the Gemini side.

2. Wait 10 Seconds

If the error is retryable and retries remain, the function waits 10 000 ms before attempting again.
await delay(10000);
return analyzeIncident(monitorName, url, errorDetails, retries - 1);

3. Retry Up to 3 Times

The default retries parameter is 3. Each recursive call decrements this counter. If all three retries are exhausted the function falls through to the fallback.

4. Return Fallback on Failure

If the error is not retryable, or all retries are exhausted, the function returns a static fallback object instead of throwing:
return {
  causa_probable: "Análisis de IA no disponible temporalmente.",
  accion_recomendada: "Revisar los logs del contenedor manualmente.",
};
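The retry behavior described in this section can be sketched as a small recursive wrapper. This is an illustration under assumptions, not the actual analyzeIncident source: `withRetry`, `isRetryable`, and the injectable `wait` parameter are hypothetical names, the Gemini call is passed in as a callback, and the exact way the real code serializes the error is not shown on this page.

```javascript
const RETRYABLE_MARKERS = ["429", "503", "RESOURCE_EXHAUSTED", "UNAVAILABLE"];

const FALLBACK = {
  causa_probable: "Análisis de IA no disponible temporalmente.",
  accion_recomendada: "Revisar los logs del contenedor manualmente.",
};

// Assumption: the error is turned into a string via its message (or String()).
function isRetryable(error) {
  const text = String(error?.message ?? error);
  return RETRYABLE_MARKERS.some((marker) => text.includes(marker));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `callGemini`; on a retryable error waits `delayMs` and recurses with
// one fewer retry. Never throws: returns FALLBACK on any final failure.
async function withRetry(callGemini, retries = 3, delayMs = 10000, wait = sleep) {
  try {
    return await callGemini();
  } catch (error) {
    if (retries > 0 && isRetryable(error)) {
      await wait(delayMs);
      return withRetry(callGemini, retries - 1, delayMs, wait);
    }
    return FALLBACK;
  }
}

// Usage: a call that succeeds on the first attempt never waits.
withRetry(async () => ({ causa_probable: "ok", accion_recomendada: "none" }))
  .then((result) => console.log(result));
```

With the default of 3 retries, this sketch makes at most four attempts in total: the initial call plus three retries. Passing `wait` as a parameter keeps the delay replaceable in tests.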
AI analysis requires a valid GEMINI_API_KEY environment variable. Without it, the GoogleGenAI client will throw on initialization and the function will immediately enter the error path. No exception reaches the caller, because the fallback object is returned, but the stored AIInsight is not meaningful: its analysis and suggestion fields contain the static fallback strings (in English, "AI analysis temporarily unavailable." and "Review the container logs manually.") rather than a real Gemini diagnosis. Always confirm GEMINI_API_KEY is set in your production environment before relying on AI insights for incident response.
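One way to surface a missing key early, rather than discovering it through fallback text in stored insights, is a startup check. The following is a minimal sketch, assuming a hypothetical `assertGeminiKey` helper called once during application boot; it is not part of OpsMind as documented here.

```javascript
// Hypothetical startup guard: fails fast when GEMINI_API_KEY is absent or blank.
// `env` defaults to process.env and is injectable for testing.
function assertGeminiKey(env = process.env) {
  const key = env.GEMINI_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(
      "GEMINI_API_KEY is not set; AI insights would contain only fallback text.",
    );
  }
  return key;
}

// Usage with an explicit environment object (placeholder value, not a real key).
console.log(assertGeminiKey({ GEMINI_API_KEY: "example-key" }).length > 0);
```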
