The risk classification engine in backend/classifier.py is the authoritative decision-maker of Vanguardia EPIS. It takes a single student record as input, evaluates each of three academic variables independently against frozen numeric thresholds, and then aggregates the results using a worst-case rule to produce an overall risk level. The entire process is deterministic: given the same input, the same output is guaranteed every time — no randomness, no AI inference, no network call.

Risk Levels

The engine produces exactly one of four risk levels. The emoji is the canonical representation used throughout the codebase and the frontend.
| Constant | Value | Meaning |
|---|---|---|
| NIVEL_BAJO | 🟢 | Student is on track: all indicators within expected ranges |
| NIVEL_MEDIO | 🟡 | One warning signal detected; teacher should monitor |
| NIVEL_ALTO | 🔴 | High risk; immediate teacher intervention recommended |
| NIVEL_INSUFICIENTE | ⚪ | No data available; manual review required |

Frozen Thresholds (§4.1)

These constants are marked congelados (frozen) in the source code. They must not be modified without a specification change.
# --- Umbrales congelados (§4.1) ---
UMBRAL_ASISTENCIA_BAJO = 90    # >= 90 → 🟢
UMBRAL_ASISTENCIA_MEDIO = 75   # 75-89 → 🟡, <75 → 🔴

UMBRAL_NOTAS_BAJO = 13         # >= 13 → 🟢
UMBRAL_NOTAS_MEDIO = 11        # 11-12 → 🟡, <11 → 🔴

Variable-Level Evaluation Rules

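Each variable is evaluated in isolation before aggregation. Because the thresholds use >= comparisons, fractional values fall into the band below the threshold they miss (for example, 89.5% attendance is 🟡).

Attendance (asistencia_pct), numeric, range 0–100:

| Value | Level |
|---|---|
| ≥ 90% | 🟢 |
| ≥ 75% and < 90% | 🟡 |
| < 75% | 🔴 |

Grades (notas_promedio), vigesimal scale 0–20:

| Value | Level |
|---|---|
| ≥ 13 | 🟢 |
| ≥ 11 and < 13 | 🟡 |
| < 11 | 🔴 |

Participation (participacion), categorical:

| Value | Level |
|---|---|
| "alta" or "media" | 🟢 |
| "baja" | 🟡 |
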
By design (§4.1), participation never reaches 🔴 in isolation. Low participation is a supporting signal that can contribute to an overall 🔴 via the worst-case aggregation rule, but a student cannot be classified as high-risk on participation alone.
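
The per-variable rules above can be sketched as small pure functions. This is a minimal illustration, not the code in backend/classifier.py: the function names evaluar_por_umbral, evaluar_asistencia, evaluar_notas and evaluar_participacion are hypothetical, and the constants are redefined locally so the block runs on its own.

```python
# Illustrative sketch; the function names are hypothetical, not taken from classifier.py.
NIVEL_BAJO, NIVEL_MEDIO, NIVEL_ALTO = "🟢", "🟡", "🔴"

UMBRAL_ASISTENCIA_BAJO = 90
UMBRAL_ASISTENCIA_MEDIO = 75
UMBRAL_NOTAS_BAJO = 13
UMBRAL_NOTAS_MEDIO = 11


def evaluar_por_umbral(valor: float, umbral_bajo: float, umbral_medio: float) -> str:
    """Map a numeric value to a level using >= comparisons."""
    if valor >= umbral_bajo:
        return NIVEL_BAJO
    if valor >= umbral_medio:
        return NIVEL_MEDIO
    return NIVEL_ALTO


def evaluar_asistencia(asistencia_pct: float) -> str:
    return evaluar_por_umbral(
        asistencia_pct, UMBRAL_ASISTENCIA_BAJO, UMBRAL_ASISTENCIA_MEDIO
    )


def evaluar_notas(notas_promedio: float) -> str:
    return evaluar_por_umbral(notas_promedio, UMBRAL_NOTAS_BAJO, UMBRAL_NOTAS_MEDIO)


def evaluar_participacion(participacion: str) -> str:
    # Participation is capped at 🟡: "baja" is a warning, never 🔴 on its own.
    return NIVEL_MEDIO if participacion == "baja" else NIVEL_BAJO
```

These helpers assume the value has already been validated; passing None would raise a TypeError in the numeric comparisons.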

Worst-Case Aggregation Rule (Art. III §3.2.1)

After each variable is evaluated independently, the engine applies a single aggregation pass over the three results.
if any variable = 🔴           → overall = 🔴  (direct)
else:
  count variables where = 🟡 (señales_negativas)
  if señales_negativas == 0   → overall = 🟢
  if señales_negativas == 1   → overall = 🟡
  if señales_negativas >= 2   → overall = 🔴  (accumulation)
This rule implements a conservative safety bias: a student with two 🟡 signals — even if no single variable reaches the 🔴 threshold — is escalated to 🔴. The logic is drawn directly from clasificar_estudiante():
if NIVEL_ALTO in valores:
    nivel_global = NIVEL_ALTO
else:
    señales_negativas = sum(1 for v in valores if v == NIVEL_MEDIO)
    if señales_negativas == 0:
        nivel_global = NIVEL_BAJO
    elif señales_negativas == 1:
        nivel_global = NIVEL_MEDIO
    else:
        # 2+ señales negativas → 🔴 por acumulación
        nivel_global = NIVEL_ALTO
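
To try the rule in isolation, the excerpt can be wrapped in a standalone function. The helper name agregar_peor_caso() is an assumption made for this sketch and does not appear in classifier.py; the body mirrors the excerpt above.

```python
NIVEL_BAJO, NIVEL_MEDIO, NIVEL_ALTO = "🟢", "🟡", "🔴"


def agregar_peor_caso(valores: list[str]) -> str:
    """Hypothetical wrapper around the aggregation rule shown above."""
    if NIVEL_ALTO in valores:
        return NIVEL_ALTO  # direct: any single 🔴 wins
    señales_negativas = sum(1 for v in valores if v == NIVEL_MEDIO)
    if señales_negativas == 0:
        return NIVEL_BAJO
    if señales_negativas == 1:
        return NIVEL_MEDIO
    return NIVEL_ALTO  # accumulation: two or more 🟡


# Worked examples, each list holding (attendance, grades, participation) levels.
ejemplos = {
    "todo bien": agregar_peor_caso(["🟢", "🟢", "🟢"]),
    "una señal": agregar_peor_caso(["🟡", "🟢", "🟢"]),
    "acumulación": agregar_peor_caso(["🟡", "🟡", "🟢"]),
    "directo": agregar_peor_caso(["🔴", "🟢", "🟢"]),
}
```

In this sketch an empty list falls through to 🟢, so the case where no variable has data needs to be handled before aggregation is reached.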

Missing Data Handling (Art. IX §9.3.2)

The engine handles missing or invalid data gracefully without raising exceptions.
1. normalizar_campo() validates all three fields

Before any evaluation occurs, normalizar_campo() converts invalid values to None:
  • asistencia_pct outside 0–100 → None
  • notas_promedio outside 0–20 → None
  • participacion not in {"alta", "media", "baja"} → None
A warning is printed to the console but the pipeline continues.

2. Missing variables are excluded from aggregation

Variables with a None value are added to variables_faltantes[] and are not included in the worst-case aggregation. Only the variables that have valid data contribute to the overall risk level.

3. All three variables missing → ⚪ (Art. IX §9.3.2)

If all three fields are None after normalisation, the engine short-circuits and returns NIVEL_INSUFICIENTE. The motivo is set to "Las 3 variables carecen de dato — requiere revisión manual del docente".
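
Steps 2 and 3 amount to a partition followed by a short-circuit, which could be sketched as follows. The helper names separar_variables() and resultado_si_sin_datos() are hypothetical, and wrapping the motivo in a one-item motivos list is an assumption; only the field names, the ⚪ level, variables_faltantes and the motivo text come from the description above.

```python
from typing import Optional

NIVEL_INSUFICIENTE = "⚪"
CAMPOS = ("asistencia_pct", "notas_promedio", "participacion")
MOTIVO_SIN_DATOS = (
    "Las 3 variables carecen de dato — requiere revisión manual del docente"
)


def separar_variables(estudiante: dict) -> tuple[dict, list[str]]:
    """Split the three fields into those with data and those missing (None or absent)."""
    presentes = {c: estudiante[c] for c in CAMPOS if estudiante.get(c) is not None}
    faltantes = [c for c in CAMPOS if estudiante.get(c) is None]
    return presentes, faltantes


def resultado_si_sin_datos(estudiante: dict) -> Optional[dict]:
    """Return the ⚪ result when nothing can be evaluated, otherwise None."""
    presentes, faltantes = separar_variables(estudiante)
    if not presentes:
        return {
            "nivel": NIVEL_INSUFICIENTE,
            "motivos": [MOTIVO_SIN_DATOS],  # assumed shape: a one-item list
            "variables_faltantes": faltantes,
        }
    return None  # caller continues with per-variable evaluation and aggregation
```

A student with only notas_promedio populated would yield None here, and aggregation would then run over that single variable.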

normalizar_campo() Function

def normalizar_campo(estudiante: dict) -> dict:
    """
    Normaliza valores inválidos a None según §3.3.
    - asistencia_pct fuera de 0-100 → None
    - notas_promedio fuera de 0-20 → None
    - participacion con valor distinto a los 3 permitidos → None
    """
The function returns a shallow copy of the student dict with invalid fields replaced by None. It never mutates the original dict.
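
A body consistent with that docstring might look like the following. It is a sketch under stated assumptions, not the shipped implementation: the treatment of booleans and non-numeric values as invalid, the helper _es_numero_en_rango(), and the wording of the warning are all guesses.

```python
PARTICIPACIONES_VALIDAS = {"alta", "media", "baja"}


def _es_numero_en_rango(valor, minimo: float, maximo: float) -> bool:
    # Assumption: booleans and non-numeric types count as invalid.
    if isinstance(valor, bool) or not isinstance(valor, (int, float)):
        return False
    return minimo <= valor <= maximo


def normalizar_campo(estudiante: dict) -> dict:
    """Return a shallow copy with out-of-range or unknown values set to None."""
    limpio = dict(estudiante)  # shallow copy; the caller's dict is left untouched
    validadores = {
        "asistencia_pct": lambda v: _es_numero_en_rango(v, 0, 100),
        "notas_promedio": lambda v: _es_numero_en_rango(v, 0, 20),
        "participacion": lambda v: isinstance(v, str) and v in PARTICIPACIONES_VALIDAS,
    }
    for campo, es_valido in validadores.items():
        valor = limpio.get(campo)
        if valor is not None and not es_valido(valor):
            # Assumed warning format; the pipeline continues either way.
            print(f"[WARN] {campo}={valor!r} invalido, se normaliza a None")
            limpio[campo] = None
    return limpio
```

Copying first and writing only to the copy is what keeps the original dict unchanged; absent fields are simply left absent in this sketch.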

construir_motivo() — Audit Trail

Every classification result includes a human-readable audit trail. construir_motivo() generates one message per evaluated variable, and one message per missing variable:
# Example output for a student with 80% attendance (🟡), grade 9 (🔴), no participation data:
[
  "Asistencia: 80% — nivel 🟡 (umbral 🟢 es ≥90%)",
  "Notas: 9 — nivel 🔴 (umbral 🟢 es ≥13)",
  "Participación: sin dato — evaluado sin esta variable"
]
This list is passed directly to the AI prompt so Gemini can reference specific values, and it is also returned to the frontend for display — satisfying Art. II §2.3 (visible traceability).
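
The example output suggests one message template per variable. The sketch below reproduces those strings; the function name construir_motivo_sketch(), its signature (a dict of values plus a dict of levels) and the choice to always append the threshold hint for numeric variables are assumptions, so the real construir_motivo() may be organised differently.

```python
ETIQUETAS = {
    "asistencia_pct": "Asistencia",
    "notas_promedio": "Notas",
    "participacion": "Participación",
}
# Per numeric variable: suffix printed after the value, then the threshold hint.
FORMATO_NUMERICO = {
    "asistencia_pct": ("%", "(umbral 🟢 es ≥90%)"),
    "notas_promedio": ("", "(umbral 🟢 es ≥13)"),
}


def construir_motivo_sketch(valores: dict, niveles: dict) -> list[str]:
    """Build one message per variable; a value of None means 'sin dato'."""
    motivos = []
    for campo, etiqueta in ETIQUETAS.items():
        valor = valores.get(campo)
        if valor is None:
            motivos.append(f"{etiqueta}: sin dato — evaluado sin esta variable")
        elif campo in FORMATO_NUMERICO:
            sufijo, pista = FORMATO_NUMERICO[campo]
            motivos.append(
                f"{etiqueta}: {valor}{sufijo} — nivel {niveles[campo]} {pista}"
            )
        else:
            motivos.append(f"{etiqueta}: {valor} — nivel {niveles[campo]}")
    return motivos
```

Called with 80% attendance at 🟡, grade 9 at 🔴 and no participation value, this sketch yields the same three strings as the example above.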

clasificar_estudiante() — Full Signature

def clasificar_estudiante(estudiante: dict) -> dict:
    """
    Clasifica el nivel de riesgo de un estudiante.
    Implementa el pseudocódigo del §4.2 (Art. III §3.2.1 y §3.3).

    Retorna dict con:
        - nivel: str (🟢 / 🟡 / 🔴 / ⚪)
        - motivos: list[str]  ← trazabilidad visible (Art. II §2.3)
        - variables_faltantes: list[str]
    """
Input: A student dict containing at minimum the fields id, asistencia_pct, notas_promedio, and participacion. Any field may be None or absent.

Output:
{
    "nivel": "🔴",                          # one of 🟢 🟡 🔴 ⚪
    "motivos": [                            # one entry per variable
        "Asistencia: 60% — nivel 🔴 (umbral 🟢 es ≥90%)",
        "Notas: 9 — nivel 🔴 (umbral 🟢 es ≥13)",
        "Participación: alta — nivel 🟢",
    ],
    "variables_faltantes": [],              # fields that were None/invalid
}

Calling the Classifier

from classifier import clasificar_estudiante

estudiante = {
    "id": "EST-007",
    "nombre": "Ana Quispe",
    "grado": "3.º de secundaria",
    "asistencia_pct": 68.0,
    "notas_promedio": 9.5,
    "participacion": "alta",
}

resultado = clasificar_estudiante(estudiante)
# resultado["nivel"]  → "🔴"
# resultado["motivos"] → [
#     "Asistencia: 68.0% — nivel 🔴 (umbral 🟢 es ≥90%)",
#     "Notas: 9.5 — nivel 🔴 (umbral 🟢 es ≥13)",
#     "Participación: alta — nivel 🟢",
# ]
# resultado["variables_faltantes"] → []

Built-In Test Suite

The classifier ships with a self-contained test runner in the if __name__ == "__main__" block. Run it directly to verify the logic at any time:
python3 backend/classifier.py

Test Cases

| Case | Attendance | Grade | Participation | Expected | Rule triggered |
|---|---|---|---|---|---|
| Caso 1 — Todo bien | 95% | 15 | alta | 🟢 | All 🟢 |
| Caso 2 — 1 señal 🟡 | 80% | 15 | alta | 🟡 | 1 × 🟡 (attendance) |
| Caso 3 — 2 señales 🟡 → 🔴 | 80% | 11.5 | alta | 🔴 | 2 × 🟡 accumulation |
| Caso 4 — 1 var 🔴 directa | 60% | 15 | alta | 🔴 | attendance = 🔴 directly |
| Caso 5 — Desempate peor caso | 95% | 9 | alta | 🔴 | grades = 🔴 directly |
| Caso 6 — Falta participacion | 95% | 15 | None | 🟢 | 2 of 3 vars, both 🟢 |
| Caso 7 — Faltan 2 vars | None | 15 | None | 🟢 | 1 of 3 vars, 🟢 |
| Caso 8 — Faltan 3 vars | None | None | None | ⚪ | Art. IX §9.3.2 short-circuit |
| Caso 9 — Límite exacto asist | 90% | 15 | alta | 🟢 | Boundary: exactly 90 = 🟢 |
| Caso 10 — Límite exacto notas | 95% | 13 | alta | 🟢 | Boundary: exactly 13 = 🟢 |
Cases 9 and 10 explicitly test that boundary values (asistencia_pct == 90, notas_promedio == 13) are classified as 🟢, not 🟡. The thresholds use >= comparisons.
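
The ten cases also lend themselves to a data-driven check outside the built-in runner. The sketch below is not the shipped test code: CASOS and verificar() are hypothetical names, and the only thing assumed about the classifier is that it is a callable returning a dict with a nivel key, as documented above.

```python
from typing import Callable

# (label, asistencia_pct, notas_promedio, participacion, expected level)
CASOS = [
    ("Caso 1", 95, 15, "alta", "🟢"),
    ("Caso 2", 80, 15, "alta", "🟡"),
    ("Caso 3", 80, 11.5, "alta", "🔴"),
    ("Caso 4", 60, 15, "alta", "🔴"),
    ("Caso 5", 95, 9, "alta", "🔴"),
    ("Caso 6", 95, 15, None, "🟢"),
    ("Caso 7", None, 15, None, "🟢"),
    ("Caso 8", None, None, None, "⚪"),
    ("Caso 9", 90, 15, "alta", "🟢"),
    ("Caso 10", 95, 13, "alta", "🟢"),
]


def verificar(clasificar: Callable[[dict], dict]) -> list[str]:
    """Run every case through `clasificar` and return the labels that failed."""
    fallos = []
    for etiqueta, asistencia, notas, participacion, esperado in CASOS:
        estudiante = {
            "id": etiqueta,
            "asistencia_pct": asistencia,
            "notas_promedio": notas,
            "participacion": participacion,
        }
        if clasificar(estudiante)["nivel"] != esperado:
            fallos.append(etiqueta)
    return fallos
```

With backend/ on the import path, verificar(clasificar_estudiante) would be expected to return an empty list if the classifier matches the table.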
