Deterministic Risk Classification Engine — classifier.py

The risk classification engine in backend/classifier.py is the authoritative decision-maker of Vanguardia EPIS. It takes a single student record as input and evaluates each of three academic variables independently against frozen thresholds. Attendance and grades use numeric thresholds; participation uses categorical values. The engine then aggregates the results with a worst-case rule to produce an overall risk level. The process is fully deterministic: the same input always yields the same output, with no randomness, no AI inference, and no network calls.

Risk Levels

The engine produces exactly one of four risk levels. The emoji is the canonical representation used throughout the codebase and the frontend.

Constant	Value	Meaning
`NIVEL_BAJO`	`🟢`	Student is on track — all indicators within expected ranges
`NIVEL_MEDIO`	`🟡`	One warning signal detected — teacher should monitor
`NIVEL_ALTO`	`🔴`	High risk — immediate teacher intervention recommended
`NIVEL_INSUFICIENTE`	`⚪`	No data available — manual review required

Frozen Thresholds (§4.1)

These constants are marked congelados (frozen) in the source code. They must not be modified without a specification change.

# --- Umbrales congelados (§4.1) ---
UMBRAL_ASISTENCIA_BAJO = 90    # >= 90 → 🟢
UMBRAL_ASISTENCIA_MEDIO = 75   # 75-89 → 🟡, <75 → 🔴

UMBRAL_NOTAS_BAJO = 13         # >= 13 → 🟢
UMBRAL_NOTAS_MEDIO = 11        # 11-12 → 🟡, <11 → 🔴
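Both pairs of constants follow the same pattern:

- A value at or above the first threshold is 🟢.
- A value at or above the second threshold is 🟡.
- Anything lower is 🔴.

The sketch below shows that mapping with one shared helper. `nivel_por_umbral` is a hypothetical name; the real classifier.py may evaluate each variable in its own code path.

```python
# Frozen thresholds, copied from the constants above (§4.1).
UMBRAL_ASISTENCIA_BAJO = 90
UMBRAL_ASISTENCIA_MEDIO = 75
UMBRAL_NOTAS_BAJO = 13
UMBRAL_NOTAS_MEDIO = 11

NIVEL_BAJO, NIVEL_MEDIO, NIVEL_ALTO = "🟢", "🟡", "🔴"


def nivel_por_umbral(valor: float, umbral_bajo: float, umbral_medio: float) -> str:
    """Hypothetical helper: map a numeric value to a level using >= comparisons."""
    if valor >= umbral_bajo:
        return NIVEL_BAJO
    if valor >= umbral_medio:
        return NIVEL_MEDIO
    return NIVEL_ALTO


print(nivel_por_umbral(90, UMBRAL_ASISTENCIA_BAJO, UMBRAL_ASISTENCIA_MEDIO))    # 🟢
print(nivel_por_umbral(89.9, UMBRAL_ASISTENCIA_BAJO, UMBRAL_ASISTENCIA_MEDIO))  # 🟡
print(nivel_por_umbral(74.9, UMBRAL_ASISTENCIA_BAJO, UMBRAL_ASISTENCIA_MEDIO))  # 🔴
```

With `>=`, a value exactly on a threshold always falls into the better band.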

Variable-Level Evaluation Rules

Each variable is evaluated in isolation before aggregation.

Attendance (asistencia_pct) — numeric, range 0–100:

Value	Level
≥ 90%	🟢
≥ 75% and < 90%	🟡
< 75%	🔴

Grades (notas_promedio) — vigesimal scale 0–20:

Value	Level
≥ 13	🟢
≥ 11 and < 13	🟡
< 11	🔴
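Grade averages can be fractional; test case 3, for example, uses 11.5. The 🟡 band therefore covers every grade from 11 up to, but not including, 13.

The sketch below assumes the same `>=` comparisons that the boundary tests confirm. `evaluar_notas` is a hypothetical name, not necessarily a function in classifier.py.

```python
UMBRAL_NOTAS_BAJO = 13
UMBRAL_NOTAS_MEDIO = 11


def evaluar_notas(notas_promedio: float) -> str:
    """Hypothetical evaluator for grades on the 0-20 vigesimal scale."""
    if notas_promedio >= UMBRAL_NOTAS_BAJO:
        return "🟢"
    if notas_promedio >= UMBRAL_NOTAS_MEDIO:
        return "🟡"  # 11 <= grade < 13, so 12.5 lands here too
    return "🔴"


for nota in (13, 12.5, 11, 10.9):
    print(nota, evaluar_notas(nota))
# 13 🟢 / 12.5 🟡 / 11 🟡 / 10.9 🔴
```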

Participation (participacion) — categorical:

Value	Level
`"alta"` or `"media"`	🟢
`"baja"`	🟡

By design (§4.1), participation never reaches 🔴 on its own. Low participation is a supporting signal: "baja" yields 🟡. Combined with one other 🟡 signal, such as 80% attendance, it produces an overall 🔴 through the accumulation branch of the worst-case rule. A student therefore cannot be classified as high-risk on participation alone.
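The categorical rule can be sketched as follows. This assumes the value has already been normalized to one of "alta", "media", or "baja". `evaluar_participacion` is a hypothetical name.

```python
NIVEL_BAJO, NIVEL_MEDIO = "🟢", "🟡"


def evaluar_participacion(participacion: str) -> str:
    """Hypothetical evaluator; assumes the value is already alta, media, or baja."""
    # There is deliberately no 🔴 branch: participation alone cannot mark high risk.
    return NIVEL_MEDIO if participacion == "baja" else NIVEL_BAJO


niveles_posibles = {evaluar_participacion(p) for p in ("alta", "media", "baja")}
print("🔴" in niveles_posibles)  # False

# "baja" participation plus 🟡 attendance gives two 🟡 signals,
# which the aggregation rule escalates to 🔴 by accumulation.
valores = ["🟡", evaluar_participacion("baja")]
print(valores.count(NIVEL_MEDIO))  # 2
```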

Worst-Case Aggregation Rule (Art. III §3.2.1)

After each variable is evaluated independently, the engine applies a single aggregation pass over the three results.

if any variable = 🔴           → overall = 🔴  (direct)
else:
  count variables where = 🟡 (señales_negativas)
  if señales_negativas == 0   → overall = 🟢
  if señales_negativas == 1   → overall = 🟡
  if señales_negativas >= 2   → overall = 🔴  (accumulation)

This rule implements a conservative safety bias: a student with two 🟡 signals is escalated to 🔴 even if no single variable reaches the 🔴 threshold. The logic is drawn directly from clasificar_estudiante():

# valores: per-variable levels of the fields that have data (missing fields are excluded)
if NIVEL_ALTO in valores:
    nivel_global = NIVEL_ALTO  # any direct 🔴 is decisive
else:
    señales_negativas = sum(1 for v in valores if v == NIVEL_MEDIO)
    if señales_negativas == 0:
        nivel_global = NIVEL_BAJO
    elif señales_negativas == 1:
        nivel_global = NIVEL_MEDIO
    else:
        # 2+ negative signals → 🔴 by accumulation
        nivel_global = NIVEL_ALTO
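The excerpt above assumes that valores already holds the levels of the variables that have data. Wrapping the same logic in a standalone function lets it be tested on its own. `agregar_peor_caso` is a hypothetical name, not a function in classifier.py.

```python
from typing import Iterable

NIVEL_BAJO, NIVEL_MEDIO, NIVEL_ALTO = "🟢", "🟡", "🔴"


def agregar_peor_caso(valores: Iterable[str]) -> str:
    """Worst-case aggregation over the per-variable levels that have data."""
    valores = list(valores)
    if NIVEL_ALTO in valores:
        return NIVEL_ALTO  # direct 🔴
    señales_negativas = sum(1 for v in valores if v == NIVEL_MEDIO)
    if señales_negativas == 0:
        return NIVEL_BAJO
    if señales_negativas == 1:
        return NIVEL_MEDIO
    return NIVEL_ALTO  # 2+ 🟡 signals escalate to 🔴 by accumulation


print(agregar_peor_caso(["🟡", "🟡", "🟢"]))  # 🔴
print(agregar_peor_caso(["🟡", "🟢", "🟢"]))  # 🟡
```

An empty list falls through to 🟢. This is why the all-missing case has to be short-circuited to ⚪ before aggregation runs.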

Missing Data Handling (Art. IX §9.3.2)

The engine handles missing or invalid data gracefully without raising exceptions.

normalizar_campo() validates all three fields

Before any evaluation occurs, normalizar_campo() converts invalid values to None:

asistencia_pct outside 0–100 → None
notas_promedio outside 0–20 → None
participacion not in {"alta", "media", "baja"} → None

A warning is printed to the console but the pipeline continues.

Missing variables are excluded from aggregation

Variables with a None value are added to variables_faltantes[] and are not included in the worst-case aggregation. Only the variables that have valid data contribute to the overall risk level.

All three variables missing → ⚪ (Art. IX §9.3.2)

If all three fields are None after normalisation, the engine short-circuits and returns NIVEL_INSUFICIENTE (⚪). The motivo (reason) is set to "Las 3 variables carecen de dato — requiere revisión manual del docente" ("All 3 variables lack data; manual review by the teacher is required").
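A minimal sketch of how missing variables could be collected and how the ⚪ short-circuit could be expressed follows. It rests on several assumptions:

- `listar_faltantes` and `resultado_sin_datos` are hypothetical helpers.
- An absent key is treated the same as None.
- The motivos list for the ⚪ case holds a single entry.

```python
from typing import List, Optional

NIVEL_INSUFICIENTE = "⚪"
CAMPOS = ("asistencia_pct", "notas_promedio", "participacion")
MOTIVO_SIN_DATOS = "Las 3 variables carecen de dato — requiere revisión manual del docente"


def listar_faltantes(estudiante: dict) -> List[str]:
    """Hypothetical helper: fields that are None or absent after normalization."""
    return [campo for campo in CAMPOS if estudiante.get(campo) is None]


def resultado_sin_datos(estudiante: dict) -> Optional[dict]:
    """Return the ⚪ result when all three fields are missing, otherwise None."""
    faltantes = listar_faltantes(estudiante)
    if len(faltantes) < len(CAMPOS):
        return None  # at least one variable has data: continue to aggregation
    return {
        "nivel": NIVEL_INSUFICIENTE,
        "motivos": [MOTIVO_SIN_DATOS],  # assumed shape: a single-entry audit trail
        "variables_faltantes": faltantes,
    }


print(listar_faltantes({"id": "EST-001", "notas_promedio": 15}))
# ['asistencia_pct', 'participacion']
print(resultado_sin_datos({"id": "EST-002"})["nivel"])  # ⚪
```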

`normalizar_campo()` Function

def normalizar_campo(estudiante: dict) -> dict:
    """
    Normalizes invalid values to None per §3.3.
    - asistencia_pct outside 0-100 → None
    - notas_promedio outside 0-20 → None
    - participacion with a value other than the 3 allowed ones → None
    """

The function returns a shallow copy of the student dict with invalid fields replaced by None. It never mutates the original dict.
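The function body is not shown in this document. The sketch below is consistent with the documented rules (inclusive ranges, shallow copy, console warning) but is not the actual implementation. In particular, the warning text and the treatment of non-numeric and bool values are assumptions.

```python
RANGOS_VALIDOS = {"asistencia_pct": (0, 100), "notas_promedio": (0, 20)}
PARTICIPACION_VALIDA = ("alta", "media", "baja")


def normalizar_campo(estudiante: dict) -> dict:
    """Sketch of the documented rules; the real body in classifier.py may differ."""
    normalizado = dict(estudiante)  # shallow copy: the caller's dict is never mutated
    for campo, (minimo, maximo) in RANGOS_VALIDOS.items():
        valor = normalizado.get(campo)
        if valor is None:
            continue  # already missing, nothing to validate
        # bool is a subclass of int, so it is rejected explicitly (assumption).
        es_numero = isinstance(valor, (int, float)) and not isinstance(valor, bool)
        if not es_numero or not minimo <= valor <= maximo:
            print(f"Warning: {campo}={valor!r} is invalid; treating it as None")
            normalizado[campo] = None
    participacion = normalizado.get("participacion")
    if participacion is not None and participacion not in PARTICIPACION_VALIDA:
        print(f"Warning: participacion={participacion!r} is invalid; treating it as None")
        normalizado["participacion"] = None
    return normalizado


original = {"id": "EST-010", "asistencia_pct": 120, "notas_promedio": 14, "participacion": "excelente"}
limpio = normalizar_campo(original)
# limpio: asistencia_pct and participacion are None, notas_promedio stays 14.
# original still holds 120 and "excelente".
```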

`construir_motivo()` — Audit Trail

Every classification result includes a human-readable audit trail. construir_motivo() generates one message per evaluated variable, and one message per missing variable:

# Example output for a student with 80% attendance (🟡), grade 9 (🔴), no participation data:
[
  "Asistencia: 80% — nivel 🟡 (umbral 🟢 es ≥90%)",
  "Notas: 9 — nivel 🔴 (umbral 🟢 es ≥13)",
  "Participación: sin dato — evaluado sin esta variable"
]

This list is passed directly to the AI prompt so Gemini can reference specific values, and it is also returned to the frontend for display — satisfying Art. II §2.3 (visible traceability).
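The message format in the examples can be reproduced with a small lookup table. This is a hypothetical sketch, not construir_motivo() itself:

- `generar_motivos` and its `niveles` argument are assumed.
- The threshold suffix on the attendance and grade lines is assumed to appear at every level, not only 🟡 and 🔴.

```python
# (field, label, unit, suffix) in the order the messages appear.
CAMPOS_MOTIVO = (
    ("asistencia_pct", "Asistencia", "%", " (umbral 🟢 es ≥90%)"),
    ("notas_promedio", "Notas", "", " (umbral 🟢 es ≥13)"),
    ("participacion", "Participación", "", ""),
)


def generar_motivos(estudiante: dict, niveles: dict) -> list:
    """Hypothetical: one message per evaluated variable and one per missing variable."""
    motivos = []
    for campo, etiqueta, unidad, sufijo in CAMPOS_MOTIVO:
        valor = estudiante.get(campo)
        if valor is None:
            motivos.append(f"{etiqueta}: sin dato — evaluado sin esta variable")
        else:
            motivos.append(f"{etiqueta}: {valor}{unidad} — nivel {niveles[campo]}{sufijo}")
    return motivos


for linea in generar_motivos(
    {"asistencia_pct": 80, "notas_promedio": 9, "participacion": None},
    {"asistencia_pct": "🟡", "notas_promedio": "🔴"},
):
    print(linea)
```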

`clasificar_estudiante()` — Full Signature

def clasificar_estudiante(estudiante: dict) -> dict:
    """
    Classifies a student's risk level.
    Implements the pseudocode of §4.2 (Art. III §3.2.1 and §3.3).

    Returns a dict with:
        - nivel: str (🟢 / 🟡 / 🔴 / ⚪)
        - motivos: list[str]  ← visible traceability (Art. II §2.3)
        - variables_faltantes: list[str]
    """

Input: a student dict carrying the fields id, asistencia_pct, notas_promedio, and participacion. Any of these fields may be None or absent; the classifier handles this without raising an exception.

Output:

{
    "nivel": "🔴",                          # one of 🟢 🟡 🔴 ⚪
    "motivos": [                            # one entry per variable
        "Asistencia: 60% — nivel 🔴 (umbral 🟢 es ≥90%)",
        "Notas: 9 — nivel 🔴 (umbral 🟢 es ≥13)",
        "Participación: alta — nivel 🟢",
    ],
    "variables_faltantes": [],              # fields that were None/invalid
}
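Consumers such as the frontend rely on this return shape. The contract can be written down as a `TypedDict`. `ResultadoClasificacion` and `es_resultado_valido` are hypothetical names; classifier.py itself returns a plain dict.

```python
from typing import List, TypedDict

NIVELES_VALIDOS = ("🟢", "🟡", "🔴", "⚪")


class ResultadoClasificacion(TypedDict):
    """Hypothetical type describing the dict returned by clasificar_estudiante()."""
    nivel: str                      # one of 🟢 🟡 🔴 ⚪
    motivos: List[str]              # audit trail shown in the UI and sent to the AI prompt
    variables_faltantes: List[str]  # fields that were None or invalid


def es_resultado_valido(resultado: dict) -> bool:
    """Hypothetical shape check a consumer could run before rendering a result."""
    return (
        {"nivel", "motivos", "variables_faltantes"} <= set(resultado)
        and resultado["nivel"] in NIVELES_VALIDOS
        and isinstance(resultado["motivos"], list)
        and isinstance(resultado["variables_faltantes"], list)
    )


ejemplo: ResultadoClasificacion = {
    "nivel": "🔴",
    "motivos": ["Asistencia: 60% — nivel 🔴 (umbral 🟢 es ≥90%)"],
    "variables_faltantes": [],
}
print(es_resultado_valido(ejemplo))  # True
```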

Calling the Classifier

from classifier import clasificar_estudiante

estudiante = {
    "id": "EST-007",
    "nombre": "Ana Quispe",
    "grado": "3.º de secundaria",
    "asistencia_pct": 68.0,
    "notas_promedio": 9.5,
    "participacion": "alta",
}

resultado = clasificar_estudiante(estudiante)
# resultado["nivel"]  → "🔴"
# resultado["motivos"] → [
#     "Asistencia: 68.0% — nivel 🔴 (umbral 🟢 es ≥90%)",
#     "Notas: 9.5 — nivel 🔴 (umbral 🟢 es ≥13)",
#     "Participación: alta — nivel 🟢",
# ]
# resultado["variables_faltantes"] → []

Built-In Test Suite

The classifier ships with a self-contained test runner in the if __name__ == "__main__" block. Run it directly to verify the logic at any time:

python3 backend/classifier.py

Test Cases

Case	Attendance	Grade	Participation	Expected	Rule triggered
Case 1: all good	95%	15	alta	🟢	All 🟢
Case 2: 1 🟡 signal	80%	15	alta	🟡	1 × 🟡 (attendance)
Case 3: 2 🟡 signals → 🔴	80%	11.5	alta	🔴	2 × 🟡 accumulation
Case 4: 1 variable 🔴 directly	60%	15	alta	🔴	attendance = 🔴 directly
Case 5: worst-case tie-break	95%	9	alta	🔴	grades = 🔴 directly
Case 6: participation missing	95%	15	`None`	🟢	2 of 3 vars, both 🟢
Case 7: 2 variables missing	`None`	15	`None`	🟢	1 of 3 vars, 🟢
Case 8: 3 variables missing	`None`	`None`	`None`	⚪	Art. IX §9.3.2 short-circuit
Case 9: exact attendance boundary	90%	15	alta	🟢	Boundary: exactly 90 = 🟢
Case 10: exact grade boundary	95%	13	alta	🟢	Boundary: exactly 13 = 🟢

Cases 9 and 10 explicitly test that the boundary values (asistencia_pct == 90 and notas_promedio == 13) are classified as 🟢, not 🟡, because the thresholds use >= comparisons.
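The case table can be replayed in a table-driven loop. The sketch below re-implements the documented rules in a compact, hypothetical `clasificar_sketch()`, which skips normalization and motivos, so it runs without importing classifier.py. Against the real module, the loop would compare `clasificar_estudiante(estudiante)["nivel"]` instead.

```python
NIVEL_BAJO, NIVEL_MEDIO, NIVEL_ALTO, NIVEL_INSUFICIENTE = "🟢", "🟡", "🔴", "⚪"


def nivel_numerico(valor, umbral_bajo, umbral_medio):
    """>= comparisons, so boundary values land in the better band."""
    if valor >= umbral_bajo:
        return NIVEL_BAJO
    return NIVEL_MEDIO if valor >= umbral_medio else NIVEL_ALTO


def clasificar_sketch(asistencia, notas, participacion):
    """Hypothetical stand-in for clasificar_estudiante() that returns only the level."""
    valores = []
    if asistencia is not None:
        valores.append(nivel_numerico(asistencia, 90, 75))
    if notas is not None:
        valores.append(nivel_numerico(notas, 13, 11))
    if participacion is not None:
        valores.append(NIVEL_MEDIO if participacion == "baja" else NIVEL_BAJO)
    if not valores:
        return NIVEL_INSUFICIENTE  # all three variables missing
    if NIVEL_ALTO in valores:
        return NIVEL_ALTO
    señales_negativas = valores.count(NIVEL_MEDIO)
    if señales_negativas == 0:
        return NIVEL_BAJO
    return NIVEL_MEDIO if señales_negativas == 1 else NIVEL_ALTO


# (case, attendance, grade, participation, expected), mirroring the table above.
CASOS = [
    ("Case 1", 95, 15, "alta", NIVEL_BAJO),
    ("Case 2", 80, 15, "alta", NIVEL_MEDIO),
    ("Case 3", 80, 11.5, "alta", NIVEL_ALTO),
    ("Case 4", 60, 15, "alta", NIVEL_ALTO),
    ("Case 5", 95, 9, "alta", NIVEL_ALTO),
    ("Case 6", 95, 15, None, NIVEL_BAJO),
    ("Case 7", None, 15, None, NIVEL_BAJO),
    ("Case 8", None, None, None, NIVEL_INSUFICIENTE),
    ("Case 9", 90, 15, "alta", NIVEL_BAJO),
    ("Case 10", 95, 13, "alta", NIVEL_BAJO),
]

for nombre, asistencia, notas, participacion, esperado in CASOS:
    obtenido = clasificar_sketch(asistencia, notas, participacion)
    estado = "OK" if obtenido == esperado else "FAIL"
    print(f"{estado} {nombre}: expected {esperado}, got {obtenido}")
```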
