Each of the four pipeline stages is implemented as a standalone Python class living under src/agents/. Every agent receives its dependencies through constructor injection (__init__), making them easy to unit-test and swap out in isolation. When the pipeline runs end-to-end, the output dict of each agent is passed directly as the input to the next, accumulating keys as it travels through the chain.
All agent classes follow the same convention: they expose a single primary method (run, process_directory, classify, or process) that accepts a data payload and returns an enriched dictionary. This uniformity is what makes composing them into a pipeline straightforward.
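The chaining described above can be sketched in a few lines. This is a minimal illustration, not the project's actual orchestration code: `run_chain`, `fake_ocr`, and `fake_classify` are hypothetical stand-ins for the agents' primary methods, and the payload keys are taken from the tables later on this page.

```python
from typing import Callable

# A stage is any callable that takes a payload dict and returns an enriched dict.
Stage = Callable[[dict], dict]


def run_chain(payload: dict, stages: list[Stage]) -> dict:
    """Pass the payload through each stage in order, letting each one add keys."""
    for stage in stages:
        payload = stage(payload)
    return payload


# Hypothetical stand-ins for OcrAgent and ClassifierAgent (assumptions for the demo).
def fake_ocr(item: dict) -> dict:
    return {**item, "ocr_resultado": {"texto_completo": "Titulo universitario"}}


def fake_classify(item: dict) -> dict:
    return {**item, "clasificacion": {"valido": True}}


result = run_chain({"archivo_nombre": "titulo.pdf"}, [fake_ocr, fake_classify])
print(sorted(result))  # ['archivo_nombre', 'clasificacion', 'ocr_resultado']
```

Because every stage returns a new dict that includes the keys it received, later stages can rely on everything earlier stages produced.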

Agent Reference

Source: src/agents/watcher_agent.py

WatcherAgent runs in an infinite polling loop, connecting to an IMAP mailbox over SSL on each cycle, searching for emails that carry academic dossier attachments, and downloading those attachments to a per-teacher input folder.

IMAP connection

The agent connects with imaplib.IMAP4_SSL using credentials and host from environment variables. On each cycle it selects the configured folder (MAIL_FOLDER, default INBOX), searches for matching emails, downloads them, then disconnects cleanly before sleeping until the next poll interval.

Because different IMAP providers handle keyword search differently, the agent tries three strategies in order:
| Tier | Method | Notes |
| --- | --- | --- |
| 1 | Gmail X-GM-RAW query with subject:, body terms, and has:attachment | Fastest; only works on Gmail |
| 2 | Standard IMAP SUBJECT / BODY search for each ASCII keyword variant | Works on all providers; skips non-ASCII keywords |
| 3 | Fallback SINCE search over the last 7 days with local keyword filtering | Catches accented subjects (e.g., Currículum) that fail IMAP keyword search |
Accent-insensitive matching is applied locally via unicodedata.normalize("NFKD"), so that accented and unaccented spellings of a keyword (for example, Currículum and Curriculum) are treated as equivalent.
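A minimal sketch of the local accent-insensitive matching, assuming the agent decomposes text with NFKD and then drops combining marks. The function names are hypothetical, and the case-insensitive comparison is an added assumption rather than documented behaviour.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters with NFKD and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def subject_matches(subject: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in the subject, ignoring accents and case."""
    normalized = strip_accents(subject).casefold()
    return any(strip_accents(keyword).casefold() in normalized for keyword in keywords)


def imap_searchable(keywords: list[str]) -> list[str]:
    """Keep only ASCII keywords, mirroring the tier 2 rule of skipping non-ASCII ones."""
    return [keyword for keyword in keywords if keyword.isascii()]


print(strip_accents("Currículum"))  # Curriculum
print(imap_searchable(["Expediente Docente", "Currículum"]))  # ['Expediente Docente']
```

Keywords rejected by `imap_searchable` are the ones that would only be caught by the tier 3 fallback, where filtering happens locally.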

Teacher name extraction

The teacher’s name is parsed from the email subject using the pattern:
{SUBJECT_KEYWORD} - Teacher Name
Supported separators include the hyphen (-) and the colon (:). The extracted name becomes the subdirectory name under data/input/ (e.g., data/input/Juan_Perez/).

Accepted attachment formats

ATTACHMENT_EXTENSIONS = {".pdf", ".jpg", ".jpeg"}
Any attachment whose extension is not in this set is silently ignored. At least one qualifying attachment must be present or the email is discarded.
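As an illustration of the filtering rule, the sketch below keeps only filenames whose extension is in ATTACHMENT_EXTENSIONS and discards an email with no qualifying attachment. The helper names are hypothetical, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

ATTACHMENT_EXTENSIONS = {".pdf", ".jpg", ".jpeg"}


def qualifying_attachments(filenames: list[str]) -> list[str]:
    """Return the filenames whose extension is accepted (case-insensitive, assumed)."""
    return [name for name in filenames if Path(name).suffix.lower() in ATTACHMENT_EXTENSIONS]


def should_keep_email(filenames: list[str]) -> bool:
    """An email is kept only if at least one attachment qualifies."""
    return bool(qualifying_attachments(filenames))


print(qualifying_attachments(["cv.PDF", "notes.docx", "foto.jpeg"]))  # ['cv.PDF', 'foto.jpeg']
print(should_keep_email(["scan.png"]))  # False
```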

Configuration parameters

| Variable | Default | Description |
| --- | --- | --- |
| MAIL_HOST | | IMAP server hostname |
| MAIL_USER | | Mailbox username / email address |
| MAIL_PASS | | Mailbox password or app password |
| MAIL_FOLDER | INBOX | Folder to monitor |
| POLL_INTERVAL_SECONDS | from config | Seconds between polling cycles |
| SUBJECT_KEYWORD | Expediente Docente | Comma-separated subject keywords |
| BODY_KEYWORD | (empty) | Comma-separated body keywords |
For Gmail accounts, generate an App Password under Google Account → Security → 2-Step Verification → App passwords. The standard account password will not work with IMAP when 2FA is enabled.
Source: src/agents/ocr_agent.py

OcrAgent wraps the OcrService (which in turn uses python-doctr[torch]) to extract text from every supported file found under data/input/. The docTR model is approximately 500 MB and is loaded once at service startup to avoid reloading it on every invocation.

Primary method

OcrAgent.process_directory(
    directory: Path | None = None,
    skip_hashes: set[str] | None = None,
) -> list[dict]
  • directory — root directory to scan; defaults to config.INPUT_DIR (data/input/)
  • skip_hashes — set of SHA-256 hex strings; files whose hash is in this set are skipped without running OCR
The method iterates over every subdirectory of the root (one per teacher) and processes each qualifying file inside. Results are returned as a flat list — one dict per processed file.

Supported file types

SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
.txt email body files saved by WatcherAgent are intentionally excluded.

Per-file result dictionary

Each dict returned by process_directory contains:
| Key | Type | Description |
| --- | --- | --- |
| archivo_path | Path | Absolute path to the source file |
| archivo_nombre | str | Filename |
| carpeta_origen | str | Name of the subdirectory (teacher folder) |
| formato | str | File extension without dot (e.g., pdf) |
| tamano_bytes | int | Raw file size in bytes |
| hash_sha256 | str | SHA-256 hex digest of file contents |
| ocr_resultado | dict or None | OCR output (see below) or None on failure |

OCR result sub-dictionary (ocr_resultado)

| Key | Type | Description |
| --- | --- | --- |
| texto_completo | str | Full extracted text as a single string |
| json_ligero | dict | Structured block representation optimised for LLM prompts |
| confianza_promedio | float | Average word-level confidence (0–1) |
| paginas | int | Number of pages / images processed |
| idioma_detectado | str | Detected language code |
| palabras_detectadas | int | Total word count across all pages |
The json_ligero field contains only the document’s structural blocks — lines, words, and bounding-box hints — rather than raw pixel data. Sending this compact representation to the LLM instead of texto_completo reduces token usage while preserving enough context for accurate classification.
Source: src/agents/classifier_agent.py

ClassifierAgent sends OCR output to a large language model and receives a structured JSON response identifying the document type, extracting key fields, and flagging whether the document is valid for storage.

Primary method

ClassifierAgent.classify(ocr_result: dict) -> dict
Accepts the dict produced by OcrAgent for a single file and returns that same dict enriched with a clasificacion key.

LLM input selection

The agent prefers json_ligero (compact block structure) over texto_completo when both are available, because it is more token-efficient. If neither contains usable content, the agent short-circuits and returns valido=False immediately — no LLM call is made for blank documents.

clasificacion output keys

| Key | Type | Description |
| --- | --- | --- |
| valido | bool | Whether the document is a recognisable academic document |
| tipo | TipoDocumento | One of the 22 document type values (see Data Models) |
| campos_extraidos | dict | Structured fields pulled from the document (name, cédula, dates, etc.) |
| confianza_clasificacion | float | LLM-reported confidence score, 0–1 |
| razon_rechazo | str or None | Human-readable rejection reason when valido=False |
| modelo_llm | str | Model identifier used for this classification |
| tokens_usados | int | Total tokens consumed by the request |

LLM temperature

The LLM is called with temperature=0.1 to keep classification as close to deterministic and reproducible as practical; a low temperature reduces sampling randomness but does not eliminate it. Higher temperatures introduce unnecessary variability in document type predictions.
If the LLM service raises an exception (network error, rate limit, etc.), ClassifierAgent catches it and returns valido=False with the error message in razon_rechazo. The pipeline continues — StorageAgent will skip the document gracefully.
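The error-handling behaviour can be sketched as a wrapper around the LLM call. The names `classify_safely` and `call_llm` are hypothetical, and the exact wording stored in razon_rechazo is an assumption.

```python
from typing import Callable


def classify_safely(ocr_result: dict, call_llm: Callable[[dict], dict]) -> dict:
    """Return the input dict plus a clasificacion key, even if the LLM call fails."""
    try:
        clasificacion = call_llm(ocr_result)
    except Exception as exc:  # network error, rate limit, malformed response, ...
        clasificacion = {"valido": False, "razon_rechazo": f"LLM error: {exc}"}
    return {**ocr_result, "clasificacion": clasificacion}


def failing_llm(_: dict) -> dict:
    """Stand-in for an LLM service that is currently unavailable."""
    raise TimeoutError("rate limit exceeded")


result = classify_safely({"archivo_nombre": "titulo.pdf"}, failing_llm)
print(result["clasificacion"]["valido"])  # False
```

Converting the exception into a regular rejection keeps one bad document from aborting the rest of the batch.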
Source: src/agents/storage_agent.py

StorageAgent is the terminal stage of the pipeline. It takes the fully enriched result dict from ClassifierAgent and persists it to MongoDB, then moves the physical file to permanent storage under data/storage/{cedula}/.

Primary method

StorageAgent.process(classified_result: dict) -> dict
Returns {"exito": bool, "accion": "insert" | "skip" | "error", "docente_id": str | None, "documento_id": str | None}.

Seven-step processing flow

Step 1: Validate document
Checks clasificacion.valido == True. Documents flagged as invalid by the classifier are skipped with accion: "skip".

Step 2: Extract and normalise cédula
The agent resolves the teacher’s national ID in priority order:
  1. cedula_titular field from campos_extraidos
  2. Derived from numero_rif (strips the check digit from Venezuelan RIF V-XXXXXXXX-D)
  3. MongoDB lookup by teacher folder name (exact match required to avoid ambiguity)
  4. Folder name used as a provisional identifier (flagged so it can be updated later)

Step 3: Duplicate hash check
Queries MongoDB documentos collection for an existing record with the same hash_sha256. Skips the document if found.

Step 4: Create or retrieve docente record
Looks up the docentes collection by cédula. If not found, creates a new record using campos_extraidos and the folder name as a fallback for the teacher’s name. If a provisional record exists under the folder name, it is upgraded with the real cédula.

Before inserting the document, the file is optionally compressed (PDF via Ghostscript with -dPDFSETTINGS=/ebook -r150x150; images via Pillow JPEG quality=85, optimize=True). If the compressed file is not smaller than the original, the original is kept.

Step 5: Insert document in MongoDB
Constructs a DocumentoModel-compatible dict including ArchivoInfo, OcrInfo, ValidacionDocumento, and MetadataDocumento, then inserts it into the documentos collection.

Step 6: Update dossier completeness
Calls MongoService.update_completitud(cedula) to recalculate the teacher’s completeness percentage based on the documents now present in the collection.

Step 7: Move file to storage
Moves the file (compressed version if applicable) to data/storage/{cedula}/. If any MongoDB step fails before this point, the file is not moved, so the pipeline can retry it on the next cycle.
The file-move-last ordering is an intentional safety guarantee: if MongoDB is unavailable, the source file remains in data/input/ and will be picked up again on the next pipeline execution without any manual intervention.
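A minimal sketch of the file-move-last ordering, assuming the database work is wrapped in a single callable. `persist_then_move` and `save_record` are hypothetical names; the returned dict mirrors the shape documented for StorageAgent.process.

```python
import shutil
from pathlib import Path
from typing import Callable


def persist_then_move(
    source: Path,
    storage_dir: Path,
    save_record: Callable[[], tuple[str, str]],
) -> dict:
    """Run the database step first; move the file only after it has succeeded."""
    try:
        docente_id, documento_id = save_record()
    except Exception:
        # The source file is untouched, so the next pipeline run can retry it.
        return {"exito": False, "accion": "error", "docente_id": None, "documento_id": None}
    storage_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(storage_dir / source.name))
    return {
        "exito": True,
        "accion": "insert",
        "docente_id": docente_id,
        "documento_id": documento_id,
    }
```

If `save_record` raises, the function returns before touching the filesystem, which is what leaves the document in data/input/ for the next cycle.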

Agent Configuration via API

Per-agent runtime parameters are stored in MongoDB and exposed through the configuration REST API:
GET  /config/agentes   # retrieve all agent configs
PUT  /config/agentes   # update all agent configs (full upsert)
Default values shipped with the system:
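| Agent | Parameter | Default |
| --- | --- | --- |
| watcher | timeout_segundos | 60 |
| watcher | retry_veces | 3 |
| ocr | timeout_segundos | 120 |
| ocr | retry_veces | 2 |
| classifier | temperatura | 0.7 |
| classifier | max_tokens | 2000 |
| storage | timeout_segundos | 30 |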
Changes made through the API are applied on the next agent execution cycle without requiring a service restart. The MongoDB-backed configuration store makes it possible to tune agent behaviour from the web UI without touching environment variables or redeploying the application.
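A client-side sketch of preparing a configuration update with the standard library. The base URL and the payload shape (agent name mapped to a parameter dict) are assumptions, since the request schema is not shown on this page; `build_update_request` is a hypothetical helper that only builds the request and does not send it.

```python
import json
import urllib.request


def build_update_request(base_url: str, configs: dict) -> urllib.request.Request:
    """Build a PUT /config/agentes request carrying the full set of agent configs."""
    body = json.dumps(configs).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/config/agentes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed payload shape; values are the shipped defaults except the temperature.
configs = {
    "watcher": {"timeout_segundos": 60, "retry_veces": 3},
    "ocr": {"timeout_segundos": 120, "retry_veces": 2},
    "classifier": {"temperatura": 0.1, "max_tokens": 2000},
    "storage": {"timeout_segundos": 30},
}
request = build_update_request("http://localhost:8000", configs)
# To send it: urllib.request.urlopen(request)
```

Because the PUT is described as a full upsert, the sketch sends every agent's parameters rather than only the one being changed.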
