The Agente Inteligente para Expedientes Docentes processes incoming academic documents through a linear four-stage pipeline. Each stage is handled by an autonomous Python agent: emails are monitored via IMAP, attachments are extracted and run through optical character recognition, the resulting text is classified by a large language model, and valid documents are persisted to MongoDB with their files moved to permanent storage. Agents can be run individually or chained together as a full pipeline triggered through the REST API.
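Pipeline Diagram
IMAP inbox → WatcherAgent → OcrAgent → ClassifierAgent → StorageAgent → MongoDB and permanent file storage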
Agents
Deep dive into each agent’s configuration, behavior, and inter-agent data flow.
Data Models
Pydantic schemas for the three MongoDB collections: docentes, documentos, and users.
LLM Providers
Configure OpenRouter or Ollama as the classification backend.
Agents API
REST endpoints for triggering agents and inspecting pipeline results.
Pipeline Execution Modes
The system supports two execution modes for every agent.
Independiente (independent): each agent runs in isolation and processes only its own stage. For example, OcrAgent in independent mode scans data/input/ and returns OCR results without invoking the classifier or the storage layer. This is useful for testing, debugging, or reprocessing a specific stage without affecting the rest of the pipeline.
Pipeline completo (full pipeline): a single API call triggers the whole chain, WatcherAgent → OcrAgent → ClassifierAgent → StorageAgent. The output of each stage is passed directly to the next as an enriched Python dictionary.
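The hand-off between stages and the choice between the two modes can be sketched roughly as follows. This is a minimal illustration, not the project's code: the stage functions, their output keys (`attachments`, `text`, `tipo`, `valido`, `guardado`), the `run_agent` helper, and the `"independiente"` mode value are all hypothetical placeholders. What it shows is the pattern described above: each stage returns a new dictionary that extends the one it received, and the full-pipeline mode runs every stage in order regardless of which agent was requested.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]

# Hypothetical placeholder stages: each returns a *new* dict that adds its own
# fields on top of whatever the previous stage produced.
def watcher_stage(data: Payload) -> Payload:
    return {**data, "attachments": ["data/input/documento.pdf"]}

def ocr_stage(data: Payload) -> Payload:
    return {**data, "text": "texto extraido por OCR"}

def classifier_stage(data: Payload) -> Payload:
    return {**data, "tipo": "titulo", "valido": True}

def storage_stage(data: Payload) -> Payload:
    return {**data, "guardado": bool(data.get("valido"))}

# Fixed execution order of the full pipeline.
PIPELINE: List[Tuple[str, Stage]] = [
    ("watcher", watcher_stage),
    ("ocr", ocr_stage),
    ("classifier", classifier_stage),
    ("storage", storage_stage),
]
STAGES: Dict[str, Stage] = dict(PIPELINE)

def run_agent(agent: str, modo: str = "independiente",
              data: Optional[Payload] = None) -> Payload:
    payload: Payload = dict(data or {})
    if modo == "pipeline":
        # Full chain: the requested agent name is ignored and every stage runs
        # in order, each receiving the previous stage's enriched dictionary.
        for _name, stage in PIPELINE:
            payload = stage(payload)
        return payload
    # Independent mode: only the requested stage runs, on the input it is given.
    return STAGES[agent](payload)
```

For example, `run_agent("ocr", modo="pipeline")` returns a dictionary carrying the fields of all four stages, while `run_agent("ocr")` returns only the OCR fields. Returning new dictionaries rather than mutating the input keeps each stage easy to test in isolation.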
The modo=pipeline query parameter can be passed to any agent endpoint. When set, the system runs the complete four-agent sequence regardless of which agent name appears in the URL path.
Deduplication Strategy
The pipeline uses three independent layers of deduplication to avoid processing the same email or document more than once.
Level 1 — IMAP UID tracking
Every email successfully processed by WatcherAgent has its IMAP UID recorded in processed_uids.json. On each polling cycle, the agent loads this file and skips any UID already present. This prevents re-downloading the same email across restarts.
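The UID bookkeeping described above could look roughly like this. It is a sketch under an assumption: the internal layout of processed_uids.json is not documented here, so the `{"uids": [...], "fingerprints": [...]}` shape and the helper names `load_state`, `save_state`, `filter_new_uids`, and `mark_processed` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List

State = Dict[str, List[str]]

# Assumed layout of processed_uids.json (hypothetical):
# {"uids": ["101", ...], "fingerprints": ["<sha256 hex>", ...]}
def load_state(path: Path) -> State:
    if not path.exists():
        # First run: nothing has been processed yet.
        return {"uids": [], "fingerprints": []}
    return json.loads(path.read_text(encoding="utf-8"))

def save_state(path: Path, state: State) -> None:
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")

def filter_new_uids(candidate_uids: Iterable[str], state: State) -> List[str]:
    # Skip any UID already recorded, preserving the server's order.
    seen = set(state["uids"])
    return [uid for uid in candidate_uids if uid not in seen]

def mark_processed(uid: str, state: State) -> None:
    # Record the UID only after the email was handled successfully.
    if uid not in state["uids"]:
        state["uids"].append(uid)
```

Persisting the file after each successfully processed email, rather than once per polling cycle, would limit how much work is repeated if the agent stops mid-cycle.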
Level 2 — Email content fingerprint
A SHA-256 hash is computed over the concatenation of sender address, subject, plain-text body, and the raw bytes of every attachment. This fingerprint is also stored in processed_uids.json. If the same email is forwarded again under a different UID, the fingerprint check catches it and the email is discarded before any attachment is saved.
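A fingerprint of this kind can be computed with the standard library alone. The sketch below assumes text fields are encoded as UTF-8, that no separators are inserted between fields, and that attachments are hashed in the order they appear; none of these details is specified above, and the function name `email_fingerprint` is hypothetical. Feeding the parts to `hashlib.sha256().update()` one after another yields the same digest as hashing their concatenation.

```python
import hashlib
from typing import Iterable

def email_fingerprint(sender: str, subject: str, body: str,
                      attachments: Iterable[bytes]) -> str:
    # Incremental updates are equivalent to hashing the concatenation
    # sender + subject + body + attachment bytes (UTF-8 assumed for text).
    h = hashlib.sha256()
    h.update(sender.encode("utf-8"))
    h.update(subject.encode("utf-8"))
    h.update(body.encode("utf-8"))
    for raw in attachments:
        h.update(raw)  # raw attachment bytes, in message order
    return h.hexdigest()
```

Because the fingerprint ignores the UID, a forwarded copy with identical sender, subject, body, and attachments produces the same digest. One trade-off of plain concatenation is that field boundaries are not encoded, so, for example, subject "ab" + body "c" hashes the same as subject "a" + body "bc"; adding a separator or length prefix per field would remove that ambiguity.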
Level 3 — File content hash
Before StorageAgent writes a document to MongoDB, it queries the documentos collection for an existing record with the same hash_sha256. If a match is found, the document is skipped and reported with accion: "skip". This ensures that the same physical document never appears twice in the database, even if it arrived through several separate emails.
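A minimal sketch of this check, assuming the collection object exposes a `find_one(filter)` method as PyMongo's `Collection` does. The helper names `file_sha256` and `check_duplicate`, and the `"insert"` value returned for new documents, are hypothetical; only `hash_sha256` and `accion: "skip"` come from the description above. The file is hashed in chunks so large scans do not have to fit in memory.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, Optional, Protocol

class SupportsFindOne(Protocol):
    # Structural type matched by pymongo's Collection (and by test stubs).
    def find_one(self, filter: Dict[str, Any]) -> Optional[Dict[str, Any]]: ...

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        # Read 1 MiB at a time until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def check_duplicate(path: Path, documentos: SupportsFindOne) -> Dict[str, Any]:
    digest = file_sha256(path)
    if documentos.find_one({"hash_sha256": digest}) is not None:
        return {"accion": "skip", "hash_sha256": digest}
    # Hypothetical action name for a document that should be stored.
    return {"accion": "insert", "hash_sha256": digest}
```

A check-then-insert sequence can still race if two workers store the same file concurrently; a unique index on `hash_sha256` (for example via PyMongo's `create_index("hash_sha256", unique=True)`) would enforce the guarantee at the database level as well.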
Compression
Before a file is written to permanent storage, StorageAgent attempts to reduce its size:
PDFs — Ghostscript
PDF files are recompressed with Ghostscript using its /ebook preset, which produces medium-resolution output at 150 DPI. This is sufficient for scanned academic documents while significantly reducing file size.
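One way to drive Ghostscript from Python is through `subprocess`. The sketch below uses standard Ghostscript options (`-sDEVICE=pdfwrite`, `-dPDFSETTINGS=/ebook`, `-dNOPAUSE`, `-dBATCH`, `-dQUIET`, `-sOutputFile=`); the helper names, the `-dCompatibilityLevel=1.4` choice, and the assumption that the executable is called `gs` (on Windows it is usually `gswin64c`) are illustrative, not taken from the project.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

def build_gs_command(src: Path, dst: Path, gs_binary: str = "gs") -> List[str]:
    # Rewrite the PDF with the /ebook preset (about 150 DPI images).
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/ebook",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]

def compress_pdf(src: Path, dst: Path) -> bool:
    # Returns False when Ghostscript is missing or fails, so the caller
    # can simply keep the original file.
    gs = shutil.which("gs")
    if gs is None:
        return False
    result = subprocess.run(build_gs_command(src, dst, gs), capture_output=True)
    return result.returncode == 0 and dst.exists()
```

Separating command construction from execution keeps the flag list inspectable and testable without Ghostscript installed.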
Images — Pillow
JPEG files and PNG images are re-saved as JPEG with quality=85 and optimize=True using the Pillow library. PNG inputs are converted to RGB before encoding.
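The Pillow step can be sketched as below. The function name `compress_image` is hypothetical; `Image.open`, `Image.convert("RGB")`, and the JPEG `save` options `quality` and `optimize` are standard Pillow API. Converting PNGs first matters because JPEG cannot store an alpha channel or a palette, so saving an RGBA or palette image directly as JPEG would fail.

```python
from pathlib import Path
from PIL import Image

def compress_image(src: Path, dst: Path) -> Path:
    with Image.open(src) as img:
        if img.format == "PNG":
            # JPEG has no alpha channel or palette: convert PNG input to RGB.
            img = img.convert("RGB")
        # Re-encode as JPEG with the settings described above.
        img.save(dst, format="JPEG", quality=85, optimize=True)
    return dst
```

The output should be written to a temporary path so that the size comparison against the original can still decide which file is kept.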
Fallback to original
After compression, the agent compares the sizes of the compressed file and the original. If the compressed file is not smaller, the original is used and the temporary compressed file is deleted. This ensures that already-optimised files are never made larger by the pipeline.
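The fallback rule reduces to a size comparison. In this sketch the helper name `keep_smaller` and the use of a separate temporary file for the compressed output are assumptions; the rule itself (keep the compressed file only if it is strictly smaller, otherwise delete it and keep the original) follows the description above.

```python
from pathlib import Path

def keep_smaller(original: Path, compressed: Path) -> Path:
    """Return the file to store; delete the compressed copy if it is not smaller."""
    if compressed.exists() and compressed.stat().st_size < original.stat().st_size:
        return compressed
    # Compression did not help (or produced nothing): discard the temp file.
    if compressed.exists():
        compressed.unlink()
    return original
```

Using a strict `<` means a compressed file of exactly the same size is also discarded, so the original bytes are preserved whenever compression brings no gain.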