The system persists all state in three MongoDB collections. Each collection has a corresponding Pydantic model under src/models/ that enforces field types and validation rules both at write time and when data is returned from the API. Documents are stored as plain BSON objects; there is no ODM layer, and the models are used for validation and serialisation only.
| Collection | Model | Purpose |
|---|---|---|
| docentes | DocenteModel | One record per teacher; tracks identity, academic history, and dossier completeness |
| documentos | DocumentoModel | One record per file processed by the pipeline; stores OCR output, classification result, and file metadata |
| usuarios | UsuarioModel | API access accounts with role-based permissions |
All three models use datetime.now() as the default factory for created_at / updated_at fields. The application updates updated_at manually on every write operation rather than relying on MongoDB triggers.
DocenteModel
Source: src/models/docente.py · Collection: docentes
A DocenteModel document is created the first time a valid document belonging to a new teacher is processed by StorageAgent. It is progressively enriched as more documents arrive.
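The create-once, enrich-later behaviour can be illustrated with a small in-memory sketch. It is not the project's StorageAgent or MongoService code: the `InMemoryDocentes` class, the `numero_expediente` key, and the four-digit zero padding (inferred only from the `EXP-0042` example) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


def format_expediente_numero(sequence: int) -> str:
    """Format a sequence number like EXP-0042 (4-digit padding is an assumption)."""
    return f"EXP-{sequence:04d}"


@dataclass
class InMemoryDocentes:
    """Hypothetical stand-in for the docentes collection, keyed by cedula."""
    records: dict = field(default_factory=dict)
    counter: int = 0

    def get_or_create(self, cedula: str) -> dict:
        # Reuse the existing record when the teacher is already known.
        if cedula in self.records:
            return self.records[cedula]
        # Otherwise create it and assign the next sequential dossier number.
        self.counter += 1
        now = datetime.now()
        record = {
            "cedula": cedula,
            "numero_expediente": format_expediente_numero(self.counter),  # hypothetical key
            "created_at": now,
            "updated_at": now,
        }
        self.records[cedula] = record
        return record


store = InMemoryDocentes()
first = store.get_or_create("12345678")
again = store.get_or_create("12345678")   # same teacher: no new record
other = store.get_or_create("87654321")   # new teacher: next number
```

A second document for the same cédula returns the existing record, so the dossier number is assigned only once per teacher.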
Top-level fields
Auto-generated sequential dossier number (e.g., EXP-0042). Assigned by MongoService.generate_expediente_numero() at record creation time.
Lifecycle state of the dossier. One of: activo, inactivo, en_revision, completo, incompleto.
Embedded sub-document with all personal information about the teacher. See sub-model below.
Ordered list of academic degrees and certifications. Each entry records level, title, institution, country, and graduation date.
Institutional affiliation data: contract type, dedication, department, campus, and current role at UNEG.
Tracks which of the 10 required documents have been received and the overall completeness percentage.
Timestamp when the record was first created.
Timestamp of the most recent modification.
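The timestamp convention (a `datetime.now()` default factory plus a manual refresh of `updated_at` on each write) can be sketched as follows. The sketch uses standard-library dataclasses in place of the Pydantic models so that it runs without dependencies; the `Timestamped` class and the `touch` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Timestamped:
    # The factory is called once per instance, so each record gets its own timestamps.
    created_at: datetime = field(default_factory=datetime.now)
    updated_at: datetime = field(default_factory=datetime.now)


def touch(record: Timestamped) -> Timestamped:
    """Refresh updated_at by hand before a write; created_at is left alone."""
    record.updated_at = datetime.now()
    return record


record = Timestamped()
created = record.created_at
touch(record)
```

Note that `datetime.now()` returns a naive local-time value; whether the project normalises timestamps to UTC is not stated here.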
InfoDocente sub-model
Venezuelan national ID number (digits only after normalisation). Primary key for cross-collection lookups.
Given names of the teacher.
Family names of the teacher.
Date of birth in ISO 8601 format.
City or municipality of birth.
Nationality string (e.g., Venezolana).
Gender code. Constrained to M or F.
One of: soltero, casado, divorciado, viudo, union_libre.
Nested contact details: telefono_principal, telefono_secundario, email_personal, email_institucional.
Residential address: estado, municipio, direccion_completa, codigo_postal.
FormacionAcademica sub-model
Each entry in the formacion_academica list represents one academic degree or certification.
Degree level. One of: bachiller, tecnico, licenciatura, especializacion, maestria, doctorado, postdoctorado.
Name of the degree or certification awarded.
Name of the awarding institution.
Country where the institution is located.
Graduation or certification date in ISO 8601 format.
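A minimal sketch of how one such entry could be validated, using a standard-library enum and dataclass as a stand-in for the Pydantic model. The seven level values come from the list above; the field names (`nivel`, `titulo`, `institucion`, `pais`, `fecha_grado`) and the `from_raw` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class NivelAcademico(str, Enum):
    BACHILLER = "bachiller"
    TECNICO = "tecnico"
    LICENCIATURA = "licenciatura"
    ESPECIALIZACION = "especializacion"
    MAESTRIA = "maestria"
    DOCTORADO = "doctorado"
    POSTDOCTORADO = "postdoctorado"


@dataclass
class FormacionEntry:
    nivel: NivelAcademico
    titulo: str
    institucion: str
    pais: str
    fecha_grado: date

    @classmethod
    def from_raw(cls, raw: dict) -> "FormacionEntry":
        return cls(
            nivel=NivelAcademico(raw["nivel"]),  # ValueError for an unknown level
            titulo=raw["titulo"],
            institucion=raw["institucion"],
            pais=raw["pais"],
            fecha_grado=date.fromisoformat(raw["fecha_grado"]),  # expects YYYY-MM-DD
        )


entry = FormacionEntry.from_raw({
    "nivel": "maestria",
    "titulo": "Example degree title",
    "institucion": "Example institution",
    "pais": "Venezuela",
    "fecha_grado": "2015-07-24",
})
```

Rejecting unknown levels at construction time keeps free-text variants of the degree level out of the stored list.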
VinculacionInstitucional sub-model
Describes the teacher’s current contract and role at the institution.
Date the teacher first joined the institution.
Contract category. One of: ordinario, contratado, jubilado, otro.
Time commitment. One of: exclusiva, tiempo_completo, medio_tiempo, convencional.
Academic rank or salary category.
Academic department or faculty.
Campus or branch where the teacher is based.
Current administrative or academic role held at the institution.
Completitud sub-model
Integer from 0 to 100 giving the percentage of the 10 required documents that have been received.
Full list of required document types, each with a tipo (string) and presente (bool) flag.
Subset of documentos_requeridos where presente == False. Used by the completeness display in the web UI.
The 10 required documents for 100% completeness
A dossier reaches 100% (porcentaje = 100) only when all of the following document types are present:
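The relationship between the presente flags, the percentage, and the missing-documents subset can be sketched as below. This is an illustration only: the `tipo_01` to `tipo_10` names are placeholders rather than the real required document types, and the integer-division rounding is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DocumentoRequerido:
    tipo: str
    presente: bool = False


def compute_completitud(requeridos: list) -> tuple:
    """Return (porcentaje, faltantes) derived from the presente flags."""
    faltantes = [d for d in requeridos if not d.presente]
    recibidos = len(requeridos) - len(faltantes)
    # Integer percentage; with 10 required documents each one is worth 10 points.
    porcentaje = 100 * recibidos // len(requeridos) if requeridos else 0
    return porcentaje, faltantes


# Placeholder document types: the first seven are marked as received.
requeridos = [
    DocumentoRequerido(f"tipo_{i:02d}", presente=(i <= 7)) for i in range(1, 11)
]
porcentaje, faltantes = compute_completitud(requeridos)
print(porcentaje, [d.tipo for d in faltantes])
```

Deriving both values from the same flags in one place keeps the percentage and the missing list from drifting apart.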
DocumentoModel
Source: src/models/documento.py · Collection: documentos
One DocumentoModel record is created for every file that passes classification. It bundles the physical file metadata, the full OCR output, and the LLM classification result into a single document.
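Two pieces of the file metadata bundled into each record, the SHA-256 digest used for deduplication and the stored-to-original size ratio, can be computed with the standard library alone. The sketch below is a guess at the general shape, not the pipeline's code; `sha256_hex`, `is_duplicate`, `compression_ratio`, and the in-memory `seen_hashes` set are hypothetical.

```python
import hashlib
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """SHA-256 hex digest of the original file bytes."""
    return hashlib.sha256(data).hexdigest()


seen_hashes = set()  # hypothetical stand-in for a lookup against stored digests


def is_duplicate(data: bytes) -> bool:
    """Report whether identical bytes were seen before, remembering new digests."""
    digest = sha256_hex(data)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


def compression_ratio(original_size: int, stored_size: int, compressed: bool) -> Optional[float]:
    """Stored size divided by original size, or None when nothing was compressed."""
    if not compressed or original_size == 0:
        return None
    return round(stored_size / original_size, 2)


sample = b"%PDF-1.4 sample bytes"
first_seen = is_duplicate(sample)    # first arrival
second_seen = is_duplicate(sample)   # identical bytes arrive again
ratio = compression_ratio(1000, 620, compressed=True)
```

Hashing the original bytes rather than the stored copy means a file is recognised as a duplicate regardless of how it was compressed.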
Top-level fields
MongoDB _id of the parent DocenteModel record (stored as a string).
Denormalised cédula of the teacher for fast queries without a join.
Classified document type. See the full list of 22 values below.
Original filename as received from the email attachment.
Optional human-readable description. Not populated by the pipeline; available for manual annotation.
Physical file metadata. See sub-model below.
OCR processing output including extracted text and confidence scores.
Reserved for future signature/seal detection. procesado is always false in the current release.
Human review state. Defaults to pendiente on creation.
Operational metadata: metodo_carga (escaneo_automatico for pipeline-processed files), version_procesamiento, optional folio and pagina_expediente.
ArchivoInfo sub-model
Final filesystem path after the file has been moved to data/storage/{cedula}/.
Filename as it arrived from the email attachment.
File extension without dot: pdf, jpg, jpeg, or png.
SHA-256 hex digest of the original file bytes. Used for deduplication.
Size of the original file in bytes.
Alias for tamano_bytes; explicitly stored to retain the original size after compression.
Actual size written to disk (compressed file size if compression succeeded).
Ratio of stored size to original size (e.g., 0.62 means 38% size reduction). null if no compression was applied.
Whether the stored file is a compressed copy of the original.
Compression method used: ghostscript for PDFs, pillow for images, null if uncompressed.
OcrInfo sub-model
Whether OCR has been run on this file.
OCR engine identifier. Always "doctr" for pipeline-processed files.
OCR engine version string (e.g., "0.9.0").
Average word-level confidence from docTR (0.0–1.0).
Full concatenated text extracted from all pages.
Language code detected by the OCR engine.
Number of pages (PDF) or images processed.
Structured fields extracted by the LLM classifier (e.g., cedula_titular, nombre_titular, fecha_emision).
ValidacionDocumento sub-model
Human review status. One of: pendiente, aprobado, rechazado, requiere_revision.
All 22 TipoDocumento values
The LLM is instructed to return one of these exact string values. If the document does not match any recognised type, it should return "otro" with valido=false rather than inventing a new type.
UsuarioModel
Source: src/models/usuario.py · Collection: usuarios
UsuarioModel controls access to the FastAPI REST API. Authentication is handled with JWT tokens; the model stores the bcrypt-hashed password rather than plaintext credentials.
Validated email address. Unique identifier for the user account.
Bcrypt hash of the user’s password. Never returned by any API endpoint.
Full display name of the user.
Access level. One of: admin, usuario, sistema. Only admin users can create new accounts or modify agent configuration.
Whether the account is enabled. Set to false to deactivate without deleting the record.
Timestamp of the most recent successful login. Updated on every POST /auth/login call.
Account creation timestamp.
Timestamp of the most recent account update.
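The role constraint and the rule that the password hash is never returned can be sketched with a plain dataclass. The field names (`email`, `password_hash`, `nombre`, `rol`, `activo`, `ultimo_login`), the helpers `to_public` and `can_manage_accounts`, and the choice to require an active account for admin actions are all assumptions for illustration; bcrypt hashing and JWT handling are left out.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional

ROLES = ("admin", "usuario", "sistema")


@dataclass
class Usuario:
    email: str
    password_hash: str  # a bcrypt hash in the real system; opaque string here
    nombre: str
    rol: str
    activo: bool = True
    ultimo_login: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Reject any access level outside the three documented roles.
        if self.rol not in ROLES:
            raise ValueError(f"unknown role: {self.rol!r}")


def to_public(user: Usuario) -> dict:
    """Serialise a user for an API response, dropping the password hash."""
    data = asdict(user)
    data.pop("password_hash")
    return data


def can_manage_accounts(user: Usuario) -> bool:
    """Only admins may create accounts; requiring activo is an assumption."""
    return user.activo and user.rol == "admin"


admin = Usuario("admin@example.org", "<bcrypt-hash>", "Example Admin", "admin")
regular = Usuario("user@example.org", "<bcrypt-hash>", "Example User", "usuario")
public_view = to_public(admin)
```

Stripping the hash in a single serialisation helper, rather than in each endpoint, makes it harder to leak the field by accident.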