The system persists all state in three MongoDB collections. Each collection has a corresponding Pydantic model under src/models/ that enforces field types and validation rules both at write time and when data is returned from the API. Documents are stored as plain BSON objects. There is no ODM layer, so the models are used for validation and serialisation only.
Collection  Model           Purpose
docentes    DocenteModel    One record per teacher; tracks identity, academic history, and dossier completeness
documentos  DocumentoModel  One record per file processed by the pipeline; stores OCR output, classification result, and file metadata
usuarios    UsuarioModel    API access accounts with role-based permissions
All three models use datetime.now() as the default factory for their timestamp fields (created_at / updated_at; UsuarioModel names them creado_en / actualizado_en). The application sets the update timestamp manually on every write operation rather than relying on database-side triggers.
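Because the update timestamp is maintained in application code, every write has to include it explicitly. The following is a minimal sketch. It assumes writes go through a small helper that wraps the changed fields in a MongoDB $set. build_update is a hypothetical name and is not part of the project's MongoService.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def build_update(changes: dict[str, Any], timestamp_field: str = "updated_at") -> dict[str, Any]:
    """Wrap field changes in a $set that also refreshes the update timestamp.

    Hypothetical helper. Pass timestamp_field="actualizado_en" for the
    usuarios collection, whose model uses Spanish timestamp names.
    """
    # Copy the caller's changes and stamp the current local time, using the
    # same datetime.now() call the models use as their default factory.
    return {"$set": {**changes, timestamp_field: datetime.now()}}
```

With PyMongo, the result would be passed as the update argument. For example, `db.docentes.update_one({"docente.cedula": "12345678"}, build_update({"status": "completo"}))`. The dot-notation filter reaches into the embedded InfoDocente sub-document.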

DocenteModel

Source: src/models/docente.py · Collection: docentes
A DocenteModel document is created the first time a valid document belonging to a new teacher is processed by StorageAgent. It is progressively enriched as more documents arrive.

Top-level fields

expediente_numero
str | None
Auto-generated sequential dossier number (e.g., EXP-0042). Assigned by MongoService.generate_expediente_numero() at record creation time.
status
string
default:"incompleto"
Lifecycle state of the dossier. One of: activo, inactivo, en_revision, completo, incompleto.
docente
InfoDocente
required
Embedded sub-document with all personal information about the teacher. See sub-model below.
formacion_academica
list[FormacionAcademica]
default:"[]"
Ordered list of academic degrees and certifications. Each entry records level, title, institution, country, and graduation date.
vinculacion_institucional
VinculacionInstitucional
Institutional affiliation data: contract type, dedication, department, campus, and current role at UNEG.
completitud
Completitud
Tracks which of the 10 required documents have been received and the overall completeness percentage.
created_at
datetime
Timestamp when the record was first created.
updated_at
datetime
Timestamp of the most recent modification.
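The EXP-0042 example suggests a fixed prefix plus a zero-padded counter. The sketch below covers only the formatting step and assumes four-digit padding. siguiente_expediente is a hypothetical helper and may not match how MongoService.generate_expediente_numero() actually derives the next value.

```python
from __future__ import annotations

import re

# Assumption: "EXP-" prefix and 4-digit zero padding, inferred from EXP-0042.
_PATRON = re.compile(r"^EXP-(\d+)$")


def siguiente_expediente(ultimo: str | None) -> str:
    """Return the dossier number after `ultimo`, or EXP-0001 if there is none."""
    if ultimo is None:
        return "EXP-0001"
    match = _PATRON.match(ultimo)
    if match is None:
        raise ValueError(f"unexpected dossier number: {ultimo!r}")
    # :04d pads to at least four digits; larger numbers simply grow wider.
    return f"EXP-{int(match.group(1)) + 1:04d}"
```

Deriving the next number from the last one read is not safe under concurrent inserts. The usual way to avoid handing out duplicates is an atomic counter document incremented with $inc, for example via PyMongo's find_one_and_update with return_document=ReturnDocument.AFTER.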

InfoDocente sub-model

cedula
str
required
Venezuelan national ID number (digits only after normalisation). Used as the natural key for cross-collection lookups; documentos stores it as docente_cedula.
nombres
str
required
Given names of the teacher.
apellidos
str
required
Family names of the teacher.
fecha_nacimiento
date | None
Date of birth in ISO 8601 format.
lugar_nacimiento
str | None
City or municipality of birth.
nacionalidad
str | None
Nationality string (e.g., Venezolana).
genero
"M" | "F" | None
Gender code. Constrained to M or F.
estado_civil
string | None
One of: soltero, casado, divorciado, viudo, union_libre.
contacto
ContactoDocente
Nested contact details: telefono_principal, telefono_secundario, email_personal, email_institucional.
direccion
DireccionDocente
Residential address: estado, municipio, direccion_completa, codigo_postal.
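The cedula description says values are stored as digits only after normalisation, but it does not spell out the rules. The following is a minimal sketch. It assumes normalisation simply drops the nationality prefix (V/E) and any dots, dashes, or spaces. normalizar_cedula is hypothetical.

```python
import re


def normalizar_cedula(valor: str) -> str:
    """Reduce a cédula as printed on a document to its digits.

    Hypothetical helper: removes every non-digit character, so the V/E
    prefix, dots, dashes and spaces all disappear.
    """
    digitos = re.sub(r"\D", "", valor)
    if not digitos:
        raise ValueError(f"no digits found in cédula {valor!r}")
    return digitos
```

For example, "V-12.345.678" and "12345678" both normalise to the same lookup key.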

FormacionAcademica sub-model

Each entry in the formacion_academica list represents one academic degree or certification.
nivel
string | None
Degree level. One of: bachiller, tecnico, licenciatura, especializacion, maestria, doctorado, postdoctorado.
titulo_obtenido
str | None
Name of the degree or certification awarded.
institucion
str | None
Name of the awarding institution.
pais
str | None
Country where the institution is located.
fecha_grado
date | None
Graduation or certification date in ISO 8601 format.

VinculacionInstitucional sub-model

Describes the teacher’s current contract and role at the institution.
fecha_ingreso
date | None
Date the teacher first joined the institution.
tipo_contratacion
string | None
Contract category. One of: ordinario, contratado, jubilado, otro.
dedicacion
string | None
Time commitment. One of: exclusiva, tiempo_completo, medio_tiempo, convencional.
categoria
str | None
Academic rank or salary category.
departamento
str | None
Academic department or faculty.
sede
str | None
Campus or branch where the teacher is based.
cargo_actual
str | None
Current administrative or academic role held at the institution.

Completitud sub-model

porcentaje
int
Integer from 0 to 100 giving the share of the 10 required documents that have been received. Each required document present contributes 10 points (e.g., 7 of 10 gives 70).
documentos_requeridos
list[DocumentoRequerido]
Full list of required document types. Each entry has a tipo (string) and a presente (bool) flag.
documentos_faltantes
list[str]
The tipo values of the documentos_requeridos entries where presente == False. Used by the completeness display in the web UI.

The 10 required documents for 100% completeness

A dossier reaches 100% (porcentaje = 100) only when all of the following document types are present:
cedula_identidad
partida_nacimiento
rif
titulo_bachiller
certificado_notas_bachillerato
titulo_universitario
certificado_notas_pregrado
fondo_negro_titulo
acta_grado
resolucion_nombramiento
completitud is recalculated automatically by MongoService.update_completitud(cedula) every time StorageAgent successfully inserts a new document, so the percentage always reflects the current state of the documentos collection.
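The percentage follows directly from the fixed list of ten types: present required types × 100 / 10. The sketch below shows the recalculation. It assumes update_completitud works from the set of tipo values already stored for the teacher. calcular_completitud is a hypothetical stand-in, not the project's code.

```python
from __future__ import annotations

# The 10 required document types, in the order listed above.
REQUERIDOS = [
    "cedula_identidad", "partida_nacimiento", "rif", "titulo_bachiller",
    "certificado_notas_bachillerato", "titulo_universitario",
    "certificado_notas_pregrado", "fondo_negro_titulo", "acta_grado",
    "resolucion_nombramiento",
]


def calcular_completitud(tipos_presentes: set[str]) -> dict:
    """Build a Completitud-shaped dict from the document types on file (sketch)."""
    requeridos = [{"tipo": t, "presente": t in tipos_presentes} for t in REQUERIDOS]
    faltantes = [d["tipo"] for d in requeridos if not d["presente"]]
    # Integer percentage; with 10 required types this is always a multiple of 10.
    porcentaje = (len(REQUERIDOS) - len(faltantes)) * 100 // len(REQUERIDOS)
    return {
        "porcentaje": porcentaje,
        "documentos_requeridos": requeridos,
        "documentos_faltantes": faltantes,
    }
```

The input set could come from `db.documentos.distinct("tipo", {"docente_cedula": cedula})`. Types outside the required list, such as diploma_curso, do not affect the score.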

DocumentoModel

Source: src/models/documento.py · Collection: documentos
One DocumentoModel record is created for every file that passes classification. It bundles the physical file metadata, the full OCR output, and the LLM classification result into a single document.

Top-level fields

docente_id
str
required
MongoDB _id of the parent DocenteModel record (stored as a string).
docente_cedula
str
required
Denormalised cédula of the teacher for fast queries without a join.
tipo
TipoDocumento
default:"otro"
Classified document type. See the full list of 22 values below.
nombre
str
required
Original filename as received from the email attachment.
descripcion
str | None
Optional human-readable description. Not populated by the pipeline; available for manual annotation.
archivo
ArchivoInfo
required
Physical file metadata. See sub-model below.
ocr
OcrInfo
OCR processing output including extracted text and confidence scores.
verificacion_visual
VerificacionVisual
Reserved for future signature/seal detection. procesado is always false in the current release.
validacion
ValidacionDocumento
Human review state. Defaults to pendiente on creation.
metadata
MetadataDocumento
Operational metadata: metodo_carga (escaneo_automatico for pipeline-processed files), version_procesamiento, optional folio and pagina_expediente.

ArchivoInfo sub-model

ruta
str
required
Final filesystem path after the file has been moved to data/storage/{cedula}/.
nombre_original
str
required
Filename as it arrived from the email attachment.
formato
str
required
File extension without dot: pdf, jpg, jpeg, or png.
hash_sha256
str | None
SHA-256 hex digest of the original file bytes. Used for deduplication.
tamano_bytes
int | None
Size of the original file in bytes.
tamano_original_bytes
int | None
Same value as tamano_bytes, stored explicitly so the pre-compression size stays on record next to tamano_almacenado_bytes.
tamano_almacenado_bytes
int | None
Actual size written to disk (compressed file size if compression succeeded).
ratio_compresion
float | None
Ratio of stored size to original size (e.g., 0.62 means 38% size reduction). null if no compression was applied.
comprimido
bool
default:"false"
Whether the stored file is a compressed copy of the original.
metodo_compresion
str | None
Compression method used: ghostscript for PDFs, pillow for images, null if uncompressed.
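The hash, size, and ratio fields can all be derived from two byte strings: the file as received and the file as written to disk. The sketch below assumes compression is reported only when a compressed copy was actually kept (metodo is None otherwise). metadatos_archivo is a hypothetical helper.

```python
from __future__ import annotations

import hashlib


def metadatos_archivo(original: bytes, almacenado: bytes, metodo: str | None) -> dict:
    """Build the hash/size part of an ArchivoInfo-shaped dict (hypothetical helper)."""
    comprimido = metodo is not None
    return {
        # Hash the original bytes so a re-sent attachment is recognised even
        # when the stored copy was compressed.
        "hash_sha256": hashlib.sha256(original).hexdigest(),
        "tamano_bytes": len(original),
        "tamano_original_bytes": len(original),
        "tamano_almacenado_bytes": len(almacenado),
        # stored / original: 0.62 means the stored file is 38% smaller.
        "ratio_compresion": len(almacenado) / len(original) if comprimido and original else None,
        "comprimido": comprimido,
        "metodo_compresion": metodo,
    }
```

A duplicate check could then query documentos for `{"archivo.hash_sha256": digest}` before inserting. For large files, feeding the bytes to a `hashlib.sha256()` object in chunks with `.update()` avoids reading the whole file into memory.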

OcrInfo sub-model

procesado
bool
default:"false"
Whether OCR has been run on this file.
motor
str | None
OCR engine identifier. Always "doctr" for pipeline-processed files.
version_motor
str | None
OCR engine version string (e.g., "0.9.0").
confianza_promedio
float | None
Average word-level confidence from docTR (0.0–1.0).
texto_completo
str | None
Full concatenated text extracted from all pages.
idioma_detectado
str | None
Language code detected by the OCR engine.
paginas
int | None
Number of pages (PDF) or images processed.
campos_extraidos
dict[str, Any]
default:"{}"
Structured fields extracted by the LLM classifier (e.g., cedula_titular, nombre_titular, fecha_emision).
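docTR's Document.export() turns OCR results into a plain dict. Pages contain blocks, blocks contain lines, and lines contain words; each word carries a value and a confidence. If the pipeline fills OcrInfo from that export, the average confidence and full text could be derived as follows. resumir_ocr is hypothetical, and the project may compute confianza_promedio differently (for example, per page).

```python
from __future__ import annotations

from statistics import fmean


def resumir_ocr(exportado: dict) -> dict:
    """Summarise a docTR export() dict into OcrInfo-shaped fields (sketch)."""
    confianzas: list[float] = []
    lineas_texto: list[str] = []
    paginas = exportado.get("pages", [])
    for pagina in paginas:
        for bloque in pagina.get("blocks", []):
            for linea in bloque.get("lines", []):
                palabras = linea.get("words", [])
                confianzas.extend(p["confidence"] for p in palabras)
                # Words on a line are space-joined; lines become separate rows.
                lineas_texto.append(" ".join(p["value"] for p in palabras))
    return {
        "procesado": True,
        "motor": "doctr",
        # Plain mean over every word on every page; None if nothing was read.
        "confianza_promedio": fmean(confianzas) if confianzas else None,
        "texto_completo": "\n".join(lineas_texto),
        "paginas": len(paginas),
    }
```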

ValidacionDocumento sub-model

estado
string
default:"pendiente"
Human review status. One of: pendiente, aprobado, rechazado, requiere_revision.

All 22 TipoDocumento values

cedula_identidad          rif
partida_nacimiento        titulo_bachiller
titulo_universitario      titulo_postgrado
certificado_notas_bachillerato
certificado_notas_pregrado
certificado_notas_postgrado
acta_grado                fondo_negro_titulo
nostrificacion            resolucion_nombramiento
evaluacion_docente        diploma_curso
diploma_taller            diploma_congreso
constancia_trabajo        constancia_estudio
carta_recomendacion       curriculo_vitae
otro
The LLM is instructed to return one of these exact string values. If the document does not match any recognised type, it should return "otro" with valido=false rather than inventing a new type.
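Enforcing the exact-string rule on the LLM's answer is a simple lookup. The sketch below assumes TipoDocumento is a string-valued Enum, which the source does not confirm. interpretar_clasificacion is a hypothetical helper that applies the fallback to "otro" with valido=false.

```python
from __future__ import annotations

from enum import Enum


class TipoDocumento(str, Enum):
    """The 22 document types listed above (assumed str-valued Enum)."""

    CEDULA_IDENTIDAD = "cedula_identidad"
    RIF = "rif"
    PARTIDA_NACIMIENTO = "partida_nacimiento"
    TITULO_BACHILLER = "titulo_bachiller"
    TITULO_UNIVERSITARIO = "titulo_universitario"
    TITULO_POSTGRADO = "titulo_postgrado"
    CERTIFICADO_NOTAS_BACHILLERATO = "certificado_notas_bachillerato"
    CERTIFICADO_NOTAS_PREGRADO = "certificado_notas_pregrado"
    CERTIFICADO_NOTAS_POSTGRADO = "certificado_notas_postgrado"
    ACTA_GRADO = "acta_grado"
    FONDO_NEGRO_TITULO = "fondo_negro_titulo"
    NOSTRIFICACION = "nostrificacion"
    RESOLUCION_NOMBRAMIENTO = "resolucion_nombramiento"
    EVALUACION_DOCENTE = "evaluacion_docente"
    DIPLOMA_CURSO = "diploma_curso"
    DIPLOMA_TALLER = "diploma_taller"
    DIPLOMA_CONGRESO = "diploma_congreso"
    CONSTANCIA_TRABAJO = "constancia_trabajo"
    CONSTANCIA_ESTUDIO = "constancia_estudio"
    CARTA_RECOMENDACION = "carta_recomendacion"
    CURRICULO_VITAE = "curriculo_vitae"
    OTRO = "otro"


def interpretar_clasificacion(tipo: str, valido: bool) -> tuple[TipoDocumento, bool]:
    """Map the LLM's tipo string onto TipoDocumento (hypothetical helper).

    Any value outside the 22 exact strings becomes OTRO and is marked invalid.
    """
    try:
        return TipoDocumento(tipo), valido
    except ValueError:
        return TipoDocumento.OTRO, False
```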

UsuarioModel

Source: src/models/usuario.py · Collection: usuarios
UsuarioModel controls access to the FastAPI REST API. Authentication is handled with JWT tokens; the model stores the bcrypt-hashed password rather than plaintext credentials.
email
EmailStr
required
Validated email address. Unique identifier for the user account.
password_hash
str
required
Bcrypt hash of the user’s password. Never returned by any API endpoint.
nombre_completo
str
required
Full display name of the user.
rol
string
default:"usuario"
Access level. One of: admin, usuario, sistema. Only admin users can create new accounts or modify agent configuration.
activo
bool
default:"true"
Whether the account is enabled. Set to false to deactivate without deleting the record.
ultimo_login
datetime | None
Timestamp of the most recent successful login. Updated on every successful POST /auth/login call.
creado_en
datetime
Account creation timestamp.
actualizado_en
datetime
Timestamp of the most recent account update.
The sistema role is intended for service-to-service calls (e.g., a scheduled job that triggers the pipeline). It has the same API access as usuario but is distinguishable in audit logs.
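The three roles and the activo flag could combine in an authorization check as sketched below. The sketch assumes the check runs after the JWT has been decoded and the matching usuarios document loaded. esta_autorizado and the action names are hypothetical.

```python
# Hypothetical action names: the source only states that creating accounts and
# modifying agent configuration are restricted to admin.
ACCIONES_SOLO_ADMIN = {"crear_usuario", "modificar_configuracion_agentes"}
ROLES_VALIDOS = {"admin", "usuario", "sistema"}


def esta_autorizado(usuario: dict, accion: str) -> bool:
    """Decide whether a usuarios document may perform `accion` (sketch)."""
    # Deactivated accounts, a missing activo flag, and unknown roles are denied.
    if usuario.get("activo") is not True or usuario.get("rol") not in ROLES_VALIDOS:
        return False
    if accion in ACCIONES_SOLO_ADMIN:
        return usuario["rol"] == "admin"
    # "sistema" gets the same access as "usuario"; only audit logs tell them apart.
    return True
```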
