The CoffeePrice model is retrained from scratch on every pipeline run; there is no incremental update. This ensures the ensemble weights, strategy selection, and calibration factor always reflect the most recent historical data. Retraining takes roughly 30–90 seconds on a standard laptop, depending on dataset size.
Prerequisites
- Python 3.9+ (the GitHub Actions workflow uses 3.11)
- pip package manager
- Raw or pre-fetched market data files in ml-service-experimental/datos/
Installation and Full Pipeline Run
Navigate to the ML service directory
All scripts must be run from inside ml-service-experimental/ because they use Path(__file__).resolve().parent to locate data and model directories.
Install Python dependencies
Two requirements files are provided, requirements.txt and requirements_hibrido.txt. They are currently identical; install both to be safe.
Prepare raw data files
Place your historical market data CSVs inside datos/. At minimum you need:
- precios_fnc_historicos.csv, with columns ds (date) and y (FNC price in COP)
- KC data: either precios_limpios.csv (already cleaned) or raw Precios cafe.csv / Precios_cafe.csv
- TRM data: either trm_limpias.csv (already cleaned) or raw Tasa de cambio TRM.csv / Tasa_de_cambio_TRM.csv
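Before running the pipeline it can help to confirm that the minimum inputs listed above are present. The following is a minimal sketch, not part of the project: the function name missing_inputs and the REQUIRED_GROUPS mapping are hypothetical, and only the file names come from the list above.

```python
from pathlib import Path

# File names are taken from the list above. Grouping them into
# "any one of these is enough" sets is this sketch's own convention.
REQUIRED_GROUPS = {
    "FNC": ["precios_fnc_historicos.csv"],
    "KC": ["precios_limpios.csv", "Precios cafe.csv", "Precios_cafe.csv"],
    "TRM": ["trm_limpias.csv", "Tasa de cambio TRM.csv", "Tasa_de_cambio_TRM.csv"],
}


def missing_inputs(datos_dir):
    """Return the labels of groups for which no candidate file exists."""
    datos_dir = Path(datos_dir)
    return [
        label
        for label, candidates in REQUIRED_GROUPS.items()
        if not any((datos_dir / name).is_file() for name in candidates)
    ]


if __name__ == "__main__":
    missing = missing_inputs("datos")
    print("Missing inputs:", ", ".join(missing) if missing else "none")
```

Run it from inside ml-service-experimental/ so that the relative datos path resolves to the right directory.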
Clean market data
Consolidates raw KC and TRM files, applies range filters, normalises decimals, and deduplicates.
Outputs: datos/precios_limpios.csv, datos/trm_limpias.csv
Build external variables
Merges optional external feeds (USD/BRL, Brazil climate, ICE inventories) into a single CSV aligned to the FNC date range.
Output: datos/variables_externas.csv
What actualizar_todo.py Does
The orchestrator runs these steps sequentially. Critical steps abort the pipeline on failure; non-critical ones log a warning and continue.
| Step | Script | Critical? | Description |
|---|---|---|---|
| 1 | obtener_kc_automatico.py | ✅ | Fetch latest KC=F price from Yahoo Finance |
| 2 | obtener_trm_automatico.py | ✅ | Fetch latest COP/USD TRM from Frankfurter / open.er-api |
| 3 | obtener_fnc_automatico.py | ⚠️ | Scrape today’s FNC price from the FNC website |
| 4 | obtener_usd_brl.py | ⚠️ | Fetch USD/BRL rate |
| 5 | obtener_clima_brasil.py | ⚠️ | Fetch Brazil weather alerts |
| 6 | obtener_inventarios_ice.py | ⚠️ | Fetch ICE inventory levels |
| 7 | limpiar_datos.py | ✅ | Clean and deduplicate KC and TRM data |
| 8 | variables_externas.py | ✅ | Build variables_externas.csv |
| 9 | entrenar_fnc_hibrido.py | ✅ | Train Prophet + XGBoost, compute ensemble weights, save artefacts |
| 10 | predecir_fnc_hibrido.py | ✅ | Generate next-day prediction and write JSON |
| 11 | evaluar_predicciones_fnc.py | ⚠️ | Update evaluation CSV with any newly-available real FNC prices |
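The critical versus non-critical behaviour in the table can be sketched as a simple loop. This is an illustration under stated assumptions, not the code of actualizar_todo.py: the names STEPS, run_script, and run_pipeline are hypothetical, and the real orchestrator may invoke scripts and log differently. Only the script names and their critical flags come from the table.

```python
import subprocess
import sys

# (script, critical) pairs copied from the table above.
STEPS = [
    ("obtener_kc_automatico.py", True),
    ("obtener_trm_automatico.py", True),
    ("obtener_fnc_automatico.py", False),
    ("obtener_usd_brl.py", False),
    ("obtener_clima_brasil.py", False),
    ("obtener_inventarios_ice.py", False),
    ("limpiar_datos.py", True),
    ("variables_externas.py", True),
    ("entrenar_fnc_hibrido.py", True),
    ("predecir_fnc_hibrido.py", True),
    ("evaluar_predicciones_fnc.py", False),
]


def run_script(script):
    """Run one script with the current interpreter; True on exit code 0."""
    return subprocess.run([sys.executable, script]).returncode == 0


def run_pipeline(steps=STEPS, runner=run_script):
    """Run steps in order and stop at the first critical failure.

    Returns (ok, warnings): ok is False if a critical step failed;
    warnings lists the non-critical scripts that failed.
    """
    warnings = []
    for script, critical in steps:
        if runner(script):
            continue
        if critical:
            print(f"ABORT: critical step failed: {script}")
            return False, warnings
        print(f"WARNING: non-critical step failed: {script}")
        warnings.append(script)
    return True, warnings
```

Passing the runner as a parameter keeps the control flow testable without touching the network or the file system.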
Training Details
Train/Test Split
The pipeline uses a temporal holdout, never random shuffling, to prevent data leakage:
- Minimum records required: 45
- Minimum training rows after split: 30
- Minimum holdout rows: 10
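A temporal holdout with these minimums might look like the sketch below. The three constants come from the list above; the function name temporal_split and the holdout_fraction default of 0.2 are assumptions made for illustration, since the actual split ratio is not stated here.

```python
MIN_RECORDS = 45   # minimum total rows
MIN_TRAIN = 30     # minimum training rows after the split
MIN_HOLDOUT = 10   # minimum holdout rows


def temporal_split(rows, holdout_fraction=0.2):
    """Split rows into (train, holdout) by position, without shuffling.

    rows must already be sorted by date, oldest first.
    holdout_fraction is an assumed value, not the project's setting.
    """
    n = len(rows)
    if n < MIN_RECORDS:
        raise ValueError(f"need at least {MIN_RECORDS} records, got {n}")
    holdout_size = max(MIN_HOLDOUT, int(n * holdout_fraction))
    train, holdout = rows[:-holdout_size], rows[-holdout_size:]
    if len(train) < MIN_TRAIN:
        raise ValueError(f"only {len(train)} training rows, need {MIN_TRAIN}")
    return train, holdout
```

Because the holdout is always the most recent slice, every training row predates every evaluation row, which is what prevents leakage from the future into the fit.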
Prophet Configuration
kc_centavos, trm
XGBoost Configuration
Two XGBoost models are trained: one corrects the Prophet residual, the other corrects the formula residual. The input features come from feature_columns(), which includes all lag features, moving averages, formula columns, calendar columns, external variables, and prophet_yhat.
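The idea of residual correction can be shown with a deliberately small stand-in. In the sketch below a one-feature least-squares line takes the place of XGBoost, so this is not the project's model; all function names are hypothetical. It only illustrates the pattern: fit a second model on the base model's errors, then add its predicted error back to the base forecast.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_residual_corrector(feature, base_pred, actual):
    """Fit the stand-in model on residuals (actual minus base prediction)."""
    residuals = [y - p for y, p in zip(actual, base_pred)]
    return fit_line(feature, residuals)


def corrected_prediction(base_pred, feature, corrector):
    """Final prediction = base prediction + predicted residual."""
    a, b = corrector
    return base_pred + a + b * feature


if __name__ == "__main__":
    # Toy numbers: the base model always says 10, the truth drifts upward.
    corrector = fit_residual_corrector(
        feature=[1, 2, 3, 4],
        base_pred=[10, 10, 10, 10],
        actual=[11, 12, 13, 14],
    )
    print(corrected_prediction(10, 5, corrector))
```

In the pipeline described here the base prediction would be either the Prophet forecast or the formula estimate, with one corrector fitted for each.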
Strategy Selection Logic
After evaluating holdout MAPE for all four strategies, the winning strategy is chosen by choose_primary_strategy():
- Sort strategies ascending by holdout MAPE.
- If the best strategy is not naive, return it.
- If the best strategy is naive, check whether the second-best is within 20% relative error. If so, fall back to ensemble (to avoid over-relying on carry-forward).
- Otherwise, naive wins.
If ensemble is selected, a final comparison is run between prophet, hybrid, formula, and ensemble to pick the single best performer.
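The selection rules above can be sketched as follows. This is a reconstruction from the prose, not the source of choose_primary_strategy(): the function names are hypothetical, and "within 20% relative error" is read here as the runner-up's MAPE being at most 20% higher than naive's, which is an assumption.

```python
def choose_primary_strategy_sketch(holdout_mape, tolerance=0.20):
    """Pick a strategy name from a {name: holdout MAPE} mapping."""
    ranked = sorted(holdout_mape, key=holdout_mape.get)
    best = ranked[0]
    if best != "naive":
        return best
    if len(ranked) > 1:
        best_mape = holdout_mape[best]
        second_mape = holdout_mape[ranked[1]]
        # Assumed reading of "within 20% relative error".
        if best_mape > 0 and (second_mape - best_mape) / best_mape <= tolerance:
            return "ensemble"
    return "naive"


def resolve_ensemble_sketch(holdout_mape):
    """Final comparison: lowest MAPE among the non-naive candidates."""
    candidates = ("prophet", "hybrid", "formula", "ensemble")
    present = [name for name in candidates if name in holdout_mape]
    return min(present, key=holdout_mape.get)


if __name__ == "__main__":
    mape = {"naive": 0.50, "prophet": 0.80, "hybrid": 0.55, "formula": 0.70}
    print(choose_primary_strategy_sketch(mape))
```

With these toy values naive ranks first and hybrid is only 10% worse, so the sketch falls back to ensemble rather than trusting a pure carry-forward.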
Output Files After Training
| File | Description |
|---|---|
| modelos/modelo_prophet_hibrido.pkl | Serialised Prophet model (final, fit on all data) |
| modelos/modelo_xgboost.pkl | Serialised XGBoost Prophet-residual corrector |
| modelos/modelo_formula_xgboost.pkl | Serialised XGBoost formula-residual corrector |
| modelos/features_hibrido.pkl | Feature config dict: feature_cols, best_strategy, ensemble_weights, recent_change_limit |
| modelos/metricas_fnc_hibrido.json | Full metrics report (see below) |
| backend/datos/predicciones_fnc.json | Latest prediction payload for the API |
| datos/historial_predicciones_fnc.csv | Appended prediction history row |
Interpreting metricas_fnc_hibrido.json
| Field | Description |
|---|---|
| estrategia_seleccionada | The primary strategy chosen for prediction (naive, prophet, hybrid, or formula). Influences explanation text and blending logic at prediction time. |
| estado_modelo | "usable" if the winning strategy’s holdout MAPE ≤ 1.0%; otherwise "seguir_en_pruebas" (keep testing). |
| registros_base | Total rows in the merged daily base dataframe before creating the supervised frame. |
| registros_supervisados | Rows in the final training frame after dropping NaNs from feature construction. |
| rango_entrenamiento | ISO date range covered by the training data. |
| max_cambio_diario_permitido | Safety clamp: maximum allowed daily price change as a fraction (e.g., 0.02853 = 2.853%). Derived from the 90th percentile of recent 30-day daily changes × 1.15. |
| ensemble_weights | Inverse-MAPE weights used at prediction time (with naive penalty applied). |
| metricas.train | In-sample MAPE and MAE for each strategy. Low values here do not guarantee good holdout performance. |
| metricas.holdout | Out-of-sample MAPE and MAE, the primary quality signal. The winning strategy is selected from these values. |
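Several of these fields are simple derived quantities, and the sketch below shows one plausible way to compute them. It is not the training script's code. The 1.0% threshold, the 90th percentile, and the 1.15 factor come from the table; the function names, the use of absolute day-over-day changes, the percentile interpolation method, and the naive_penalty value of 0.5 are all assumptions made for illustration.

```python
import statistics


def model_status(holdout_mape_pct, threshold_pct=1.0):
    """Return "usable" when holdout MAPE (in percent) is at or below 1.0."""
    return "usable" if holdout_mape_pct <= threshold_pct else "seguir_en_pruebas"


def daily_change_limit(prices, window=30, margin=1.15):
    """90th percentile of recent absolute daily changes, times a margin.

    Needs at least three prices so that two changes are available.
    """
    changes = [abs(b / a - 1.0) for a, b in zip(prices, prices[1:])][-window:]
    # With n=10 the ninth cut point (index 8) is the 90th percentile.
    p90 = statistics.quantiles(changes, n=10, method="inclusive")[8]
    return p90 * margin


def inverse_mape_weights(holdout_mape, naive_penalty=0.5):
    """Weights proportional to 1/MAPE, with naive scaled down, summing to 1."""
    raw = {name: 1.0 / mape for name, mape in holdout_mape.items()}
    if "naive" in raw:
        raw["naive"] *= naive_penalty  # hypothetical penalty factor
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def clamp_prediction(last_price, predicted, limit):
    """Keep the prediction within last_price * (1 +/- limit)."""
    low, high = last_price * (1 - limit), last_price * (1 + limit)
    return min(max(predicted, low), high)
```

For example, with a limit of 0.02 and a last price of 1000, a raw prediction of 1100 would be pulled back to about 1020, while a prediction of 1005 would pass through unchanged.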