CoffePrice’s ML pipeline forecasts the next business day’s FNC (Federación Nacional de Cafeteros) internal price per carga, the benchmark price paid to Colombian coffee farmers. Every weekday afternoon the pipeline ingests the latest KC futures close, the COP/USD TRM exchange rate, and a set of external market variables, then blends four independent prediction strategies into a single ensemble price. The result is written to a JSON file that the backend API serves directly to the frontend.
Model Architecture
The pipeline maintains four parallel prediction strategies. Each strategy generates its own price estimate; their outputs are then combined via inverse-MAPE weighting.
Naive (Carry-Forward)
Predicts that tomorrow’s FNC price equals today’s. Surprisingly competitive on stable days. Ensemble weight: 0.4516.
Formula (KC × TRM)
Converts the NY Coffee C futures price to COP using the TRM and a rolling 14-day calibration factor, then adds an XGBoost residual correction. Ensemble weight: 0.3577.
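The conversion described above can be sketched roughly as follows. This is an illustrative sketch, not the project's code: the function names, the 125 kg carga size, and the choice of a mean actual-to-raw ratio as the calibration factor are all assumptions, and the XGBoost residual correction is represented only by a placeholder argument.

```python
# Hypothetical sketch of a KC x TRM formula with a rolling calibration factor.
# All names and constants here are assumptions for illustration only.

KG_PER_LB = 0.45359237      # exact definition of the international pound
KG_PER_CARGA = 125.0        # assumption: one carga is taken as 125 kg
CALIBRATION_WINDOW = 14     # rolling window length mentioned in the text


def raw_formula_price(kc_cents_per_lb, trm):
    """Convert a KC quote (US cents per pound) to COP per carga, uncalibrated."""
    usd_per_lb = kc_cents_per_lb / 100.0
    lb_per_carga = KG_PER_CARGA / KG_PER_LB
    return usd_per_lb * lb_per_carga * trm


def calibration_factor(history, window=CALIBRATION_WINDOW):
    """Mean ratio of the actual FNC price to the raw formula price.

    history: list of (fnc_price, kc_cents_per_lb, trm) tuples, oldest first.
    Only the last `window` rows are used.
    """
    recent = history[-window:]
    ratios = [fnc / raw_formula_price(kc, trm) for fnc, kc, trm in recent]
    return sum(ratios) / len(ratios)


def formula_forecast(history, kc_today, trm_today, residual_correction=0.0):
    """Calibrated formula price plus an optional residual correction.

    residual_correction stands in for the XGBoost corrector's output.
    """
    calibrated = raw_formula_price(kc_today, trm_today) * calibration_factor(history)
    return calibrated + residual_correction
```

Because the calibration factor is a ratio of observed to raw prices, it absorbs any constant offset in the unit conversion, so the exact constants matter less than keeping them fixed over the window.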
Prophet (Time-Series)
Facebook Prophet model with yearly and weekly seasonality, with kc_centavos and trm as external regressors. Ensemble weight: 0.0987.
Hybrid XGBoost
Prophet’s forecast plus an XGBoost model trained on Prophet’s residual using the 45 engineered features. Captures non-linear interactions between lag prices, moving averages, and market signals. Ensemble weight: 0.092.
Ensemble Weights (from metricas_fnc_hibrido.json)
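Weights are derived from inverse MAPE on the holdout set, with a small naive penalty (naive_penalty=0.92) to avoid over-relying on carry-forward in dynamic markets:
| Strategy | Ensemble weight |
|---|---|
| naive | 0.4516 |
| formula | 0.3577 |
| prophet | 0.0987 |
| hybrid | 0.092 |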
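A minimal sketch of inverse-MAPE weighting with a naive penalty is shown here. The function name and the exact order of operations (invert, penalise naive, normalise) are assumptions rather than the contents of entrenar_fnc_hibrido.py. Fed with the rounded holdout MAPE values reported on this page, it lands within rounding distance of the published weights.

```python
# Hypothetical sketch: inverse-MAPE ensemble weights with a penalty on naive.

def inverse_mape_weights(holdout_mape, naive_penalty=0.92):
    """Return weights proportional to 1 / MAPE, normalised to sum to 1.

    holdout_mape: dict mapping strategy name to holdout MAPE in percent.
    The naive strategy's raw score is multiplied by naive_penalty
    before normalisation (assumed placement of the penalty).
    """
    raw = {name: 1.0 / mape for name, mape in holdout_mape.items()}
    if "naive" in raw:
        raw["naive"] *= naive_penalty
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()}


# Holdout MAPE values as reported in the Performance Metrics tables.
weights = inverse_mape_weights(
    {"naive": 0.8784, "formula": 1.21, "prophet": 4.37, "hybrid": 4.69}
)
```

Lower-error strategies receive proportionally larger weights, and the penalty trims the naive share slightly without removing it.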
Selected Strategy
The currently selected strategy is naive, with a holdout MAPE of 0.8784%. Strategy selection is determined by choose_primary_strategy(): naive wins unless another strategy beats it by more than 20% relative; if the margin is within 20%, the ensemble is used instead.
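One possible reading of this rule is sketched below. It is not the actual choose_primary_strategy() implementation: the function name, the argument shape, and the interpretation that "within 20%" means a rival beats naive by a smaller margin are assumptions made for illustration.

```python
# Hypothetical sketch of one interpretation of the selection rule.

def choose_primary_strategy_sketch(holdout_mape, margin=0.20):
    """Pick a primary strategy from holdout MAPE values (in percent).

    Assumed interpretation:
      - a rival that beats naive by more than `margin` (relative) is selected;
      - a rival that beats naive by a smaller margin triggers the ensemble;
      - otherwise naive is kept.
    """
    naive = holdout_mape["naive"]
    rivals = {
        name: mape
        for name, mape in holdout_mape.items()
        if name not in ("naive", "ensemble")
    }
    best_name = min(rivals, key=rivals.get)
    best_mape = rivals[best_name]

    if best_mape < naive * (1.0 - margin):
        return best_name
    if best_mape < naive:
        return "ensemble"
    return "naive"
```

With the holdout values reported on this page, no rival beats naive, so this sketch returns naive, which agrees with the selected strategy stated above.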
Dynamic Weight Adjustment
At prediction time, the pipeline computes a market signal strength score (0–1) from the daily percentage changes in KC futures and TRM, then scales each strategy’s base weight by the factor below:
| Strategy | Adjustment factor |
|---|---|
| naive | weight × (1 − 0.75 × signal) |
| formula | weight × (1 + 0.65 × signal) |
| hybrid | weight × (1 + 0.55 × signal) |
| prophet | weight × (1 + 0.35 × signal) |
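The adjustment factors in the table can be applied as in the sketch below. The coefficients and base weights come from this page; the function names, the clamping of the signal, the renormalisation step, and the rounding helper are assumptions, and how the signal score itself is computed is not shown.

```python
# Hypothetical sketch: signal-dependent reweighting and blending.

BASE_WEIGHTS = {"naive": 0.4516, "formula": 0.3577, "prophet": 0.0987, "hybrid": 0.092}

# Signed coefficients from the adjustment table: weight * (1 + coeff * signal).
SIGNAL_COEFFS = {"naive": -0.75, "formula": 0.65, "hybrid": 0.55, "prophet": 0.35}


def adjust_weights(base_weights, signal):
    """Scale each weight by its adjustment factor, then renormalise to sum to 1.

    Renormalisation is an assumption; the source only lists the factors.
    """
    signal = min(max(signal, 0.0), 1.0)  # keep the score inside [0, 1]
    scaled = {
        name: weight * (1.0 + SIGNAL_COEFFS[name] * signal)
        for name, weight in base_weights.items()
    }
    total = sum(scaled.values())
    return {name: weight / total for name, weight in scaled.items()}


def blend(predictions, weights):
    """Weighted average of per-strategy price predictions (COP per carga)."""
    return sum(weights[name] * predictions[name] for name in weights)


def round_to_100(price):
    """Round a price to the nearest COP 100."""
    return int(round(price / 100.0)) * 100
```

A strong signal shifts weight away from carry-forward and towards the market-driven strategies, while a signal of zero leaves the base weights unchanged.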
Performance Metrics
All MAPE values are percentages. Data sourced from metricas_fnc_hibrido.json (361 base records, trained 2025-05-30 → 2026-05-25).
Holdout Set (last ~20% of data, min 14 days)
| Strategy | MAPE (%) | MAE (COP) |
|---|---|---|
| naive | 0.88 | 19,714 |
| formula | 1.21 | 26,986 |
| ensemble | 1.52 | 33,784 |
| prophet | 4.37 | 96,647 |
| hybrid | 4.69 | 103,733 |
Training Set
| Strategy | MAPE (%) | MAE (COP) |
|---|---|---|
| hybrid | 0.40 | 10,354 |
| ensemble | 0.71 | 18,479 |
| formula | 0.78 | 19,944 |
| naive | 1.33 | — |
| prophet | 1.45 | — |
The large gap between hybrid’s training MAPE (0.40%) and holdout MAPE (4.69%) indicates that the XGBoost corrector overfits the training window. The naive strategy generalises best on the current holdout period, which is why it is selected as the primary strategy.
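For reference, the metrics in these tables can be computed along the following lines. The helper names and the exact holdout sizing rule (the larger of 14 rows and 20% of the data, taken from the end of the series) are assumptions based on the "last ~20% of data, min 14 days" description.

```python
# Hypothetical sketch: temporal holdout split plus MAPE and MAE.
import math


def temporal_holdout_split(rows, holdout_frac=0.20, min_holdout=14):
    """Split chronologically ordered rows into (train, holdout).

    The holdout is the most recent slice, so no future data leaks into training.
    """
    n_holdout = max(min_holdout, math.ceil(len(rows) * holdout_frac))
    n_holdout = min(n_holdout, len(rows) - 1)  # keep at least one training row
    return rows[:-n_holdout], rows[-n_holdout:]


def mape_percent(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mae(actual, predicted):
    """Mean absolute error in the same unit as the inputs (COP here)."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)
```

Comparing mape_percent on the training slice with the holdout slice for the same strategy is the kind of check that exposes the overfitting gap described above.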
Output Files
| File | Description |
|---|---|
| backend/datos/predicciones_fnc.json | Latest prediction payload served by the API |
| ml-service-experimental/datos/historial_predicciones_fnc.csv | Full prediction history (one row per run) |
| ml-service-experimental/datos/evaluacion_predicciones_fnc.csv | Evaluation against real FNC prices once they are published |
| ml-service-experimental/modelos/metricas_fnc_hibrido.json | Latest training metrics and ensemble weights |
Pipeline Stages
Data Collection
Four groups of fetch scripts pull the latest market data automatically:
- obtener_fnc_automatico.py scrapes the FNC website for today’s internal price
- obtener_kc_automatico.py fetches the KC=F (NY Coffee C) closing price from Yahoo Finance
- obtener_trm_automatico.py retrieves the COP/USD TRM from the Frankfurter API or open.er-api.com
- obtener_usd_brl.py, obtener_clima_brasil.py, and obtener_inventarios_ice.py fetch supporting external variables
Data Cleaning
limpiar_datos.py consolidates raw CSV files, applies range filters, normalises decimal separators, deduplicates by date, and writes precios_limpios.csv and trm_limpias.csv.
Feature Engineering
pipeline_fnc_hibrido.py builds the daily base dataframe. It merges FNC, KC, TRM, and external variables; adds formula columns; adds calendar features; and constructs a supervised frame with lag features, moving averages, volatility, and return rates, for 45 features in total.
Model Training
entrenar_fnc_hibrido.py trains Prophet (with KC and TRM regressors) and two XGBoost correctors (one on the Prophet residual, one on the formula residual), evaluates all strategies on a temporal holdout, computes ensemble weights, and persists model artefacts plus metricas_fnc_hibrido.json.
Prediction Generation
predecir_fnc_hibrido.py loads the saved artefacts, reconstructs today’s feature row, runs all four strategies, applies dynamic weight adjustment based on live KC/TRM signals, applies a market-pressure clamp, rounds to the nearest COP 100, and writes predicciones_fnc.json.
Conversion Constants
The formula strategy converts KC futures (quoted in US cents per pound) to Colombian pesos per carga using these constants defined in pipeline_fnc_hibrido.py: