This guide walks you through everything needed to run the Fini Marketing Intelligence pipeline from scratch on your local machine. By the end, you will have a live PostgreSQL database loaded with 20 products, 5,000 customers, and 100,000 synthetic sales records, along with RFM customer segments, revenue insights, and three independent 90-day sales forecasts saved to the outputs/ directory, all generated in a single command.
Check Prerequisites
Before you begin, make sure the following tools are installed and available on your PATH:
- Python 3.9 or higher: required by the pipeline and all analytics modules
- Docker and Docker Compose: used to run the PostgreSQL 16 database container
- Git: to clone the repository

Then clone the repository.
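The prerequisite list above can be verified with a short script before going further. This is a minimal sketch, not part of the repository: check_prerequisites is a hypothetical helper, and it only looks for the docker and git executables on PATH. Docker Compose may be installed either as a Docker plugin or as a separate binary, so it is not checked here.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9), tools=("docker", "git")):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Compare (major, minor) of the running interpreter with the minimum.
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")
    # shutil.which returns None when an executable is not on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All prerequisites found.")
```

Returning a list of problems rather than raising on the first failure lets you see everything that is missing in one run.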
Configure Environment Variables
The pipeline reads all database credentials from a .env file in the project root. Create the file now:
Add the following variables, replacing your_password with a password of your choice:
These values are consumed both by Docker Compose (to configure the container) and by etl/config.py (to connect SQLAlchemy and psycopg2 at runtime):
Never commit your .env file to version control. The repository's .gitignore should already exclude it.
Start PostgreSQL with Docker Compose
The docker-compose.yml file defines a single postgres:16 service named fini_postgres. Start it in detached mode:
Confirm the container is running:
You should see fini_postgres listed with status Up. The database data is persisted in a named Docker volume (postgres_data) so it survives container restarts.
Create the Database Schema
Apply the star-schema DDL to create the three core tables (dim_products, dim_customers, and fact_sales):
The schema that will be applied:
Install Python Dependencies
Create and activate a virtual environment, then install all required packages:
Key packages installed include pandas, SQLAlchemy, psycopg2-binary, scikit-learn, numpy, and python-dotenv.
Run the Full Pipeline
Execute the orchestrator script to run all nine pipeline steps in sequence:
The pipeline runs the following steps in order:

| # | Step | Description |
|---|---|---|
| 1 | Generar productos | Generates data/raw/products.csv with 20 products across 7 categories |
| 2 | Generar clientes | Generates data/raw/customers.csv with 5,000 customers, including demographics and channel data |
| 3 | Generar ventas | Generates data/raw/sales.csv with 100,000 transactions and seasonality simulation |
| 4 | Cargar PostgreSQL | Loads all three CSVs into the PostgreSQL star schema |
| 5 | Generar insights | Computes product-level revenue and margin analytics |
| 6 | Generar RFM | Runs RFM scoring and assigns customer segments |
| 7 | Generar Forecasting Base | Fits a baseline Prophet model and saves forecast + metrics |
| 8 | Generar Forecasting Enriquecido | Fits an enriched Prophet model with seasonality regressors |
| 9 | Generar Forecasting XGBoost | Fits an XGBoost model on lag and calendar features |

After all steps complete, the pipeline validates the raw CSVs against expected row counts (20 products, 5,000 customers, 100,000 sales) and writes a timestamped execution report to reports/.
All data generators use a fixed random seed of 42. This means every run of python run_pipeline.py produces byte-for-byte identical CSVs, database contents, and model outputs, making the entire pipeline deterministic and reproducible.
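The row-count validation and the fixed-seed reproducibility described above can be illustrated with a small stand-alone sketch. It is not the pipeline's own code: generate_csv, count_rows, and fingerprint are hypothetical helpers, the two-column layout is invented for the example, and Python's standard random module stands in for whatever generator the project actually seeds with 42.

```python
import csv
import hashlib
import io
import random

# Expected row counts quoted in this guide.
EXPECTED_ROWS = {"products": 20, "customers": 5000, "sales": 100000}


def generate_csv(n_rows: int, seed: int = 42) -> bytes:
    """Build a toy CSV in memory from a seeded generator (hypothetical layout)."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "amount"])
    for row_id in range(1, n_rows + 1):
        writer.writerow([row_id, round(rng.uniform(5, 500), 2)])
    return buf.getvalue().encode("utf-8")


def count_rows(data: bytes) -> int:
    """Count data rows, excluding the header line."""
    reader = csv.reader(io.StringIO(data.decode("utf-8")))
    return sum(1 for _ in reader) - 1


def fingerprint(data: bytes) -> str:
    """Hash the raw bytes so two runs can be compared byte for byte."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    first = generate_csv(EXPECTED_ROWS["products"])
    second = generate_csv(EXPECTED_ROWS["products"])
    # Row-count check, analogous to the validation step described above.
    assert count_rows(first) == EXPECTED_ROWS["products"]
    # Same seed, same bytes: the two fingerprints must match.
    assert fingerprint(first) == fingerprint(second)
    print("rows:", count_rows(first), "identical:", first == second)
```

Comparing hashes of the generated files across two runs is a quick way to confirm that a seeded pipeline is reproducible on your machine; a mismatch usually points to an unseeded source of randomness or a differing library version.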