Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt

Use this file to discover all available pages before exploring further.

This guide walks you through everything needed to run the Fini Marketing Intelligence pipeline from scratch on your local machine. By the end, you will have a live PostgreSQL database loaded with 20 products, 5,000 customers, and 100,000 synthetic sales records, along with RFM customer segments, revenue insights, and three independent 90-day sales forecasts saved to the outputs/ directory — all generated in a single command.
1

Check Prerequisites

Before you begin, make sure the following tools are installed and available on your PATH:
  • Python 3.9 or higher — the pipeline and all analytics modules require Python 3.9+
  • Docker and Docker Compose — used to run the PostgreSQL 16 database container
  • Git — to clone the repository
Verify your versions:
python --version   # Python 3.9+
docker --version   # Docker 20+
git --version
Then clone the repository:
git clone https://github.com/andresshm/fini-marketing-intelligence.git
cd fini-marketing-intelligence
2

Configure Environment Variables

The pipeline reads all database credentials from a .env file in the project root. Create the file now:
touch .env
Add the following variables, replacing your_password with a password of your choice:
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=fini_db
POSTGRES_USER=fini_user
POSTGRES_PASSWORD=your_password
These values are consumed both by Docker Compose (to configure the container) and by etl/config.py (to connect SQLAlchemy and psycopg2 at runtime):
# etl/config.py
DB_CONFIG = {
    "host": os.getenv("POSTGRES_HOST", "localhost"),
    "port": os.getenv("POSTGRES_PORT"),
    "database": os.getenv("POSTGRES_DB"),
    "user": os.getenv("POSTGRES_USER"),
    "password": os.getenv("POSTGRES_PASSWORD"),
}
Never commit your .env file to version control. The repository’s .gitignore should already exclude it.
3

Start PostgreSQL with Docker Compose

The docker-compose.yml file defines a single postgres:16 service named fini_postgres. Start it in detached mode:
docker-compose up -d
Confirm the container is running:
docker ps --filter name=fini_postgres
You should see fini_postgres listed with status Up. The database data is persisted in a named Docker volume (postgres_data) so it survives container restarts.
4

Create the Database Schema

Apply the star-schema DDL to create the three core tables — dim_products, dim_customers, and fact_sales:
docker exec -i fini_postgres psql \
  -U $POSTGRES_USER \
  -d $POSTGRES_DB \
  < sql/create_schema.sql
The schema that will be applied:
CREATE TABLE IF NOT EXISTS dim_products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category     VARCHAR(50)  NOT NULL,
    season       VARCHAR(50),
    launch_date  DATE,
    unit_cost    NUMERIC(10,2),
    unit_price   NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS dim_customers (
    customer_id        INTEGER PRIMARY KEY,
    age_group          VARCHAR(20),
    region             VARCHAR(50),
    preferred_channel  VARCHAR(50),
    purchase_frequency VARCHAR(20),
    avg_ticket         NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id     BIGINT  PRIMARY KEY,
    sale_date   DATE    NOT NULL,
    customer_id INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    units       INTEGER NOT NULL,
    discount    NUMERIC(4,2)  NOT NULL,
    revenue     NUMERIC(10,2) NOT NULL,
    cost        NUMERIC(10,2) NOT NULL,
    margin      NUMERIC(10,2) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES dim_customers(customer_id),
    FOREIGN KEY (product_id)  REFERENCES dim_products(product_id)
);
5

Install Python Dependencies

Create and activate a virtual environment, then install all required packages:
python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Key packages installed include pandas, SQLAlchemy, psycopg2-binary, scikit-learn, numpy, and python-dotenv.
6

Run the Full Pipeline

Execute the orchestrator script to run all nine pipeline steps in sequence:
python run_pipeline.py
The pipeline runs the following steps in order:
#StepDescription
1Generar productosGenerates data/raw/products.csv — 20 products across 7 categories
2Generar clientesGenerates data/raw/customers.csv — 5,000 customers with demographics and channel data
3Generar ventasGenerates data/raw/sales.csv — 100,000 transactions with seasonality simulation
4Cargar PostgreSQLLoads all three CSVs into the PostgreSQL star schema
5Generar insightsComputes product-level revenue and margin analytics
6Generar RFMRuns RFM scoring and assigns customer segments
7Generar Forecasting BaseFits a baseline Prophet model and saves forecast + metrics
8Generar Forecasting EnriquecidoFits an enriched Prophet model with seasonality regressors
9Generar Forecasting XGBoostFits an XGBoost model on lag and calendar features
After all steps complete, the pipeline validates the raw CSVs against expected row counts (20 products, 5,000 customers, 100,000 sales) and writes a timestamped execution report to reports/.
All data generators use a fixed random seed of 42. This means every run of python run_pipeline.py produces byte-for-byte identical CSVs, database contents, and model outputs — making the entire pipeline deterministic and reproducible.
7

Explore the Outputs

Once the pipeline finishes successfully, you will find the following output files ready to use:Forecast CSVs and metrics — outputs/
outputs/
├── forecast_baseline.csv      # Prophet baseline 90-day forecast
├── forecast_enriched.csv      # Prophet enriched 90-day forecast
├── forecast_xgboost.csv       # XGBoost 90-day forecast
├── metrics_baseline.json      # MAE, RMSE, MAPE for baseline model
├── metrics_enriched.json      # MAE, RMSE, MAPE for enriched model
└── metrics_xgboost.json       # MAE, RMSE, MAPE for XGBoost model
Execution reports — reports/
reports/
└── execution_report_<YYYYMMDD_HHMMSS>.md   # Row counts + pipeline status
Pipeline logs — logs/
logs/
└── pipeline_<YYYYMMDD_HHMMSS>.log   # Step-by-step stdout and error output
The repository includes a pre-built Power BI dashboard at power_bi/fini_BI.pbix. Open it in Power BI Desktop and point the data source at your local PostgreSQL instance to explore interactive sales, segmentation, and forecast visuals without writing any additional queries.

Build docs developers (and LLMs) love