Get Started with Fini Marketing Intelligence Platform

This guide walks you through everything needed to run the Fini Marketing Intelligence pipeline from scratch on your local machine. By the end, you will have a live PostgreSQL database loaded with 20 products, 5,000 customers, and 100,000 synthetic sales records, along with RFM customer segments, revenue insights, and three independent 90-day sales forecasts saved to the outputs/ directory — all generated in a single command.

Check Prerequisites

Before you begin, make sure the following tools are installed and available on your PATH:

Python 3.9 or higher — the pipeline and all analytics modules require Python 3.9+
Docker and Docker Compose — used to run the PostgreSQL 16 database container
Git — to clone the repository

Verify your versions:

python --version   # Python 3.9+
docker --version   # Docker 20+
git --version

Then clone the repository:

git clone https://github.com/andresshm/fini-marketing-intelligence.git
cd fini-marketing-intelligence

Configure Environment Variables

The pipeline reads all database credentials from a .env file in the project root. Create the file now:

touch .env

Add the following variables, replacing your_password with a password of your choice:

POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=fini_db
POSTGRES_USER=fini_user
POSTGRES_PASSWORD=your_password

These values are consumed both by Docker Compose (to configure the container) and by etl/config.py (to connect SQLAlchemy and psycopg2 at runtime):

# etl/config.py
DB_CONFIG = {
    "host": os.getenv("POSTGRES_HOST", "localhost"),
    "port": os.getenv("POSTGRES_PORT"),
    "database": os.getenv("POSTGRES_DB"),
    "user": os.getenv("POSTGRES_USER"),
    "password": os.getenv("POSTGRES_PASSWORD"),
}

Never commit your .env file to version control. The repository’s .gitignore should already exclude it.

Start PostgreSQL with Docker Compose

The docker-compose.yml file defines a single postgres:16 service named fini_postgres. Start it in detached mode:

docker-compose up -d

Confirm the container is running:

docker ps --filter name=fini_postgres

You should see fini_postgres listed with status Up. The database data is persisted in a named Docker volume (postgres_data) so it survives container restarts.

Create the Database Schema

Apply the star-schema DDL to create the three core tables — dim_products, dim_customers, and fact_sales:

docker exec -i fini_postgres psql \
  -U $POSTGRES_USER \
  -d $POSTGRES_DB \
  < sql/create_schema.sql

The schema that will be applied:

CREATE TABLE IF NOT EXISTS dim_products (
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category     VARCHAR(50)  NOT NULL,
    season       VARCHAR(50),
    launch_date  DATE,
    unit_cost    NUMERIC(10,2),
    unit_price   NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS dim_customers (
    customer_id        INTEGER PRIMARY KEY,
    age_group          VARCHAR(20),
    region             VARCHAR(50),
    preferred_channel  VARCHAR(50),
    purchase_frequency VARCHAR(20),
    avg_ticket         NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id     BIGINT  PRIMARY KEY,
    sale_date   DATE    NOT NULL,
    customer_id INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    units       INTEGER NOT NULL,
    discount    NUMERIC(4,2)  NOT NULL,
    revenue     NUMERIC(10,2) NOT NULL,
    cost        NUMERIC(10,2) NOT NULL,
    margin      NUMERIC(10,2) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES dim_customers(customer_id),
    FOREIGN KEY (product_id)  REFERENCES dim_products(product_id)
);

Install Python Dependencies

Create and activate a virtual environment, then install all required packages:

python -m venv .venv
source .venv/bin/activate      # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Key packages installed include pandas, SQLAlchemy, psycopg2-binary, scikit-learn, numpy, and python-dotenv.

Run the Full Pipeline

Execute the orchestrator script to run all nine pipeline steps in sequence:

python run_pipeline.py

The pipeline runs the following steps in order:

#	Step	Description
1	Generar productos	Generates `data/raw/products.csv` — 20 products across 7 categories
2	Generar clientes	Generates `data/raw/customers.csv` — 5,000 customers with demographics and channel data
3	Generar ventas	Generates `data/raw/sales.csv` — 100,000 transactions with seasonality simulation
4	Cargar PostgreSQL	Loads all three CSVs into the PostgreSQL star schema
5	Generar insights	Computes product-level revenue and margin analytics
6	Generar RFM	Runs RFM scoring and assigns customer segments
7	Generar Forecasting Base	Fits a baseline Prophet model and saves forecast + metrics
8	Generar Forecasting Enriquecido	Fits an enriched Prophet model with seasonality regressors
9	Generar Forecasting XGBoost	Fits an XGBoost model on lag and calendar features

After all steps complete, the pipeline validates the raw CSVs against expected row counts (20 products, 5,000 customers, 100,000 sales) and writes a timestamped execution report to reports/.

All data generators use a fixed random seed of 42. This means every run of python run_pipeline.py produces byte-for-byte identical CSVs, database contents, and model outputs — making the entire pipeline deterministic and reproducible.

Explore the Outputs

Once the pipeline finishes successfully, you will find the following output files ready to use:Forecast CSVs and metrics — outputs/

outputs/
├── forecast_baseline.csv      # Prophet baseline 90-day forecast
├── forecast_enriched.csv      # Prophet enriched 90-day forecast
├── forecast_xgboost.csv       # XGBoost 90-day forecast
├── metrics_baseline.json      # MAE, RMSE, MAPE for baseline model
├── metrics_enriched.json      # MAE, RMSE, MAPE for enriched model
└── metrics_xgboost.json       # MAE, RMSE, MAPE for XGBoost model

Execution reports — reports/

reports/
└── execution_report_<YYYYMMDD_HHMMSS>.md   # Row counts + pipeline status

Pipeline logs — logs/

logs/
└── pipeline_<YYYYMMDD_HHMMSS>.log   # Step-by-step stdout and error output

The repository includes a pre-built Power BI dashboard at power_bi/fini_BI.pbix. Open it in Power BI Desktop and point the data source at your local PostgreSQL instance to explore interactive sales, segmentation, and forecast visuals without writing any additional queries.

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Get Started with Fini Marketing Intelligence Platform

Build docs developers (and LLMs) love

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Documentation Index

Build docs developers (and LLMs) love