Fini Marketing Intelligence: System Architecture Overview

Fini Marketing Intelligence is a Python-based analytics platform built for a candy brand. Raw sales data is synthetically generated, validated, and loaded into a PostgreSQL data warehouse that follows a star schema design. From there, two analytical SQL views power an insights engine and three forecasting models. Results surface as CSV forecasts, JSON metric files, and auto-generated Markdown reports — all visualised in Power BI.

Pipeline Layers

The platform runs as a single orchestrated pipeline via run_pipeline.py, progressing through five distinct stages.

Data Generation

Three Python scripts under etl/ produce synthetic, reproducible datasets written to data/raw/:

Script	Output	Records
`generate_products.py`	`products.csv`	20 products
`generate_customers.py`	`customers.csv`	5,000 customers
`generate_sales.py`	`sales.csv`	100,000 transactions

Sales span 2023-01-01 to 2025-12-31 and encode seasonality rules for Halloween, Christmas, and Summer products. Customer sampling is weighted by purchase_frequency so high-frequency buyers appear proportionally more often.

ETL — Load to PostgreSQL

etl/load_to_postgres.py connects to PostgreSQL via SQLAlchemy (postgresql+psycopg2), clears the warehouse with a TRUNCATE … CASCADE, and streams each CSV into its target table using pandas.DataFrame.to_sql. Connection parameters are resolved from environment variables through etl/config.py.

PostgreSQL Data Warehouse

The warehouse holds two dimension tables (dim_products, dim_customers) and one central fact table (fact_sales) arranged in a star schema. Two analytical views — vw_sales_enriched and vw_product_launches — denormalise the schema for downstream queries without duplicating raw data.

Analytics & Insights

insights/generate_insights.py reads directly from vw_sales_enriched and vw_product_launches, computes four business insights (customer value, promotion effectiveness, launch velocity, regional performance), and writes reports/strategic_insights.md. insights/generate_rfm.py produces an RFM segmentation layer on top of the same data.

Forecasting Models

Three progressively richer forecasting scripts run in sequence:

Module	Approach	Output
`models/forecasting_baseline.py`	Statistical baseline	`outputs/forecast_baseline.csv`, `outputs/metrics_baseline.json`
`models/forecasting_enriched.py`	Feature-enriched regression	`outputs/forecast_enriched.csv`, `outputs/metrics_enriched.json`
`models/forecasting_xgboost.py`	XGBoost gradient boosting	`outputs/forecast_xgboost.csv`, `outputs/metrics_xgboost.json`

Each model writes a CSV forecast alongside a JSON metrics file so results can be compared directly.

Star Schema

The warehouse is organised around a single fact table joined to two dimension tables.

          dim_products
               │
               │ product_id
               │
fact_sales ────┤
               │
               │ customer_id
               │
          dim_customers

fact_sales is the central fact table. Every row represents one transaction and carries the measurable metrics — units, discount, revenue, cost, and margin.
dim_products describes the 20 Fini candy products: name, category, season, launch date, and unit economics.
dim_customers profiles the 5,000 customer records: age group, region, preferred channel, purchase frequency, and average ticket.

SQL Views

Two views sit on top of the star schema to provide analysis-ready flat tables.

View	Purpose
`vw_sales_enriched`	Joins `fact_sales` with both dimension tables. Returns every sale alongside full product and customer context. Used by the insights engine and RFM model.
`vw_product_launches`	Joins `fact_sales` with `dim_products`, filters to rows where `sale_date >= launch_date`, and computes `days_since_launch`. Used to measure post-launch revenue velocity.

Outputs

Directory	Contents
`outputs/`	`forecast_.csv` (daily revenue predictions) and `metrics_.json` (MAE, RMSE, MAPE for each model)
`reports/`	`strategic_insights.md` (narrative business insights) and timestamped `execution_report_*.md` files
`logs/`	Timestamped `pipeline_*.log` files for each run

Infrastructure

The PostgreSQL warehouse runs inside Docker Compose. The service uses the official postgres:16 image and the container is named fini_postgres. All credentials and the target port are injected via environment variables at runtime — no secrets are hard-coded.

services:
  postgres:
    image: postgres:16
    container_name: fini_postgres
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_PORT: ${POSTGRES_PORT}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

Power BI Integration

power_bi/fini_BI.pbix connects directly to the PostgreSQL warehouse. Reports built in Power BI query both the raw dimension/fact tables and the two analytical views, enabling interactive slicing by product category, season, region, and time period.

Explore the Architecture

Data Model

Full DDL reference for dim_products, dim_customers, and fact_sales, including foreign keys and domain values.

SQL Views

Annotated CREATE VIEW statements for vw_sales_enriched and vw_product_launches, with example queries.

ETL Loading

How load_to_postgres.py clears and reloads the warehouse using SQLAlchemy and the TRUNCATE CASCADE pattern.

All synthetic datasets are generated with random.seed(42), making every pipeline run fully reproducible. The same 100,000 sales transactions, product catalogue, and customer base are produced regardless of the machine or environment.

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Fini Marketing Intelligence: System Architecture Overview

Pipeline Layers

Star Schema

SQL Views

Outputs

Infrastructure

Power BI Integration

Explore the Architecture

Data Model

SQL Views

ETL Loading

Build docs developers (and LLMs) love

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Documentation Index

​Pipeline Layers

​Star Schema

​SQL Views

​Outputs

​Infrastructure

​Power BI Integration

​Explore the Architecture

Data Model

SQL Views

ETL Loading

Build docs developers (and LLMs) love

Pipeline Layers

Star Schema

SQL Views

Outputs

Infrastructure

Power BI Integration

Explore the Architecture