Fini Marketing Intelligence is a Python-based analytics platform built for a candy brand. Raw sales data is synthetically generated, validated, and loaded into a PostgreSQL data warehouse that follows a star schema design. From there, two analytical SQL views power an insights engine and three forecasting models. Results surface as CSV forecasts, JSON metric files, and auto-generated Markdown reports — all visualised in Power BI.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt
Use this file to discover all available pages before exploring further.
Pipeline Layers
The platform runs as a single orchestrated pipeline viarun_pipeline.py, progressing through five distinct stages.
Data Generation
Three Python scripts under
Sales span 2023-01-01 to 2025-12-31 and encode seasonality rules for Halloween, Christmas, and Summer products. Customer sampling is weighted by
etl/ produce synthetic, reproducible datasets written to data/raw/:| Script | Output | Records |
|---|---|---|
generate_products.py | products.csv | 20 products |
generate_customers.py | customers.csv | 5,000 customers |
generate_sales.py | sales.csv | 100,000 transactions |
purchase_frequency so high-frequency buyers appear proportionally more often.ETL — Load to PostgreSQL
etl/load_to_postgres.py connects to PostgreSQL via SQLAlchemy (postgresql+psycopg2), clears the warehouse with a TRUNCATE … CASCADE, and streams each CSV into its target table using pandas.DataFrame.to_sql. Connection parameters are resolved from environment variables through etl/config.py.PostgreSQL Data Warehouse
The warehouse holds two dimension tables (
dim_products, dim_customers) and one central fact table (fact_sales) arranged in a star schema. Two analytical views — vw_sales_enriched and vw_product_launches — denormalise the schema for downstream queries without duplicating raw data.Analytics & Insights
insights/generate_insights.py reads directly from vw_sales_enriched and vw_product_launches, computes four business insights (customer value, promotion effectiveness, launch velocity, regional performance), and writes reports/strategic_insights.md. insights/generate_rfm.py produces an RFM segmentation layer on top of the same data.Forecasting Models
Three progressively richer forecasting scripts run in sequence:
Each model writes a CSV forecast alongside a JSON metrics file so results can be compared directly.
| Module | Approach | Output |
|---|---|---|
models/forecasting_baseline.py | Statistical baseline | outputs/forecast_baseline.csv, outputs/metrics_baseline.json |
models/forecasting_enriched.py | Feature-enriched regression | outputs/forecast_enriched.csv, outputs/metrics_enriched.json |
models/forecasting_xgboost.py | XGBoost gradient boosting | outputs/forecast_xgboost.csv, outputs/metrics_xgboost.json |
Star Schema
The warehouse is organised around a single fact table joined to two dimension tables.fact_salesis the central fact table. Every row represents one transaction and carries the measurable metrics —units,discount,revenue,cost, andmargin.dim_productsdescribes the 20 Fini candy products: name, category, season, launch date, and unit economics.dim_customersprofiles the 5,000 customer records: age group, region, preferred channel, purchase frequency, and average ticket.
SQL Views
Two views sit on top of the star schema to provide analysis-ready flat tables.| View | Purpose |
|---|---|
vw_sales_enriched | Joins fact_sales with both dimension tables. Returns every sale alongside full product and customer context. Used by the insights engine and RFM model. |
vw_product_launches | Joins fact_sales with dim_products, filters to rows where sale_date >= launch_date, and computes days_since_launch. Used to measure post-launch revenue velocity. |
Outputs
| Directory | Contents |
|---|---|
outputs/ | forecast_*.csv (daily revenue predictions) and metrics_*.json (MAE, RMSE, MAPE for each model) |
reports/ | strategic_insights.md (narrative business insights) and timestamped execution_report_*.md files |
logs/ | Timestamped pipeline_*.log files for each run |
Infrastructure
The PostgreSQL warehouse runs inside Docker Compose. The service uses the officialpostgres:16 image and the container is named fini_postgres. All credentials and the target port are injected via environment variables at runtime — no secrets are hard-coded.
Power BI Integration
power_bi/fini_BI.pbix connects directly to the PostgreSQL warehouse. Reports built in Power BI query both the raw dimension/fact tables and the two analytical views, enabling interactive slicing by product category, season, region, and time period.
Explore the Architecture
Data Model
Full DDL reference for
dim_products, dim_customers, and fact_sales, including foreign keys and domain values.SQL Views
Annotated
CREATE VIEW statements for vw_sales_enriched and vw_product_launches, with example queries.ETL Loading
How
load_to_postgres.py clears and reloads the warehouse using SQLAlchemy and the TRUNCATE CASCADE pattern.All synthetic datasets are generated with
random.seed(42), making every pipeline run fully reproducible. The same 100,000 sales transactions, product catalogue, and customer base are produced regardless of the machine or environment.