Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt

Use this file to discover all available pages before exploring further.

Fini Marketing Intelligence is a Python-based analytics platform built for a candy brand. Raw sales data is synthetically generated, validated, and loaded into a PostgreSQL data warehouse that follows a star schema design. From there, two analytical SQL views power an insights engine and three forecasting models. Results surface as CSV forecasts, JSON metric files, and auto-generated Markdown reports — all visualised in Power BI.

Pipeline Layers

The platform runs as a single orchestrated pipeline via run_pipeline.py, progressing through five distinct stages.
1

Data Generation

Three Python scripts under etl/ produce synthetic, reproducible datasets written to data/raw/:
ScriptOutputRecords
generate_products.pyproducts.csv20 products
generate_customers.pycustomers.csv5,000 customers
generate_sales.pysales.csv100,000 transactions
Sales span 2023-01-01 to 2025-12-31 and encode seasonality rules for Halloween, Christmas, and Summer products. Customer sampling is weighted by purchase_frequency so high-frequency buyers appear proportionally more often.
2

ETL — Load to PostgreSQL

etl/load_to_postgres.py connects to PostgreSQL via SQLAlchemy (postgresql+psycopg2), clears the warehouse with a TRUNCATE … CASCADE, and streams each CSV into its target table using pandas.DataFrame.to_sql. Connection parameters are resolved from environment variables through etl/config.py.
3

PostgreSQL Data Warehouse

The warehouse holds two dimension tables (dim_products, dim_customers) and one central fact table (fact_sales) arranged in a star schema. Two analytical views — vw_sales_enriched and vw_product_launches — denormalise the schema for downstream queries without duplicating raw data.
4

Analytics & Insights

insights/generate_insights.py reads directly from vw_sales_enriched and vw_product_launches, computes four business insights (customer value, promotion effectiveness, launch velocity, regional performance), and writes reports/strategic_insights.md. insights/generate_rfm.py produces an RFM segmentation layer on top of the same data.
5

Forecasting Models

Three progressively richer forecasting scripts run in sequence:
ModuleApproachOutput
models/forecasting_baseline.pyStatistical baselineoutputs/forecast_baseline.csv, outputs/metrics_baseline.json
models/forecasting_enriched.pyFeature-enriched regressionoutputs/forecast_enriched.csv, outputs/metrics_enriched.json
models/forecasting_xgboost.pyXGBoost gradient boostingoutputs/forecast_xgboost.csv, outputs/metrics_xgboost.json
Each model writes a CSV forecast alongside a JSON metrics file so results can be compared directly.

Star Schema

The warehouse is organised around a single fact table joined to two dimension tables.
          dim_products

               │ product_id

fact_sales ────┤

               │ customer_id

          dim_customers
  • fact_sales is the central fact table. Every row represents one transaction and carries the measurable metrics — units, discount, revenue, cost, and margin.
  • dim_products describes the 20 Fini candy products: name, category, season, launch date, and unit economics.
  • dim_customers profiles the 5,000 customer records: age group, region, preferred channel, purchase frequency, and average ticket.

SQL Views

Two views sit on top of the star schema to provide analysis-ready flat tables.
ViewPurpose
vw_sales_enrichedJoins fact_sales with both dimension tables. Returns every sale alongside full product and customer context. Used by the insights engine and RFM model.
vw_product_launchesJoins fact_sales with dim_products, filters to rows where sale_date >= launch_date, and computes days_since_launch. Used to measure post-launch revenue velocity.

Outputs

DirectoryContents
outputs/forecast_*.csv (daily revenue predictions) and metrics_*.json (MAE, RMSE, MAPE for each model)
reports/strategic_insights.md (narrative business insights) and timestamped execution_report_*.md files
logs/Timestamped pipeline_*.log files for each run

Infrastructure

The PostgreSQL warehouse runs inside Docker Compose. The service uses the official postgres:16 image and the container is named fini_postgres. All credentials and the target port are injected via environment variables at runtime — no secrets are hard-coded.
services:
  postgres:
    image: postgres:16
    container_name: fini_postgres
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_PORT: ${POSTGRES_PORT}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

Power BI Integration

power_bi/fini_BI.pbix connects directly to the PostgreSQL warehouse. Reports built in Power BI query both the raw dimension/fact tables and the two analytical views, enabling interactive slicing by product category, season, region, and time period.

Explore the Architecture

Data Model

Full DDL reference for dim_products, dim_customers, and fact_sales, including foreign keys and domain values.

SQL Views

Annotated CREATE VIEW statements for vw_sales_enriched and vw_product_launches, with example queries.

ETL Loading

How load_to_postgres.py clears and reloads the warehouse using SQLAlchemy and the TRUNCATE CASCADE pattern.
All synthetic datasets are generated with random.seed(42), making every pipeline run fully reproducible. The same 100,000 sales transactions, product catalogue, and customer base are produced regardless of the machine or environment.

Build docs developers (and LLMs) love