Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt

Use this file to discover all available pages before exploring further.

The Fini data warehouse uses a classic star schema composed of two dimension tables and one fact table. dim_products and dim_customers hold the descriptive attributes that give context to each transaction, while fact_sales records every individual sale together with its measured metrics. This design keeps analytical queries simple — most reports need only a single join from the fact table to whichever dimension they need — and makes the schema easy to extend.

Full DDL

The schema is created by sql/create_schema.sql. All three tables are created with IF NOT EXISTS guards so the script is safe to re-run.
CREATE TABLE IF NOT EXISTS dim_products (
    product_id INTEGER PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL,
    season VARCHAR(50),
    launch_date DATE,
    unit_cost NUMERIC(10,2),
    unit_price NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS dim_customers (
    customer_id INTEGER PRIMARY KEY,
    age_group VARCHAR(20),
    region VARCHAR(50),
    preferred_channel VARCHAR(50),
    purchase_frequency VARCHAR(20),
    avg_ticket NUMERIC(10,2)
);

CREATE TABLE IF NOT EXISTS fact_sales (
    sale_id BIGINT PRIMARY KEY,
    sale_date DATE NOT NULL,
    customer_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    units INTEGER NOT NULL,
    discount NUMERIC(4,2) NOT NULL,
    revenue NUMERIC(10,2) NOT NULL,
    cost NUMERIC(10,2) NOT NULL,
    margin NUMERIC(10,2) NOT NULL,

    FOREIGN KEY (customer_id)
        REFERENCES dim_customers(customer_id),

    FOREIGN KEY (product_id)
        REFERENCES dim_products(product_id)
);

dim_products

Holds the catalogue of 20 Fini candy products. Each row describes one SKU with its commercial positioning and unit economics.
product_id
INTEGER
required
Surrogate primary key. Unique identifier for each product in the catalogue.
product_name
VARCHAR(100)
required
Human-readable product name, e.g. "Fini Gummy Bears". Never null.
category
VARCHAR(50)
required
Product family. One of: Gummies, Belts, Seasonal, Marshmallow, Licorice, Foam, Novelty.
season
VARCHAR(50)
Seasonal positioning that drives the date-skewing logic in generate_sales.py. One of: All Year, Halloween, Christmas, Summer, Valentine. Nullable — products without a defined season default to year-round uniform distribution.
launch_date
DATE
The date on which the product became available for sale. Sales are never generated before this date. Used as the reference point for days_since_launch in vw_product_launches.
unit_cost
NUMERIC(10,2)
Cost of goods per unit. Used in generate_sales.py to compute cost = units × unit_cost.
unit_price
NUMERIC(10,2)
Retail price per unit before any discount. Revenue is calculated as units × unit_price × (1 − discount).

dim_customers

Holds 5,000 synthetic customer profiles. Customers are not individually named — they are represented by behavioural and demographic segments to protect analytical generalisability.
customer_id
INTEGER
required
Surrogate primary key. Referenced by every row in fact_sales.
age_group
VARCHAR(20)
Age bracket of the customer. One of: 18-24, 25-34, 35-44, 45-54, 55+.
region
VARCHAR(50)
Geographic sales region. One of: North, South, East, West, Center.
preferred_channel
VARCHAR(50)
The sales channel this customer most frequently uses. One of: Supermarket, E-commerce, Convenience Store, Hypermarket.
purchase_frequency
VARCHAR(20)
Behavioural purchase frequency segment. One of: Low, Medium, High. During sales generation, customers are sampled with weights Low=1, Medium=2, High=4, so high-frequency buyers appear proportionally more often in fact_sales.
avg_ticket
NUMERIC(10,2)
Average transaction value for this customer segment. Stored as a descriptive attribute; the actual per-transaction revenue is computed and stored in fact_sales.

fact_sales

The central fact table. Each row records one transaction: which customer bought which product, on what date, at what discount, and the resulting financial metrics. The table is designed to grow large — sale_id is a BIGINT — and is populated with 100,000 rows by default.
sale_id
BIGINT
required
Sequential surrogate primary key assigned during data generation.
sale_date
DATE
required
Calendar date of the transaction. Ranges from 2023-01-01 to 2025-12-31. Always ≥ the product’s launch_date.
customer_id
INTEGER
required
Foreign key → dim_customers.customer_id.
product_id
INTEGER
required
Foreign key → dim_products.product_id.
units
INTEGER
required
Number of units sold in this transaction. Base range is 1–3. Discounted transactions receive an additional 0–2 units to simulate promotional lift.
discount
NUMERIC(4,2)
required
Fractional discount applied to the unit price. Sampled from {0.00, 0.10, 0.20, 0.30} with weights 70% / 15% / 10% / 5% respectively. A value of 0.10 represents a 10% discount.
revenue
NUMERIC(10,2)
required
Net revenue after discount: units × unit_price × (1 − discount). Rounded to 2 decimal places.
cost
NUMERIC(10,2)
required
Total cost of goods: units × unit_cost. Rounded to 2 decimal places.
margin
NUMERIC(10,2)
required
Gross margin: revenue − cost. Rounded to 2 decimal places. Can be negative when deep discounts push revenue below cost for high-cost products.

Foreign Key Relationships

Both foreign keys in fact_sales are enforced at the database level.
dim_products (product_id)  ◄──┐
                               │  fact_sales.product_id

fact_sales ────────────────────┤

                               │  fact_sales.customer_id
dim_customers (customer_id) ◄──┘
The TRUNCATE … CASCADE pattern used by the ETL loader respects these constraints by truncating fact_sales before the dimension tables, preventing foreign-key violations during reloads.

Domain Value Reference

FieldTableValid Values
categorydim_productsGummies, Belts, Seasonal, Marshmallow, Licorice, Foam, Novelty
seasondim_productsAll Year, Halloween, Christmas, Summer, Valentine
regiondim_customersNorth, South, East, West, Center
preferred_channeldim_customersSupermarket, E-commerce, Convenience Store, Hypermarket
purchase_frequencydim_customersLow, Medium, High
age_groupdim_customers18-24, 25-34, 35-44, 45-54, 55+
discountfact_sales0.00 (70%), 0.10 (15%), 0.20 (10%), 0.30 (5%)
The ETL loader in etl/load_to_postgres.py uses TRUNCATE TABLE fact_sales, dim_customers, dim_products CASCADE before every load. The CASCADE keyword automatically propagates the truncation to any dependent objects, ensuring foreign-key constraints are never violated even when tables are cleared in a single statement. This makes it safe to re-run the full pipeline without manually ordering deletes.

Build docs developers (and LLMs) love