Once the raw CSV files have been generated and validated, the loading stage transfers them into a PostgreSQL data warehouse structured around a simple star schema: two dimension tables (Documentation Index
Fetch the complete documentation index at: https://mintlify.com/andresshm/fini-marketing-intelligence/llms.txt
Use this file to discover all available pages before exploring further.
dim_products, dim_customers) and one fact table (fact_sales). The loader reads connection credentials from environment variables, truncates all tables before every run to avoid duplicates, and inserts rows in an order that satisfies foreign-key constraints — dimensions first, facts last.
Configuration — etl/config.py
Database credentials are read from environment variables at import time using python-dotenv. Only POSTGRES_HOST has a default value (localhost); all other variables must be set explicitly.
Environment variables
Hostname or IP address of the PostgreSQL server. Defaults to
localhost when the variable is unset.Port on which PostgreSQL is listening. Typically
5432.Name of the target database.
PostgreSQL role / username used to authenticate.
Password for the PostgreSQL role.
Example .env file
Create a .env file in the project root (never commit it to version control):
Connection string
etl/load_to_postgres.py assembles a SQLAlchemy connection URL from DB_CONFIG and creates an engine:
load_table() — etl/load_utils.py
load_table is the single utility responsible for reading a CSV and appending its rows to a PostgreSQL table. It uses pandas to read the file and DataFrame.to_sql with if_exists="append" so that it never recreates the table schema — it only inserts rows.
| Parameter | Type | Description |
|---|---|---|
engine | sqlalchemy.Engine | Active SQLAlchemy engine connected to the target database. |
csv_path | pathlib.Path | Path to the source CSV file (relative to the project root). |
table_name | str | Name of the destination PostgreSQL table. |
clear_tables() — etl/load_to_postgres.py
Before loading, all three tables are truncated in a single atomic transaction. The CASCADE clause ensures that any dependent foreign-key rows are removed in the correct order without needing to manually disable constraints:
Loading order
After truncation, the three tables are loaded in dependency order — dimension tables before the fact table — so that foreign-key constraints are satisfied at the time each row is inserted:dim_products
data/raw/products.csv → dim_products (20 rows). Must be loaded first because fact_sales references product_id.dim_customers
data/raw/customers.csv → dim_customers (5,000 rows). Must be loaded before fact_sales because of the customer_id foreign key.