Loading Synthetic CSV Data into PostgreSQL with SQLAlchemy

Once the raw CSV files have been generated and validated, the loading stage transfers them into a PostgreSQL data warehouse structured around a simple star schema: two dimension tables (dim_products, dim_customers) and one fact table (fact_sales). The loader reads connection credentials from environment variables, truncates all tables before every run to avoid duplicates, and inserts rows in an order that satisfies foreign-key constraints — dimensions first, facts last.

clear_tables() is called automatically at the start of every load run. It issues a TRUNCATE TABLE ... CASCADE that permanently removes all existing rows from fact_sales, dim_customers, and dim_products. Never run the loader against a production database unless you intend to replace all data.

Configuration — `etl/config.py`

Database credentials are read from environment variables at import time using python-dotenv. Only POSTGRES_HOST has a default value (localhost); all other variables must be set explicitly.

import os
from dotenv import load_dotenv

load_dotenv()

DB_CONFIG = {
    "host":     os.getenv("POSTGRES_HOST", "localhost"),
    "port":     os.getenv("POSTGRES_PORT"),
    "database": os.getenv("POSTGRES_DB"),
    "user":     os.getenv("POSTGRES_USER"),
    "password": os.getenv("POSTGRES_PASSWORD"),
}

Environment variables

POSTGRES_HOST

string

default:"localhost"

Hostname or IP address of the PostgreSQL server. Defaults to localhost when the variable is unset.

POSTGRES_PORT

string

required

Port on which PostgreSQL is listening. Typically 5432.

POSTGRES_DB

string

required

Name of the target database.

POSTGRES_USER

string

required

PostgreSQL role / username used to authenticate.

POSTGRES_PASSWORD

string

required

Password for the PostgreSQL role.

Example `.env` file

Create a .env file in the project root (never commit it to version control):

POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=fini_dw
POSTGRES_USER=fini_user
POSTGRES_PASSWORD=supersecret

Connection string

etl/load_to_postgres.py assembles a SQLAlchemy connection URL from DB_CONFIG and creates an engine:

from sqlalchemy import create_engine
from etl.config import DB_CONFIG

DATABASE_URL = (
    f"postgresql+psycopg2://"
    f"{DB_CONFIG['user']}:{DB_CONFIG['password']}"
    f"@{DB_CONFIG['host']}:{DB_CONFIG['port']}"
    f"/{DB_CONFIG['database']}"
)

engine = create_engine(DATABASE_URL)

The format follows the standard SQLAlchemy pattern:

postgresql+psycopg2://user:password@host:port/database

`load_table()` — `etl/load_utils.py`

load_table is the single utility responsible for reading a CSV and appending its rows to a PostgreSQL table. It uses pandas to read the file and DataFrame.to_sql with if_exists="append" so that it never recreates the table schema — it only inserts rows.

import pandas as pd
from sqlalchemy import text

def load_table(engine, csv_path, table_name):

    df = pd.read_csv(csv_path)

    df.to_sql(
        table_name,
        engine,
        if_exists="append",
        index=False
    )

    print(f"{len(df)} registros cargados en {table_name}.")

Parameter	Type	Description
`engine`	`sqlalchemy.Engine`	Active SQLAlchemy engine connected to the target database.
`csv_path`	`pathlib.Path`	Path to the source CSV file (relative to the project root).
`table_name`	`str`	Name of the destination PostgreSQL table.

`clear_tables()` — `etl/load_to_postgres.py`

Before loading, all three tables are truncated in a single atomic transaction. The CASCADE clause ensures that any dependent foreign-key rows are removed in the correct order without needing to manually disable constraints:

from sqlalchemy import text

def clear_tables(engine):
    """
    Vacía el Data Warehouse respetando
    las relaciones entre tablas.
    """

    with engine.begin() as conn:

        conn.execute(
            text("""
                TRUNCATE TABLE
                    fact_sales,
                    dim_customers,
                    dim_products
                CASCADE;
            """)
        )

    print("Tablas limpiadas correctamente.")

Loading order

After truncation, the three tables are loaded in dependency order — dimension tables before the fact table — so that foreign-key constraints are satisfied at the time each row is inserted:

tables = [
    ("data/raw/products.csv", "dim_products"),
    ("data/raw/customers.csv", "dim_customers"),
    ("data/raw/sales.csv",    "fact_sales"),
]

for csv_path, table_name in tables:
    print(f"\nCargando {table_name}...")
    load_table(engine, Path(csv_path), table_name)

dim_products

data/raw/products.csv → dim_products (20 rows). Must be loaded first because fact_sales references product_id.

dim_customers

data/raw/customers.csv → dim_customers (5,000 rows). Must be loaded before fact_sales because of the customer_id foreign key.

fact_sales

data/raw/sales.csv → fact_sales (100,000 rows). Loaded last after both dimension tables are populated.

Running the loader

python -m etl.load_to_postgres

Or as part of the full pipeline:

python run_pipeline.py

The loader prints progress to stdout for each table and confirms when all loads are complete.

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Loading Synthetic CSV Data into PostgreSQL with SQLAlchemy

Configuration — `etl/config.py`

Environment variables

Example `.env` file

Connection string

`load_table()` — `etl/load_utils.py`

`clear_tables()` — `etl/load_to_postgres.py`

Loading order

Running the loader

Build docs developers (and LLMs) love

Get Started

Architecture

ETL Pipeline

Analytics & Insights

Forecasting Models

Documentation Index

​Configuration — etl/config.py

​Environment variables

​Example .env file

​Connection string

​load_table() — etl/load_utils.py

​clear_tables() — etl/load_to_postgres.py

​Loading order

​Running the loader

Build docs developers (and LLMs) love

Configuration — `etl/config.py`

Environment variables

Example `.env` file

Connection string

`load_table()` — `etl/load_utils.py`

`clear_tables()` — `etl/load_to_postgres.py`

Loading order

Running the loader