Get Started with ETL Dinámico in 5 Steps

This guide walks you through everything required to go from a fresh machine to a running ETL pipeline in under ten minutes. You will clone the repository, set up a Python virtual environment, install dependencies, create a .env configuration file for your SQL Server connections, and launch the Streamlit interface — or run a pipeline directly from Python code.

A .env file must exist at the project root (the same directory as app.py) before you start the application or run any pipeline. The Settings class loads it at import time; if the file is missing or the variables are undefined, all database connections will fail silently with None values.

Clone the Repository

Download the project source code from GitHub using git clone, then navigate into the project directory.

git clone https://github.com/obedc295/proyecto_dw.git
cd proyecto_dw

After cloning you will see the following top-level structure:

proyecto_dw/
├── app.py              # Streamlit entry point
├── requirements.txt    # Python dependencies
├── src/
│   ├── config/
│   │   └── settings.py          # Environment variable loader & URL builder
│   ├── infrastructure/
│   │   └── db_client.py         # SQLAlchemy engine wrapper
│   └── services/
│       ├── extractor.py         # DataExtractor
│       ├── transformer.py       # DataTransformer
│       ├── loader.py            # DataLoader
│       └── pipeline.py          # ETLPipeline orchestrator
└── tests/              # Pytest test suite

Create and Activate a Python Virtual Environment

Creating an isolated virtual environment prevents ETL Dinámico’s dependencies from conflicting with other Python projects on your machine. Python 3.10 or higher is required.

Windows (PowerShell / CMD)
Linux / macOS / WSL

python -m venv .venv
.venv\Scripts\activate

After activation your terminal prompt will be prefixed with (.venv). On PowerShell, if you receive an execution policy error, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser first.

python3 -m venv .venv
source .venv/bin/activate

After activation your terminal prompt will be prefixed with (.venv). Use deactivate at any time to exit the virtual environment.

Install Dependencies

With the virtual environment activated, upgrade pip and install all required packages from requirements.txt.

pip install --upgrade pip && pip install -r requirements.txt

This installs the following packages:

Package	Version Constraint	Purpose
`pyodbc`	`>=5.0.0`	Low-level ODBC bridge between Python and SQL Server via ODBC Driver 17
`SQLAlchemy`	`>=2.0.0`	Database engine abstraction; provides connection pooling and the `create_engine` API
`pandas`	`>=2.0.0`	In-memory DataFrame engine used by the Extractor and Transformer
`streamlit`	`>=1.30.0`	Browser-based UI framework for the interactive pipeline control panel
`python-dotenv`	`>=1.0.0`	Reads the `.env` file and injects variables into `os.environ` at startup
`pytest`	`>=9.1.1`	Test runner for the automated unit test suite in the `tests/` directory

If pyodbc installation fails on Linux, you may need to install the unixODBC development headers first: sudo apt-get install unixodbc-dev (Debian/Ubuntu) or sudo yum install unixODBC-devel (RHEL/CentOS).

Create the .env Configuration File

Create a file named .env in the project root. This file holds the six connection parameters that Settings uses to build SQLAlchemy URLs for both the OLTP source and the OLAP destination.The example below uses the classic Northwind sample database as the OLTP source and a Data Warehouse database called DW_NORTHWIND as the OLAP destination, both hosted on a local SQLEXPRESS instance:

# OLTP Source — transactional / operational database
OLTP_SERVER=ELIEL\SQLEXPRESS
OLTP_DATABASE=NORTHWND
OLTP_DRIVER=ODBC Driver 17 for SQL Server

# OLAP Destination — Data Warehouse
OLAP_SERVER=ELIEL\SQLEXPRESS
OLAP_DATABASE=DW_NORTHWIND
OLAP_DRIVER=ODBC Driver 17 for SQL Server

Replace ELIEL\SQLEXPRESS with your actual server name or hostname. If your SQL Server instance is the default (unnamed) instance, use just the hostname or IP address without the backslash and instance name.

Do not wrap values in quotes inside .env files when using python-dotenv. Write OLTP_SERVER=ELIEL\SQLEXPRESS, not OLTP_SERVER="ELIEL\SQLEXPRESS". The Settings class calls urllib.parse.quote_plus() to handle the backslash encoding automatically.

Launch the Streamlit UI

With your .env file in place, start the interactive pipeline interface by running:

streamlit run app.py

Streamlit will compile the application and automatically open a browser tab at:

http://localhost:8501

From the UI you can:

Select a source table from the OLTP database (or write a custom SQL query).
Choose columns to extract and preview up to 10 rows before committing.
Configure column mappings — assign each source column a transformation type (Ninguna, Upper, Lower, Año, Mes, Día) and map it to a column in the destination table.
Add concatenation mappings to combine two source columns (e.g., FirstName + LastName → FullName).
Set the business key used for incremental loading.
Click “Ejecutar ETL” to run the pipeline and view row counts, a JSON result summary, and a data preview.

Programmatic Usage

You can also drive ETL Dinámico entirely from Python code, bypassing the Streamlit UI. This is useful for scheduled jobs, integration tests, or embedding the pipeline inside a larger orchestration workflow. The example below extracts customer data from the Northwind dbo.Customers table, applies upper case to the name fields, and loads new records into DimCustomers using CustomerID as the business key.

from src.infrastructure.db_client import DatabaseClient
from src.services.pipeline import ETLPipeline

# Initialise the database client (reads OLTP_URL and OLAP_URL from Settings)
db_client = DatabaseClient()
pipeline = ETLPipeline(db_client)

result = pipeline.run_dynamic_etl(
    source_table="dbo.Customers",
    target_table="DimCustomers",
    business_key="CustomerID",
    column_mappings=[
        {"source_column": "CustomerID",   "transform_type": "none",  "target_column": "CustomerID"},
        {"source_column": "CompanyName",  "transform_type": "upper", "target_column": "CompanyName"},
        {"source_column": "ContactName",  "transform_type": "upper", "target_column": "ContactName"},
    ]
)

print(f"Extracted: {result['rows_extracted']}, Loaded: {result['rows_loaded']}")
# Example output: Extracted: 91, Loaded: 14

The run_dynamic_etl() method returns a dictionary with the following keys:

Key	Type	Description
`status`	`str`	`"success"` if the pipeline completed without raising an exception
`rows_extracted`	`int`	Total number of rows pulled from the source
`rows_loaded`	`int`	Rows actually written to the destination (new records only)
`table_destination`	`str`	The `target_table` argument passed to the call
`sample_data`	`DataFrame`	First 5 rows of the final transformed and loaded DataFrame

Running the Test Suite

After installation you can verify that the infrastructure, transformers, loader, and pipeline orchestrator are all working correctly by running:

pytest tests/ -v

A passing test run confirms that your Python environment and project structure are correctly configured before you connect to a live database.

Get Started

Architecture

ETL Pipeline

UI Guide

Testing & Development

Programmatic Usage

Running the Test Suite

Build docs developers (and LLMs) love

Get Started

Architecture

ETL Pipeline

UI Guide

Testing & Development

Documentation Index

​Programmatic Usage

​Running the Test Suite

Build docs developers (and LLMs) love

Programmatic Usage

Running the Test Suite