This guide walks you through everything required to go from a fresh machine to a running ETL pipeline in under ten minutes. You will clone the repository, set up a Python virtual environment, install dependencies, create aDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/obedc295/proyect_dw/llms.txt
Use this file to discover all available pages before exploring further.
.env configuration file for your SQL Server connections, and launch the Streamlit interface — or run a pipeline directly from Python code.
A
.env file must exist at the project root (the same directory as app.py) before you start the application or run any pipeline. The Settings class loads it at import time; if the file is missing or the variables are undefined, all database connections will fail silently with None values.Clone the Repository
Download the project source code from GitHub using After cloning you will see the following top-level structure:
git clone, then navigate into the project directory.Create and Activate a Python Virtual Environment
Creating an isolated virtual environment prevents ETL Dinámico’s dependencies from conflicting with other Python projects on your machine. Python 3.10 or higher is required.After activation your terminal prompt will be prefixed with
- Windows (PowerShell / CMD)
- Linux / macOS / WSL
(.venv). On PowerShell, if you receive an execution policy error, run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser first.Install Dependencies
With the virtual environment activated, upgrade This installs the following packages:
pip and install all required packages from requirements.txt.| Package | Version Constraint | Purpose |
|---|---|---|
pyodbc | >=5.0.0 | Low-level ODBC bridge between Python and SQL Server via ODBC Driver 17 |
SQLAlchemy | >=2.0.0 | Database engine abstraction; provides connection pooling and the create_engine API |
pandas | >=2.0.0 | In-memory DataFrame engine used by the Extractor and Transformer |
streamlit | >=1.30.0 | Browser-based UI framework for the interactive pipeline control panel |
python-dotenv | >=1.0.0 | Reads the .env file and injects variables into os.environ at startup |
pytest | >=9.1.1 | Test runner for the automated unit test suite in the tests/ directory |
Create the .env Configuration File
Create a file named Replace
.env in the project root. This file holds the six connection parameters that Settings uses to build SQLAlchemy URLs for both the OLTP source and the OLAP destination.The example below uses the classic Northwind sample database as the OLTP source and a Data Warehouse database called DW_NORTHWIND as the OLAP destination, both hosted on a local SQLEXPRESS instance:ELIEL\SQLEXPRESS with your actual server name or hostname. If your SQL Server instance is the default (unnamed) instance, use just the hostname or IP address without the backslash and instance name.Launch the Streamlit UI
With your Streamlit will compile the application and automatically open a browser tab at:From the UI you can:
.env file in place, start the interactive pipeline interface by running:- Select a source table from the OLTP database (or write a custom SQL query).
- Choose columns to extract and preview up to 10 rows before committing.
- Configure column mappings — assign each source column a transformation type (Ninguna, Upper, Lower, Año, Mes, Día) and map it to a column in the destination table.
- Add concatenation mappings to combine two source columns (e.g.,
FirstName+LastName→FullName). - Set the business key used for incremental loading.
- Click “Ejecutar ETL” to run the pipeline and view row counts, a JSON result summary, and a data preview.
Programmatic Usage
You can also drive ETL Dinámico entirely from Python code, bypassing the Streamlit UI. This is useful for scheduled jobs, integration tests, or embedding the pipeline inside a larger orchestration workflow. The example below extracts customer data from the Northwinddbo.Customers table, applies upper case to the name fields, and loads new records into DimCustomers using CustomerID as the business key.
run_dynamic_etl() method returns a dictionary with the following keys:
| Key | Type | Description |
|---|---|---|
status | str | "success" if the pipeline completed without raising an exception |
rows_extracted | int | Total number of rows pulled from the source |
rows_loaded | int | Rows actually written to the destination (new records only) |
table_destination | str | The target_table argument passed to the call |
sample_data | DataFrame | First 5 rows of the final transformed and loaded DataFrame |