

ETL Dinámico is a layered, modular Extract-Transform-Load system that connects a transactional SQL Server database (OLTP) to a Data Warehouse (OLAP). It lets you configure column mappings, apply transformations, and run incremental loads entirely through an interactive Streamlit dashboard — no hardcoded schemas required.

Quickstart

Install dependencies, configure your .env, and run your first ETL pipeline in minutes.

Configuration

Set up OLTP and OLAP connection strings using environment variables.

Architecture Overview

Understand the four-layer design: Settings, DatabaseClient, Services, and UI.

ETL Pipeline

Learn how extraction, transformation, and incremental loading work together.

UI Dashboard Guide

Use the Streamlit interface to map columns, preview data, and execute ETL runs.

API Reference

Full reference for all public classes: ETLPipeline, DataExtractor, DataTransformer, and more.

How It Works

ETL Dinámico follows a clean three-phase pipeline orchestrated by ETLPipeline.run_dynamic_etl():
1. Extract

DataExtractor reads from your SQL Server OLTP source, either a full table or a custom SQL query, and returns a Pandas DataFrame.

2. Transform

DataTransformer applies per-column operations: uppercase/lowercase text, date component extraction (year, month, day), or concatenating two columns into one.

3. Load

DataLoader.load_incremental() compares records against the existing Data Warehouse table using a business key, inserting only new rows to prevent duplicates.
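
The three phases above can be sketched end to end with in-memory SQLite databases standing in for the SQL Server source and the Data Warehouse. This is a minimal illustration under stated assumptions, not the project's implementation: the `extract` and `load_incremental` helpers, the `customers` and `dim_customer` tables, and the `customer_id` business key are hypothetical names chosen for the example.

```python
import sqlite3

import pandas as pd


def extract(conn, table=None, query=None):
    """Read a full table or a custom SELECT into a DataFrame (hypothetical helper)."""
    # The table name is interpolated for brevity; validate it in real code.
    sql = query if query is not None else f"SELECT * FROM {table}"
    return pd.read_sql_query(sql, conn)


def load_incremental(df, conn, target_table, business_key):
    """Append only rows whose business key is not yet in the target table."""
    existing = pd.read_sql_query(
        f"SELECT {business_key} FROM {target_table}", conn
    )
    # Drop repeats inside the batch so one run cannot insert a key twice.
    deduped = df.drop_duplicates(subset=business_key)
    new_rows = deduped[~deduped[business_key].isin(existing[business_key])]
    if not new_rows.empty:
        new_rows.to_sql(target_table, conn, if_exists="append", index=False)
    return len(new_rows)


# In-memory stand-ins for the OLTP source and the OLAP target.
oltp = sqlite3.connect(":memory:")
olap = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
oltp.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ana"), (2, "luis"), (3, "eva")],
)
olap.execute("CREATE TABLE dim_customer (customer_id INTEGER, name TEXT)")
olap.execute("INSERT INTO dim_customer VALUES (1, 'ANA')")

# Extract, apply one transformation, then load incrementally.
frame = extract(oltp, table="customers")
frame["name"] = frame["name"].str.upper()

first_run = load_incremental(frame, olap, "dim_customer", "customer_id")
second_run = load_incremental(frame, olap, "dim_customer", "customer_id")
print(first_run, second_run)  # 2 0
```

The second call inserts nothing because every business key from the batch is already present in the target table, which is the behavior the Load step describes.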

Key Features

Dynamic Column Mapping

Configure source→target column mappings and transformations at runtime through the UI — no code changes needed.

Incremental Loading

Business-key deduplication ensures only new records enter the Data Warehouse on every run.

Multiple Transform Types

Upper/lower case conversion, year/month/day extraction from dates, and multi-column concatenation.

Custom SQL Support

Write a custom SELECT query as the extraction source instead of selecting a full table.

Streamlit Dashboard

An interactive web UI lets you configure and run ETL pipelines without writing Python.

Automated Tests

A pytest suite covers connection health, transformer logic, incremental loader filtering, and full pipeline orchestration.
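
The dynamic column mapping and transform types listed above could be represented as plain data that is interpreted at runtime. The sketch below assumes a hypothetical mapping format (a list of dictionaries with `source`, `target`, `transform`, and `with` keys) and a hypothetical `apply_mappings` function; the real structure produced by the dashboard may differ. The single-space separator used for concatenation is also an assumption.

```python
import pandas as pd

# Hypothetical mapping spec, as a UI might produce it at runtime.
MAPPINGS = [
    {"source": "first_name", "target": "FirstName", "transform": "upper"},
    {"source": "email", "target": "Email", "transform": "lower"},
    {"source": "order_date", "target": "OrderYear", "transform": "year"},
    {"source": "order_date", "target": "OrderMonth", "transform": "month"},
    {"source": "first_name", "target": "FullName",
     "transform": "concat", "with": "last_name"},
]


def apply_mappings(df, mappings):
    """Build the target frame column by column from a mapping spec."""
    out = pd.DataFrame(index=df.index)
    for m in mappings:
        col = df[m["source"]]
        kind = m.get("transform")
        if kind is None:
            out[m["target"]] = col  # plain rename, no transformation
        elif kind == "upper":
            out[m["target"]] = col.str.upper()
        elif kind == "lower":
            out[m["target"]] = col.str.lower()
        elif kind in ("year", "month", "day"):
            # Parse to datetime, then pull the requested component.
            out[m["target"]] = getattr(pd.to_datetime(col).dt, kind)
        elif kind == "concat":
            # Separator is an assumption made for this sketch.
            out[m["target"]] = col.astype(str) + " " + df[m["with"]].astype(str)
        else:
            raise ValueError(f"Unknown transform: {kind!r}")
    return out


source = pd.DataFrame({
    "first_name": ["ana", "luis"],
    "last_name": ["Ruiz", "Mora"],
    "email": ["ANA@EXAMPLE.COM", "Luis@Example.com"],
    "order_date": ["2024-03-15", "2023-11-02"],
})

result = apply_mappings(source, MAPPINGS)
print(result)
```

Because the mapping is data rather than code, changing a target column or transformation only requires editing the spec, which matches the "no code changes needed" goal described for the UI.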
