ETL Dinámico uses an incremental loading strategy, which means you can run the same pipeline multiple times without worrying about inserting duplicate records into your Data Warehouse. Each run automatically compares the incoming data against what already exists in the destination table and inserts only the rows that are genuinely new. Understanding this behavior helps you interpret the result metrics correctly and avoid common pitfalls when loading dimensions and facts.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/obedc295/proyect_dw/llms.txt
Use this file to discover all available pages before exploring further.
What Is Incremental Loading?
Traditional bulk loads truncate the destination table and reload everything from scratch on every run. Incremental loading is different: it keeps existing records intact and appends only the delta — the rows whose identifier has not been seen before. In ETL Dinámico, the incremental strategy is implemented byDataLoader.load_incremental():
~isin() mask, and appends only the surviving rows with to_sql(..., if_exists='append').
The Business Key
The business key is the single column that uniquely identifies a record in both the source data and the Data Warehouse table. It is the bridge the incremental loader uses to decide whether a row is new or already present.Good business key choices
- Primary key of the OLTP table (e.g.,
CustomerID,SalesOrderID) - Natural key of the dimension (e.g.,
TerritoryID,ProductCode) - Surrogate key already generated in the OLAP schema
Poor business key choices
- Columns with nullable values (NULLs are always treated as new)
- Non-unique columns (e.g., names, descriptions)
- Composite keys — only a single column is supported per run
target_column in at least one entry in column_mappings (regular or concat) — the validator enforces this before the pipeline runs.
Reading the Result Metrics
After a successful run, the dashboard displays two metric cards:| Metric | Meaning |
|---|---|
Filas extraídas (rows_extracted) | Total rows read from the OLTP source — the full table or full SQL query result — before any filtering |
Filas cargadas (rows_loaded) | Rows actually inserted into the DW — the count returned by load_incremental() after the duplicate filter |
rows_extracted and rows_loaded is the number of records that already existed in the DW and were skipped.
A
rows_loaded value of 0 while rows_extracted is greater than 0 is completely normal on repeat runs when no new data has been added to the source. It does not indicate an error — it means every incoming record was already present in the Data Warehouse.Common Run Scenarios
First run — all records are new
The DW table is empty (or the business key column has no matching values). All 847 rows from the OLTP source are inserted into the DW.
load_incremental() finds no existing keys, so the filter passes every incoming row.Repeat run — no new data in OLTP
You run the same pipeline again without any changes to the source table. Every business key from the OLTP now exists in the DW, so Zero rows are inserted. The DW table is unchanged.
~isin() eliminates all rows.Constraints and Edge Cases
The target table must already exist in the Data Warehouse before you run the ETL.
DataLoader calls to_sql(..., if_exists='append'), which appends to an existing table but does not create one from scratch. If the table does not exist, SQLAlchemy will raise an error. Create the destination table (with the correct schema) in your DW database before the first run.How It Fits Into the Full Pipeline
The incremental load is the final step of the four-stage pipeline executed byETLPipeline.run_dynamic_etl():
rows_loaded integer returned by step 4 is what appears in the Filas cargadas metric on the dashboard.