DataLoader is the final stage of the ETL pipeline. Its single public method, load_incremental(), writes transformed data into the OLAP Data Warehouse while ensuring that records already present in the target table are never duplicated. It achieves this by reading the existing set of business key values from the DW table before writing, then filtering the incoming DataFrame to keep only rows whose key has not been seen before. Only the net-new rows are appended.
What is a business key?
A business key is a column that uniquely identifies each record in the target Data Warehouse table, for example CustomerID in a DimCustomers dimension or TerritoryID in a DimSalesTerritory table. load_incremental() uses this key as the deduplication criterion: if a row arriving from the OLTP source has a key that already exists in the DW table, the row is silently skipped; if the key is new, the row is inserted.
The business key column must be present in the transformed DataFrame and must be mapped in column_mappings so that ETLPipeline includes it in the final column selection before handing the data to DataLoader.load_incremental().
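The skip-or-insert rule can be illustrated with pandas alone. This is a minimal sketch: the existing keys and incoming rows are made-up sample data, and the plain list stands in for the keys that load_incremental() would read from the DW table.

```python
import pandas as pd

# Business keys already stored in the DW table (hypothetical sample data).
existing_keys = [1, 2]

# Transformed rows arriving from the OLTP source (hypothetical sample data).
incoming = pd.DataFrame({
    "CustomerID": [2, 3, 4],
    "Name": ["Bea", "Carl", "Dana"],
})

# A row is new when its business key is not among the existing keys.
is_new = ~incoming["CustomerID"].isin(existing_keys)
skipped = incoming[~is_new]   # key already in the DW: silently skipped
to_insert = incoming[is_new]  # key not seen before: will be inserted

print(skipped["CustomerID"].tolist())    # [2]
print(to_insert["CustomerID"].tolist())  # [3, 4]
```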
Parameters
df_transformer (DataFrame): The transformed DataFrame produced by DataTransformer. Must contain the column named by business_key.
Target table name (str): Name of the target table in the OLAP Data Warehouse (e.g., "DimCustomers"). load_incremental uses if_exists='append', which appends rows when the table already exists and creates the table automatically on the first run when it does not yet exist.
business_key (str): The column name to use for deduplication. Must exist in both df_transformer and the target DW table.
Returns: int, the number of rows actually inserted into the DW table.
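Because business_key has to exist in both df_transformer and the target table, a caller might validate it before loading. The sketch below is a hypothetical pre-flight check (check_business_key is not part of the documented API); it assumes a SQLAlchemy engine and uses an in-memory SQLite database in place of the real OLAP Data Warehouse.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect


def check_business_key(df, engine, table_name, business_key):
    """Hypothetical helper: raise ValueError if business_key is missing.

    The column must be in the DataFrame and, when the target table already
    exists, in that table as well.
    """
    if business_key not in df.columns:
        raise ValueError(f"{business_key!r} is not a column of the DataFrame")

    inspector = inspect(engine)
    if inspector.has_table(table_name):
        dw_columns = {col["name"] for col in inspector.get_columns(table_name)}
        if business_key not in dw_columns:
            raise ValueError(f"{business_key!r} is not a column of {table_name!r}")
    # If the table does not exist yet, the first append creates it from the
    # DataFrame's columns, so there is nothing to compare against.


engine = create_engine("sqlite://")  # in-memory database, for illustration only
df = pd.DataFrame({"CustomerID": [1, 2], "Name": ["Ann", "Bea"]})
df.to_sql("DimCustomers", engine, index=False)

check_business_key(df, engine, "DimCustomers", "CustomerID")  # passes silently
```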
Step-by-step loading logic
1. Connect to the OLAP database. Opens a connection to the Data Warehouse via db_client.get_olap_connection(), which returns a SQLAlchemy connection context manager.
2. Read existing business keys. Fetches all current values of the business key column from the target table and loads them into a Python list (valid_keys).
3. Filter for new rows only. Removes every row from the incoming DataFrame whose business key already appears in the DW. Rows whose key matches an existing DW record are dropped; only genuinely new rows remain in df_load.
4. Append new rows. Writes df_load to the target DW table using pandas.DataFrame.to_sql with if_exists='append' and index=False.
Code example
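The four loading steps can be sketched as a self-contained Python class. This is not the project's actual implementation: SQLiteClient is a hypothetical stand-in for db_client that exposes only get_olap_connection(), the parameter name table_name is assumed, an in-memory SQLite database replaces the real Data Warehouse, and the missing-table check is an assumption added so that the first run can create the table.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text


class SQLiteClient:
    """Hypothetical stand-in for db_client; only get_olap_connection() is used."""

    def __init__(self, url="sqlite://"):
        self.engine = create_engine(url)

    def get_olap_connection(self):
        # engine.begin() yields a Connection and commits when the block exits.
        return self.engine.begin()


class DataLoader:
    def __init__(self, db_client):
        self.db_client = db_client

    # Assumption: the table-name parameter is called table_name here.
    def load_incremental(self, df_transformer, table_name, business_key):
        # Step 1: connect to the OLAP Data Warehouse.
        with self.db_client.get_olap_connection() as conn:
            # Step 2: read the business keys already stored in the DW.
            # Assumption: a missing table means "no keys yet" (first run).
            # Identifiers are interpolated directly, so they must be trusted.
            if inspect(conn).has_table(table_name):
                query = text(f'SELECT "{business_key}" FROM "{table_name}"')
                valid_keys = conn.execute(query).scalars().all()
            else:
                valid_keys = []

            # Step 3: keep only rows whose business key is not in the DW yet.
            df_load = df_transformer[~df_transformer[business_key].isin(valid_keys)]

            # Step 4: append the net-new rows; to_sql creates the table if needed.
            if not df_load.empty:
                df_load.to_sql(table_name, conn, if_exists="append", index=False)

        return len(df_load)


loader = DataLoader(SQLiteClient())
customers = pd.DataFrame({"CustomerID": [1, 2], "Name": ["Ann", "Bea"]})
print(loader.load_incremental(customers, "DimCustomers", "CustomerID"))  # 2
```

Reading every existing key into Python mirrors the steps described above. For very large dimension tables, transferring all keys may become the bottleneck; an anti-join executed in SQL would avoid that, but it is a design alternative rather than what the documented method does.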
First run vs. subsequent runs
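To see how the first run differs from later runs, the sketch below uses the standard-library sqlite3 module instead of SQLAlchemy (pandas accepts a sqlite3 connection in to_sql). Before the first call the table does not exist, so to_sql with if_exists='append' creates it and every row is inserted; on the second call every TerritoryID is already present and nothing is written. Without the key filter, a second append would simply insert the same rows again. The compact load_incremental and table_exists functions here are hypothetical illustrations, not the DataLoader method.

```python
import sqlite3

import pandas as pd


def table_exists(conn, table_name):
    # SQLite lists its tables in the sqlite_master catalog.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


def load_incremental(conn, df, table_name, business_key):
    # No table yet means no existing keys, so the first run inserts every row.
    keys = []
    if table_exists(conn, table_name):
        cursor = conn.execute(f'SELECT "{business_key}" FROM "{table_name}"')
        keys = [row[0] for row in cursor]
    new_rows = df[~df[business_key].isin(keys)]
    if not new_rows.empty:
        new_rows.to_sql(table_name, conn, if_exists="append", index=False)
    return len(new_rows)


conn = sqlite3.connect(":memory:")
territories = pd.DataFrame({"TerritoryID": [1, 2, 3], "Name": ["NW", "NE", "SW"]})

print(table_exists(conn, "DimSalesTerritory"))  # False
print(load_incremental(conn, territories, "DimSalesTerritory", "TerritoryID"))  # 3
print(table_exists(conn, "DimSalesTerritory"))  # True
print(load_incremental(conn, territories, "DimSalesTerritory", "TerritoryID"))  # 0
```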
Return value
The number of rows (int) appended to the target DW table during this call. It is 0 if all incoming records already exist in the table (a no-op run). This value is surfaced in the ETLPipeline result dict as "rows_loaded".
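A caller reading the pipeline result might distinguish a no-op run from a run that loaded data. The helper below is a hypothetical sketch: only the "rows_loaded" key is documented, and describe_load and its messages are made up for illustration.

```python
def describe_load(result):
    """Hypothetical helper: turn an ETLPipeline result dict into a log line.

    Only the "rows_loaded" key is documented; the wording is illustrative.
    """
    rows = result["rows_loaded"]
    if rows == 0:
        return "no-op run: every incoming business key already exists in the DW"
    return f"appended {rows} new row(s) to the DW"


print(describe_load({"rows_loaded": 0}))
print(describe_load({"rows_loaded": 3}))  # appended 3 new row(s) to the DW
```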