DataTransformer applies column-level transformations to a Pandas DataFrame extracted from the OLTP source. It makes no database calls — every operation is pure Pandas logic that runs entirely in memory. All three methods accept a DataFrame as their first argument and write their result into a new or existing column, then return the modified DataFrame. This design makes transformations chainable and keeps the original source columns intact (where applicable).
Transform types at a glance
| Transform | Method | operation values | Input type | Output type |
|---|---|---|---|---|
| Text case | capitalize_transform | upper, lower | string | string |
| Concatenation | concat_transform | N/A | string, string | string |
| Date extraction | date_transform | year, month, day | date / string | integer |
Class overview
DataTransformer takes no constructor arguments. Instantiate it once and reuse it across multiple DataFrame operations.
Public methods
capitalize_transform
Converts the string values in column to either uppercase or lowercase and writes the result into new_column. The original column is left unchanged.
- operation='upper' applies .str.upper(): all characters become uppercase.
- operation='lower' applies .str.lower(): all characters become lowercase.
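The two branches above can be sketched as a standalone function. This is an illustration built from the description on this page, not the project's source: the parameter order (df, column, new_column, operation), the sample column names, and the ValueError raised for an unknown operation are all assumptions.

```python
import pandas as pd


def capitalize_transform(df, column, new_column, operation):
    """Sketch of the text-case transform (assumed signature, not the real method)."""
    if operation == "upper":
        df[new_column] = df[column].str.upper()
    elif operation == "lower":
        df[new_column] = df[column].str.lower()
    else:
        # Assumption: this page does not say how unknown operations are handled.
        raise ValueError(f"unsupported operation: {operation!r}")
    return df


# Hypothetical sample data.
customers = pd.DataFrame({"name": ["Ana Ruiz", "luis PEREZ"]})
customers = capitalize_transform(customers, "name", "name_upper", "upper")
customers = capitalize_transform(customers, "name", "name_lower", "lower")
print(customers)
```

Because each call returns the DataFrame, the two calls can be applied one after the other while the name column keeps its original values.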
capitalize_transform writes its output to new_column, so the source column is preserved unchanged in the DataFrame. If new_column is the same string as column, the original values will be overwritten.
concat_transform
Concatenates the values of column1 and column2, writes the combined string into new_column, and returns the modified DataFrame.
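The exact expression the method uses does not appear on this page, so the following is only a sketch of one plausible form. It assumes string concatenation with the + operator, a cast of both columns with astype(str), and a separator parameter that is purely hypothetical; the parameter order and sample column names are assumptions as well.

```python
import pandas as pd


def concat_transform(df, column1, column2, new_column, *, separator):
    """Sketch only: `separator` is a hypothetical parameter, not a documented one."""
    # Assumption: cast to str so that non-string values can be joined too.
    df[new_column] = df[column1].astype(str) + separator + df[column2].astype(str)
    return df


# Hypothetical sample data.
people = pd.DataFrame(
    {"first_name": ["Ana", "Luis"], "last_name": ["Ruiz", "Perez"]}
)
people = concat_transform(
    people, "first_name", "last_name", "full_name", separator=" "
)
print(people["full_name"].tolist())
```

Whatever the real expression is, the documented contract is the same: the result lands in new_column and both source columns stay in the DataFrame.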
Like capitalize_transform, concat_transform writes its result to new_column without modifying column1 or column2. Both source columns remain available for use in subsequent transformations or as pass-through columns.
date_transform
Extracts the year, month, or day from column and stores the integer result in new_column. The source column is first coerced in place to datetime64 via pd.to_datetime(), then the appropriate .dt accessor is used:
- operation='year' → .dt.year: returns the four-digit year as an integer.
- operation='month' → .dt.month: returns the month number (1–12) as an integer.
- operation='day' → .dt.day: returns the day of the month (1–31) as an integer.
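The coercion step and the three accessors above can be sketched as follows. This is a function-level illustration rather than the project's code: the parameter order, the sample data, and the ValueError for an unknown operation are assumptions. Only pd.to_datetime() and the .dt.year, .dt.month, and .dt.day accessors come from the description.

```python
import pandas as pd


def date_transform(df, column, new_column, operation):
    """Sketch of the date-part extraction (assumed signature, not the real method)."""
    if operation not in ("year", "month", "day"):
        # Assumption: this page does not say how unknown operations are handled.
        raise ValueError(f"unsupported operation: {operation!r}")
    # The source column itself is replaced by its datetime version.
    df[column] = pd.to_datetime(df[column])
    # Picks .dt.year, .dt.month, or .dt.day by name.
    df[new_column] = getattr(df[column].dt, operation)
    return df


# Hypothetical sample data: dates arrive as strings.
orders = pd.DataFrame({"order_date": ["2023-01-15", "2024-11-30"]})
for part in ("year", "month", "day"):
    orders = date_transform(orders, "order_date", f"order_{part}", part)

print(orders.dtypes)
```

Printing the dtypes afterwards shows that order_date is no longer a string column, which matters if a later step expects the original text values.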
date_transform converts the source column in place to datetime64 as part of processing: df[column] = pd.to_datetime(df[column]) is called before extracting the component. The source column type will change from object (string) or another date-like type to datetime64[ns] after this method runs.
Using transformations inside ETLPipeline
When ETLPipeline.run_dynamic_etl() processes the column_mappings list, it reads the transform_type field of each mapping dict to decide which DataTransformer method to invoke.
The mapping between transform_type values and DataTransformer methods is:
| transform_type | Method called | Notes |
|---|---|---|
none | Direct assignment: df[target] = df[source] | Column is copied as-is |
upper | capitalize_transform(..., operation='upper') | |
lower | capitalize_transform(..., operation='lower') | |
year | date_transform(..., operation='year') | |
month | date_transform(..., operation='month') | |
day | date_transform(..., operation='day') | |
N/A (type: concat) | concat_transform(...) | Dict uses type, not transform_type |
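The routing in the table above can be sketched as a small lookup. This is not the code of ETLPipeline.run_dynamic_etl(): the names TRANSFORM_ROUTES and resolve_transform are hypothetical, and the sketch only decides which method and operation a mapping dict points to, using the two keys the table mentions (transform_type and type).

```python
# Routing table copied from the rows above: transform_type -> (method, operation).
TRANSFORM_ROUTES = {
    "upper": ("capitalize_transform", "upper"),
    "lower": ("capitalize_transform", "lower"),
    "year": ("date_transform", "year"),
    "month": ("date_transform", "month"),
    "day": ("date_transform", "day"),
}


def resolve_transform(mapping):
    """Return (method_name, operation) for one mapping dict.

    (None, None) stands for direct assignment: the column is copied as-is.
    """
    # Concatenation mappings are identified by `type`, not `transform_type`.
    if mapping.get("type") == "concat":
        return ("concat_transform", None)
    transform_type = mapping.get("transform_type")
    if transform_type == "none":
        return (None, None)
    route = TRANSFORM_ROUTES.get(transform_type)
    if route is None:
        # Assumption: unknown values are rejected; the page does not specify this.
        raise ValueError(f"unknown transform_type: {transform_type!r}")
    return route


# Hypothetical mapping dicts, reduced to the routing keys only.
examples = [
    {"transform_type": "none"},
    {"transform_type": "upper"},
    {"transform_type": "month"},
    {"type": "concat"},
]
for mapping in examples:
    print(mapping, "->", resolve_transform(mapping))
```

Keeping the routes in a dict means that supporting another transform_type value would be a one-line change in this sketch.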