Overview
The sales normalization process reads raw sales transaction data from CSV files, validates and cleans the data, and generates normalized database tables with proper foreign key relationships. This module is implemented innormalizar_ventas.py.
Input File
File:ventas_prueba_mas_datos.csv
Expected Columns:
venta_id- Sale transaction ID (optional, auto-generated if missing)cliente_id- Customer ID (required, must be valid integer)fecha- Transaction datetotal- Transaction amountmoneda- Currency code (e.g., USD, EUR, COP)canal- Sales channel (e.g., Online, Tienda, Teléfono)
Output Files
The normalization process generates four CSV files:-
ventas_normalizadas.csv - Normalized sales table with foreign keys
- Fields:
venta_id,cliente_id,fecha,total,moneda_id,canal_id
- Fields:
-
canales.csv - Sales channel dimension table
- Fields:
canal_id,nombre
- Fields:
-
monedas.csv - Currency dimension table
- Fields:
moneda_id,codigo
- Fields:
-
ventas_normalizadas_completo.csv - Denormalized view with human-readable values
- Fields:
venta_id,cliente_id,fecha,total,moneda,canal
- Fields:
Data Cleaning Functions
The module uses the same core cleaning functions as customer normalization:Processing Logic
Main Processing Function
Theprocesar_ventas() function implements the sales normalization workflow:
Sale ID Generation
Sale IDs start at 1001 and auto-increment for missing or invalid IDs:Customer ID Validation
Unlike customer normalization, sales records require a valid customer ID:cliente_id are silently skipped.
Data Validation
All required fields must be present and valid:cliente_id- Required, must be valid integertotal- Required, must be valid numbermoneda- Required, must be non-empty textcanal- Required, must be non-empty textfecha- Optional, stored as empty string if invalid
Dimension Table Creation
The system extracts unique channels and currencies:Mapping Logic
Lookup dictionaries map dimension values to IDs:canal_map Example
moneda_map Example
File Output
All files use UTF-8 BOM encoding for Excel compatibility:Complete View Generation
The denormalized view joins dimension tables back for readability:Data Transformation Examples
Example 1: Valid Sale Record
Input:Example 2: Missing Sale ID
Input:Example 3: Skipped Records
Input:- Record 1002: Missing
cliente_id - Record 1003: Missing
total - Record 1004: Missing
moneda
Example 4: Number Formatting
Input:Usage
Run the normalization script from the command line:Error Handling
The script handles several error conditions:- File Not Found: Displays error message and exits
- Encoding Errors: Automatically tries multiple encodings
- Invalid Customer ID: Record is skipped silently
- Missing Required Fields: Record is skipped silently
- Invalid Numbers: Record is skipped if total cannot be parsed
Key Differences from Customer Normalization
| Aspect | Customer Normalization | Sales Normalization |
|---|---|---|
| ID Counter Start | 1 | 1001 |
| Required Fields | segmento only | cliente_id, total, moneda, canal |
| Missing Values | Uses defaults | Skips record |
| Dimension Tables | ciudades, segmentos | canales, monedas |
| Date Handling | Optional | Optional |