The validation layer runs at the end of the full pipeline, after data generation and database loading have completed. It provides a lightweight sanity check that catches truncated writes, partial runs, or accidental overwrites. The single function validate_csv is called by run_pipeline.py after all pipeline steps, including the PostgreSQL load, have finished. It raises an immediate error with a clear message if anything is wrong, so the execution report is never written against bad data.
validate_csv() — etl/validation.py
Full source
Parameters
Path to the CSV file to validate (relative to the project root). pd.read_csv raises a FileNotFoundError automatically if the file does not exist.
Optional exact row count the file must contain. When provided and the actual count differs, a ValueError is raised. Pass None (the default) to skip the row-count check and only verify the file is non-empty.
Return value
Returns the integer row count of the validated file. run_pipeline.py captures this value and writes it into the execution report.
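The behavior described in this section can be approximated with the minimal sketch below. It is not the project's source: the parameter names `path` and `expected_rows`, the error messages, and the use of ValueError for the empty-file case are assumptions made for illustration, and the actual code in etl/validation.py may differ.

```python
from typing import Optional

import pandas as pd


def validate_csv(path: str, expected_rows: Optional[int] = None) -> int:
    """Sketch of a CSV sanity check; names and messages are hypothetical."""
    # pd.read_csv raises FileNotFoundError on its own when the file is missing.
    df = pd.read_csv(path)
    row_count = len(df)

    # Check 1: the file must contain at least one data row.
    # (A header-only file loads as an empty DataFrame, so len(df) == 0.)
    if row_count == 0:
        raise ValueError(f"{path} is empty")

    # Check 2 (optional): the row count must match the expected value exactly.
    if expected_rows is not None and row_count != expected_rows:
        raise ValueError(
            f"{path}: expected {expected_rows} rows, found {row_count}"
        )

    return row_count
```

Returning the count rather than a boolean lets the caller reuse the number, for example in a report, without reading the file a second time.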
Checks performed
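The function performs two sequential checks:
Non-empty check
After loading the file with pd.read_csv, the function verifies that len(df) > 0. An empty file raises an error.
Row-count check
When an expected row count is provided, the function compares it with the actual count and raises a ValueError if they differ.
Usage in run_pipeline.py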
The pipeline runner calls validate_csv after all pipeline steps have completed — including the PostgreSQL load. Each call targets one of the three raw CSV files with the exact row count that the corresponding generator is designed to produce:
run_pipeline.py logs "Validaciones completadas." and writes the row counts into the timestamped Markdown report in reports/.
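The calling pattern described above might look like the following sketch. The helper names `run_validations` and `report_lines`, the logger name, and the Markdown line format are hypothetical and not taken from run_pipeline.py. The validator is passed in as an argument so the sketch does not depend on the project layout.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("pipeline")  # logger name is an assumption


def run_validations(
    expected: Dict[str, int],
    validate: Callable[[str, int], int],
) -> Dict[str, int]:
    """Validate each CSV against its expected row count (hypothetical helper).

    Any exception raised by `validate` propagates, so code that runs after
    this call (such as report writing) never sees unvalidated data.
    """
    counts = {}
    for path, expected_rows in expected.items():
        counts[path] = validate(path, expected_rows)
    logger.info("Validaciones completadas.")
    return counts


def report_lines(counts: Dict[str, int]) -> str:
    """Format validated row counts as Markdown bullets (format is assumed)."""
    return "\n".join(f"- `{path}`: {n} rows" for path, n in counts.items())
```

In the runner, `expected` would map each of the three raw CSV paths to the count its generator produces, and `validate` would be validate_csv.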
Calling it programmatically
You can import and use validate_csv in any script or notebook to check a file independently of the full pipeline:
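A standalone check could look like the sketch below. It assumes the project root is on the import path so that `etl.validation` resolves, and that the path and the expected count can be passed positionally in that order. The fallback definition exists only so the snippet runs outside the project, and the wrapper name `check_file` is hypothetical.

```python
import pandas as pd

try:
    from etl.validation import validate_csv
except ImportError:
    # Stand-in mimicking the documented behavior; used only outside the project.
    def validate_csv(path, expected_rows=None):
        n = len(pd.read_csv(path))
        if n == 0 or (expected_rows is not None and n != expected_rows):
            raise ValueError(f"{path}: unexpected row count {n}")
        return n


def check_file(path, expected_rows=None):
    """Return (ok, detail) instead of letting the validation error propagate."""
    try:
        rows = validate_csv(path, expected_rows)
    except (FileNotFoundError, ValueError) as exc:
        return False, str(exc)
    return True, f"{rows} rows"
```

Wrapping the call this way suits a notebook, where a failed check should be reported rather than abort the session. In the pipeline itself, letting the exception propagate is what stops the report from being written.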