ProfeLedesma is an end-to-end educational workshop that walks you through the critical, often underestimated task of preparing time-series data for machine learning models. Working with real GHI (Global Horizontal Irradiance) solar measurements from meteorological stations across South America, you will learn, step by step and in Python, how raw sensor data is transformed into a clean, model-ready dataset.
Introduction
Understand why preprocessing is essential and what techniques this workshop covers.
Core Concepts
Learn about GHI, solar geometry, and what makes time-series preprocessing unique.
The Dataset
Explore the measurement stations, raw CSV format, and modeled reference data.
Preprocessing Steps
Walk through duplicate removal, quality control filters, outlier detection, and resampling.
Modeling & Evaluation
Split data correctly for time-series ML, evaluate with statistical metrics, and correct bias.
API Reference
Full documentation for the Geo, QualityControl, Metrics, and Sites helper modules.
What You Will Learn
Understand the data
Load multi-year GHI CSV files from real pyranometer stations, inspect their structure, and handle the first round of obvious issues like duplicates and irregular timestamps.
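As a rough illustration of this first pass, the sketch below reads a small inline CSV, drops repeated timestamps, and lists the gaps in the time axis. The column names `timestamp` and `ghi`, the sample values, and the helpers `load_ghi` and `find_gaps` are assumptions made for the example; the workshop's station files and notebook may be organized differently.

```python
import io

import pandas as pd

# Hypothetical 1-minute sample; real station files may use other column names.
RAW_CSV = """timestamp,ghi
2020-01-01 12:00:00,810.0
2020-01-01 12:01:00,812.5
2020-01-01 12:01:00,812.5
2020-01-01 12:02:00,815.1
2020-01-01 12:05:00,820.3
"""


def load_ghi(source) -> pd.DataFrame:
    """Read a GHI CSV, index it by time, and drop repeated timestamps."""
    df = pd.read_csv(source, parse_dates=["timestamp"])
    df = df.set_index("timestamp").sort_index()
    # Keep only the first record seen for each timestamp.
    return df[~df.index.duplicated(keep="first")]


def find_gaps(df: pd.DataFrame, expected: str = "1min") -> pd.Series:
    """Return the time steps that are longer than the expected interval."""
    steps = df.index.to_series().diff()
    return steps[steps > pd.Timedelta(expected)]


df = load_ghi(io.StringIO(RAW_CSV))
gaps = find_gaps(df)
print(df)
print(gaps)
```

In this sample the repeated 12:01 record is dropped, and the jump from 12:02 to 12:05 is reported as a single 3-minute gap.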
Apply quality control
Use physically motivated filters, derived from solar geometry models, to flag and remove measurements that are physically impossible or statistically improbable.
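One common filter of this kind is the BSRN-style "physically possible" limit, which bounds GHI by a function of the solar zenith angle. The sketch below is not the workshop's QualityControl module: the function names, the solar constant of 1361 W/m², the textbook declination formula, and the sample latitude are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

SOLAR_CONSTANT = 1361.0  # W/m^2, assumed value for this sketch


def cos_zenith(lat_deg, day_of_year, solar_hour):
    """Cosine of the solar zenith angle from simple textbook formulas."""
    lat = np.radians(lat_deg)
    # Cooper's approximation for the solar declination.
    dec = np.radians(23.45) * np.sin(2 * np.pi * (284 + day_of_year) / 365)
    # The hour angle advances 15 degrees per hour from solar noon.
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))
    return (np.sin(lat) * np.sin(dec)
            + np.cos(lat) * np.cos(dec) * np.cos(hour_angle))


def ghi_upper_limit(mu0, day_of_year):
    """BSRN-style 'physically possible' upper limit for GHI, in W/m^2."""
    # Correction for the varying Earth-Sun distance.
    e0 = 1 + 0.033 * np.cos(2 * np.pi * day_of_year / 365)
    mu0 = np.clip(mu0, 0.0, None)  # sun below the horizon counts as 0
    return 1.5 * SOLAR_CONSTANT * e0 * mu0 ** 1.2 + 100.0


def flag_ghi(ghi, mu0, day_of_year, lower=-4.0):
    """True where a measurement lies inside the physically possible range."""
    return (ghi >= lower) & (ghi <= ghi_upper_limit(mu0, day_of_year))


# Hypothetical readings at solar noon on 1 January, latitude 25 degrees south.
mu0 = cos_zenith(lat_deg=-25.0, day_of_year=1, solar_hour=12.0)
ghi = np.array([1000.0, 2500.0, -10.0])
valid = flag_ghi(ghi, mu0, day_of_year=1)
print(valid)
```

Here the 1000 W/m² reading passes, while 2500 W/m² exceeds the upper limit and -10 W/m² falls below the lower bound, so both are flagged as invalid.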
Resample and aggregate
Collapse 1-minute raw data into 15-minute and 60-minute averages, applying minimum-count thresholds to avoid spurious aggregates from incomplete windows.
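A minimal sketch of aggregation with a minimum-count threshold, using pandas `resample`. The helper `resample_mean`, the 50% threshold, and the synthetic data are assumptions for illustration; the workshop may use a different threshold or approach.

```python
import numpy as np
import pandas as pd


def resample_mean(series, rule="15min", min_fraction=0.5, base_step="1min"):
    """Average into `rule` windows, masking windows with too few samples."""
    # Number of raw samples a complete window should contain.
    expected = pd.Timedelta(rule) / pd.Timedelta(base_step)
    grouped = series.resample(rule)
    mean = grouped.mean()
    count = grouped.count()  # counts non-NaN samples only
    # Keep the mean only where enough valid samples were present.
    return mean.where(count >= min_fraction * expected)


# Synthetic 1-minute data: 30 minutes at a constant 500 W/m^2.
index = pd.date_range("2020-01-01 12:00", periods=30, freq="1min")
ghi = pd.Series(500.0, index=index)
# Simulate a sensor outage that leaves only 3 valid samples in the second window.
ghi.iloc[18:] = np.nan

out = resample_mean(ghi, rule="15min")
print(out)
```

The first 15-minute window is complete and keeps its mean of 500.0, while the second holds only 3 of 15 expected samples and is returned as NaN instead of a misleading average. Passing `rule="60min"` applies the same logic to hourly averages.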
All code in this workshop is pure Python and runs inside a Jupyter Notebook (01_merge.ipynb). Dependencies are pandas, numpy, and matplotlib.