ProfeLedesma: ML Data Preprocessing Workshop

What You Will Learn

ProfeLedesma is an end-to-end educational workshop on a critical, often underestimated task: preparing time-series data for machine learning models. Working with real GHI (Global Horizontal Irradiance) solar measurements from meteorological stations across South America, you will learn, step by step and in Python, how raw sensor data is transformed into a clean, model-ready dataset.

Introduction

Understand why preprocessing is essential and what techniques this workshop covers.

Core Concepts

Learn about GHI, solar geometry, and what makes time-series preprocessing unique.

The Dataset

Explore the measurement stations, raw CSV format, and modeled reference data.

Preprocessing Steps

Walk through duplicate removal, quality control filters, outlier detection, and resampling.

Modeling & Evaluation

Split data correctly for time-series ML, evaluate with statistical metrics, and correct bias.

API Reference

Full documentation for the Geo, QualityControl, Metrics, and Sites helper modules.

What You Will Learn

Understand the data

Load multi-year GHI CSV files from real pyranometer stations, inspect their structure, and handle the first round of obvious issues like duplicates and irregular timestamps.
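As a rough illustration of this first step, the sketch below loads a small CSV, drops duplicated timestamps, and lists irregular sampling steps. The inline data is made up, and the column names `timestamp` and `ghi` are assumptions rather than the workshop's actual file schema.

```python
import io

import pandas as pd

# Made-up sample standing in for a raw station file. The column names
# ("timestamp", "ghi") are assumptions, not the workshop's real schema.
raw_csv = io.StringIO(
    "timestamp,ghi\n"
    "2020-01-01 12:00:00,810.5\n"
    "2020-01-01 12:01:00,812.0\n"
    "2020-01-01 12:01:00,812.0\n"  # duplicated record
    "2020-01-01 12:02:00,815.2\n"
    "2020-01-01 12:05:00,820.1\n"  # arrives after a 3-minute gap
)

df = pd.read_csv(raw_csv, parse_dates=["timestamp"])

# Keep the first record for each timestamp, then index and sort by time.
df = df.drop_duplicates(subset="timestamp", keep="first")
df = df.set_index("timestamp").sort_index()

# Report every step that differs from the nominal 1-minute cadence.
step = df.index.to_series().diff()
irregular = step[step.notna() & (step != pd.Timedelta(minutes=1))]

print(f"{len(df)} rows after duplicate removal")
print(irregular)
```

Whether to keep the first or the last duplicate, and how to treat gaps once found, depends on the station's logging behavior; the choices here are only one reasonable default.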

Apply quality control

Use physically motivated filters, derived from solar geometry models, to flag and remove measurements that are physically impossible or statistically improbable.
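To give a feel for what such a filter looks like, here is a minimal sketch of an upper and lower bound check in the style of the BSRN "physically possible" limits for global irradiance. It is not necessarily the filter implemented in the workshop's QualityControl module. The function name, the precomputed `cos_zenith` input, and the omission of the Earth-Sun distance correction are all simplifying assumptions.

```python
import pandas as pd

SOLAR_CONSTANT = 1361.0  # W/m^2, approximate value; no Earth-Sun distance correction


def flag_physically_possible(ghi, cos_zenith, s0=SOLAR_CONSTANT):
    """Return a boolean Series that is True where GHI passes the limit check.

    Hypothetical helper. cos_zenith is the cosine of the solar zenith
    angle, assumed to be computed elsewhere from solar geometry.
    """
    mu0 = cos_zenith.clip(lower=0.0)  # sun below the horizon counts as 0
    upper = 1.5 * s0 * mu0**1.2 + 100.0  # BSRN-style upper limit, W/m^2
    lower = -4.0  # small negative offsets are tolerated
    return (ghi >= lower) & (ghi <= upper)


# Made-up values: one plausible reading and three that should be rejected.
sample = pd.DataFrame(
    {
        "ghi": [850.0, 2500.0, 300.0, -10.0],
        "cos_zenith": [0.9, 0.9, 0.0, 0.5],
    }
)
mask = flag_physically_possible(sample["ghi"], sample["cos_zenith"])
clean = sample[mask]
print(mask.tolist())
```

Returning a mask instead of dropping rows directly keeps the flagging and the removal separate, so rejected points can still be inspected or counted.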

Resample and aggregate

Collapse 1-minute raw data into 15-minute and 60-minute averages, applying minimum-count thresholds to avoid spurious aggregates from incomplete windows.
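The idea of a minimum-count threshold can be sketched with pandas as follows. The data is synthetic, and both the helper name `resample_with_min_count` and the threshold of 10 valid minutes per 15-minute window are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute series covering two 15-minute windows.
idx = pd.date_range("2020-01-01 12:00", periods=30, freq="1min")
ghi = pd.Series(np.linspace(800.0, 829.0, 30), index=idx, name="ghi")
ghi.iloc[15:27] = np.nan  # 12 missing minutes in the second window


def resample_with_min_count(series, rule, min_count):
    """Average per window, keeping only windows with enough valid samples."""
    grouped = series.resample(rule)
    mean = grouped.mean()
    count = grouped.count()  # counts non-NaN values only
    return mean.where(count >= min_count)


# The threshold of 10 valid minutes is an assumed value.
ghi_15 = resample_with_min_count(ghi, "15min", min_count=10)
print(ghi_15)
```

The first window is complete and yields an average, while the second has only three valid samples and is left as NaN instead of producing an average that represents three minutes out of fifteen.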

Train, evaluate, and correct

Split the clean time-series into training and test sets, measure model performance with MBE, RMSD, KSI, and SS4, and apply a simple linear bias correction.
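A compact sketch of this last stage is shown below, using synthetic data. It covers a chronological split, MBE, RMSD, and a linear bias correction fitted on the training portion only; KSI and SS4 are left out. The 80/20 split ratio, the function names, and the use of `np.polyfit` for the correction are assumptions, not necessarily what the workshop's Metrics module does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, time-ordered data: "obs" plays the role of measured GHI and
# "model" is a deliberately biased estimate of it.
obs = np.linspace(100.0, 900.0, n)
model = 1.1 * obs + 20.0 + rng.normal(0.0, 5.0, n)

# Chronological split: no shuffling, the test set is the most recent 20%.
split = int(0.8 * n)
obs_train, obs_test = obs[:split], obs[split:]
model_train, model_test = model[:split], model[split:]


def mbe(pred, ref):
    """Mean bias error: positive when the prediction overestimates."""
    return float(np.mean(pred - ref))


def rmsd(pred, ref):
    """Root mean square deviation between prediction and reference."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


# Fit obs ~ slope * model + intercept on the training period only,
# then apply the same coefficients to the held-out test period.
slope, intercept = np.polyfit(model_train, obs_train, deg=1)
corrected_test = slope * model_test + intercept

print("MBE  before/after:", mbe(model_test, obs_test), mbe(corrected_test, obs_test))
print("RMSD before/after:", rmsd(model_test, obs_test), rmsd(corrected_test, obs_test))
```

Fitting the correction on the training period and evaluating on later data avoids leaking information from the test period into the correction coefficients.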

All code in this workshop is written in Python and runs inside a Jupyter Notebook (01_merge.ipynb). The dependencies are pandas, numpy, and matplotlib.
