Once GHI data has been cleaned and quality-controlled, its 1-minute resolution is finer than most ML applications need. This step aggregates the data to coarser temporal resolutions and discards any aggregated window that contains too few valid readings.
## Target resolutions
The workshop produces two output resolutions:

| Resolution | 1-minute readings per window | Minimum valid readings required |
|---|---|---|
| 15-minute | 15 | > 10 |
| 60-minute | 60 | > 40 |
## Resampling code
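A minimal pandas sketch of this step, assuming the cleaned 1-minute GHI is a `pd.Series` with a `DatetimeIndex` and that QC-rejected readings are already NaN. The helper names `resample_with_threshold` and `build_outputs` are hypothetical, not taken from the workshop code. Each window's mean is kept only when its count of valid readings strictly exceeds the threshold from the table.

```python
import pandas as pd


def resample_with_threshold(ghi: pd.Series, rule: str, threshold: int) -> pd.Series:
    """Average 1-minute GHI into `rule`-sized windows (hypothetical helper).

    A window keeps its mean only if it contains strictly more than `threshold`
    non-NaN readings; otherwise it becomes NaN. `ghi` must have a DatetimeIndex,
    and QC-rejected readings are assumed to be NaN already.
    """
    resampler = ghi.resample(rule)
    means = resampler.mean()    # NaN readings are skipped when averaging
    counts = resampler.count()  # number of non-NaN readings per window
    return means.where(counts > threshold)


def build_outputs(ghi_1min: pd.Series) -> dict:
    """Produce both target resolutions with the thresholds from the table."""
    return {
        "15min": resample_with_threshold(ghi_1min, "15min", threshold=10),
        "60min": resample_with_threshold(ghi_1min, "60min", threshold=40),
    }
```

Because `mean()` and `count()` come from the same resampler, both results share one window index, so `where()` lines them up without any explicit join.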
## Saving the outputs
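This page does not state the output format, so the sketch below assumes CSV files and a hypothetical `<prefix>_<resolution>.csv` naming scheme; `save_outputs` is an illustrative helper, not part of the workshop code. Windows that failed the threshold are written as empty cells, which `pd.read_csv` reads back as NaN.

```python
from pathlib import Path

import pandas as pd


def save_outputs(outputs: dict, out_dir: str, prefix: str = "ghi") -> list:
    """Write each resampled series to <out_dir>/<prefix>_<resolution>.csv (hypothetical layout)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for resolution, series in outputs.items():
        target = out_path / f"{prefix}_{resolution}.csv"
        # NaN windows are written as empty cells so they stay missing on reload.
        series.rename(prefix).to_csv(target, index_label="timestamp")
        written.append(target)
    return written
```

Reloading with `pd.read_csv(path, index_col="timestamp", parse_dates=True)` restores the `DatetimeIndex`.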
## Why the minimum-count threshold matters
If only 2 of the 60 minutes in a window have valid readings (for example, because a large gap exists in that hour), the hourly mean would be computed from just those 2 points. That value would not fairly represent the hour and should instead be treated as missing data. The threshold ensures that only windows with sufficient coverage are retained:

- A 15-minute window needs more than 10 valid 1-minute readings (i.e. at least 11 of 15).
- A 60-minute window needs more than 40 valid 1-minute readings (i.e. at least 41 of 60).
Windows that do not meet the threshold are set to NaN.
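A small numeric illustration of the 2-of-60 case, using made-up readings: the plain hourly mean still returns a value built from two points, while masking on the valid-reading count turns that window into NaN.

```python
import pandas as pd

# One hour of 1-minute GHI in which QC left only 2 valid readings (invented values).
idx = pd.date_range("2024-06-01 12:00", periods=60, freq="min")
ghi = pd.Series(float("nan"), index=idx)
ghi.iloc[[5, 6]] = [850.0, 870.0]

hourly = ghi.resample("60min")
naive_mean = hourly.mean()      # averages only the 2 surviving points
valid_counts = hourly.count()   # 2 valid readings in this window
masked_mean = naive_mean.where(valid_counts > 40)  # fails the > 40 rule

print(naive_mean.iloc[0], valid_counts.iloc[0], masked_mean.iloc[0])  # → 860.0 2 nan
```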
`resample().count()` counts only non-NaN values. Because the QC step replaces rejected readings with NaN, the count automatically reflects only physically valid measurements, and no additional filtering is required.
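Comparing `count()` with `size()`, which counts every row in the window including NaN, makes the distinction concrete. The readings below are invented for illustration; three of them stand in for values rejected by QC.

```python
import pandas as pd

# Ten 1-minute readings; QC has already replaced three rejected values with NaN.
idx = pd.date_range("2024-06-01 12:00", periods=10, freq="min")
ghi = pd.Series(
    [500.0, 510.0, None, 530.0, None, 550.0, 560.0, None, 580.0, 590.0],
    index=idx,
)

window = ghi.resample("15min")
print(window.size().iloc[0])   # all rows in the window, NaN included → 10
print(window.count().iloc[0])  # only non-NaN (QC-passed) rows → 7
```

Thresholding on `size()` would have treated QC-rejected minutes as valid coverage; `count()` does not.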