Resampling GHI Time Series to 15-Minute and Hourly

Once the GHI data has been cleaned and quality-controlled, its 1-minute resolution is often finer than most ML applications need. This step aggregates the data to coarser temporal resolutions and marks any aggregated window with too few valid readings as missing.

Target resolutions

The workshop produces two output resolutions:

Resolution	Window	Minimum valid readings required
15-minute	15 possible readings	> 10
60-minute	60 possible readings	> 40

Resampling code

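import numpy as np

# Assumes df holds the quality-controlled 1-minute data
# with a 'datetime' column and a 'ghi' column.

# 15-minute resampling: count the valid (non-NaN) readings per window, then average
counts15 = df.resample(on='datetime', rule="15 min").ghi.count().values
df15 = df.resample(on='datetime', rule="15 min")[['ghi']].mean().reset_index()
df15['tot'] = counts15
# Keep the mean only where more than 10 of the 15 readings are valid
df15['ghi'] = np.where(df15.tot > 10, df15.ghi, np.nan)

# 60-minute resampling: same procedure with a 60-reading window
counts60 = df.resample(on='datetime', rule="60 min").ghi.count().values
df60 = df.resample(on='datetime', rule="60 min")[['ghi']].mean().reset_index()
df60['tot'] = counts60
# Keep the mean only where more than 40 of the 60 readings are valid
df60['ghi'] = np.where(df60.tot > 40, df60.ghi, np.nan)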
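The two blocks above follow the same pattern, so they could be folded into one helper. The sketch below is one possible way to do that, not the workshop's own code: the function name resample_with_min_count, its parameters, and the synthetic demo frame are all assumptions made for illustration. It computes the mean and the valid count in a single pass and applies the same "more than N" rule.

```python
import numpy as np
import pandas as pd


def resample_with_min_count(df, rule, min_count, time_col="datetime", value_col="ghi"):
    """Hypothetical helper: window mean, set to NaN unless count > min_count."""
    stats = df.resample(rule, on=time_col)[value_col].agg(["mean", "count"])
    # Series.where keeps values where the condition holds and inserts NaN elsewhere
    out = stats["mean"].where(stats["count"] > min_count)
    return out.rename(value_col).reset_index()


# Synthetic 1-minute data (illustrative only): two hours, second hour mostly missing
idx = pd.date_range("2024-01-01 10:00", periods=120, freq="min")
demo = pd.DataFrame({"datetime": idx, "ghi": np.linspace(200.0, 319.0, 120)})
demo.loc[65:119, "ghi"] = np.nan  # only 5 valid readings remain after 11:00

demo15 = resample_with_min_count(demo, "15min", 10)
demo60 = resample_with_min_count(demo, "60min", 40)
print(demo60)
```

In this synthetic case the first hour keeps its mean, while the second hour, with only 5 valid readings, comes back as NaN at both resolutions.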

Saving the outputs

df15 = df15[['datetime', 'ghi']]
df60 = df60[['datetime', 'ghi']]

df15.to_csv('lq_15.csv', index=False)
df60.to_csv('lq_60.csv', index=False)
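When these files are loaded again later, the NaN windows and the timestamp type need to survive the round trip. The following is a minimal sketch of that check; it assumes nothing beyond the two-column layout written above and uses an in-memory buffer with made-up values in place of the real lq_60.csv.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for df60: three hourly rows, the middle one below the threshold
sample = pd.DataFrame({
    "datetime": pd.date_range("2024-01-01 10:00", periods=3, freq="60min"),
    "ghi": [412.5, np.nan, 388.0],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # NaN is written as an empty field
buffer.seek(0)

# parse_dates restores the datetime dtype; empty fields are read back as NaN
restored = pd.read_csv(buffer, parse_dates=["datetime"])
print(restored.dtypes)
print(restored)
```

Without parse_dates the datetime column would come back as plain strings, which would break any later time-based resampling or splitting.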

Why the minimum-count threshold matters

If only 2 of the 60 minutes in a window had valid readings (for example, because of a large gap in that hour), the hourly mean would be computed from just those 2 points. That result would not fairly represent the hour and should instead be treated as missing data. The threshold ensures that only windows with sufficient coverage are retained:

A 15-minute window needs more than 10 valid 1-minute readings (i.e. at least 11 out of 15).
A 60-minute window needs more than 40 valid 1-minute readings (i.e. at least 41 out of 60).

Windows that fall below the threshold are replaced with NaN.
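The 2-out-of-60 case can be worked through with numbers. The sketch below uses invented readings (850 and 150 W/m²) purely for illustration: the unfiltered hourly mean looks like a plausible value even though it rests on two points, and the threshold turns it into NaN.

```python
import numpy as np
import pandas as pd

# One hour of 1-minute data in which only the first two minutes are valid
idx = pd.date_range("2024-06-01 12:00", periods=60, freq="min")
ghi = pd.Series(np.nan, index=idx)
ghi.iloc[0] = 850.0  # illustrative values, not real measurements
ghi.iloc[1] = 150.0

hourly = ghi.resample("60min").agg(["mean", "count"])

naive_mean = hourly["mean"].iloc[0]   # average of just the two valid readings
valid = hourly["count"].iloc[0]       # number of non-NaN readings in the hour
masked = hourly["mean"].where(hourly["count"] > 40).iloc[0]

print(naive_mean, valid, masked)
```

Nothing about the naive mean signals that it is unreliable, which is why the count has to be checked explicitly.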

resample().count() counts only non-NaN values. Because the QC step replaces rejected readings with NaN, the count automatically reflects only physically valid measurements, with no additional filtering required.
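The difference between counting rows and counting valid readings can be seen by comparing count() with size() on the same window. This is a small sketch on synthetic values, where three readings are set to NaN as stand-ins for QC rejections.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-06-01 12:00", periods=15, freq="min")
s = pd.Series(np.arange(15, dtype=float), index=idx)
s.iloc[[3, 4, 9]] = np.nan  # stand-ins for readings rejected by QC

window = s.resample("15min")
n_rows = window.size().iloc[0]    # every row in the window, NaN included
n_valid = window.count().iloc[0]  # non-NaN rows only

print(n_rows, n_valid)
```

Here the window would pass the "more than 10" rule because 12 readings are valid. Using size() instead of count() would wrongly treat all 15 rows as usable.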

Visualize the resampled series to confirm it looks reasonable before using it in an ML pipeline. With matplotlib.pyplot imported as plt, a quick plt.plot(df60.datetime, df60.ghi) will expose any remaining anomalies or unexpected gaps.
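A numeric summary can complement the plot. The sketch below is an optional addition rather than part of the workshop: it reports the share of windows that ended up as NaN and the longest run of consecutive missing windows. The demo60 frame is a hypothetical stand-in for the real df60.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly output with one 3-hour gap and one isolated missing hour
demo60 = pd.DataFrame({
    "datetime": pd.date_range("2024-01-01", periods=8, freq="60min"),
    "ghi": [0.0, 120.0, np.nan, np.nan, np.nan, 310.0, np.nan, 95.0],
})

missing = demo60["ghi"].isna()
missing_fraction = missing.mean()  # share of windows without a usable mean

# Give each run of consecutive equal flags its own id, then sum the flags per run
run_id = (missing != missing.shift()).cumsum()
longest_gap = int(missing.groupby(run_id).sum().max())

print(f"missing windows: {missing_fraction:.0%}, longest gap: {longest_gap} windows")
```

A long gap found this way is worth knowing about before the train/test split, since it may fall entirely inside one of the sets.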
