What Is Systematic Bias?

Even after careful preprocessing, a GHI model may exhibit systematic bias, for example by consistently underestimating measured values (negative rMBE). This kind of offset is structural: the model's errors are not random, they point in the same direction on average. A simple linear regression bias correction of the form Y = a·X + b can remove much of that offset without rebuilding the model from scratch.

Detecting Bias with rmbe

Compute rmbe on the training set to detect and quantify systematic bias before applying any correction.
  • Negative rMBE → the model systematically underestimates measured GHI.
  • Positive rMBE → the model systematically overestimates measured GHI.
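import helpers.Metrics as ms

# Split by year: 2020 for fitting, later years held out for testing.
# The explicit .copy() avoids a possible SettingWithCopyWarning when
# columns are added to these frames later.
dfTrain = df[df.datetime.dt.year == 2020].dropna().copy()
dfTest  = df[df.datetime.dt.year > 2020].dropna().copy()

# Measured GHI is passed first, model output second.
bias_train = ms.rmbe(dfTrain.ghi, dfTrain.GHImod)
print(f'Training rMBE before correction: {bias_train:.2f}%')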
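
The source of helpers.Metrics is not shown on this page, so the following is only a minimal sketch of what rmbe might compute. It assumes the common definition (mean of modeled minus measured, divided by the measured mean, in percent), which matches the sign convention above. The function body and the toy arrays are illustrative assumptions, not the workshop's implementation.

```python
import numpy as np

def rmbe(measured, modeled):
    """Relative mean bias error in percent (assumed definition).

    Negative values mean the model underestimates on average.
    """
    measured = np.asarray(measured, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    return 100.0 * np.mean(modeled - measured) / np.mean(measured)

# Toy data: the model reads 10% low at every point.
measured = np.array([200.0, 400.0, 600.0, 800.0])
modeled = 0.9 * measured

bias = rmbe(measured, modeled)
print(f"rMBE: {bias:.2f}%")
```

Under this definition, a model that reads low everywhere yields a negative rMBE, and swapping the two arguments would flip the sign, so the argument order matters.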

Fitting the Linear Correction

The correction model Y = a·X + b is fit directly with NumPy. X is the model output (GHImod) and Y is the measured GHI. The coefficients are estimated on the training set only. Metrics computed on that same set measure in-sample performance; the real test comes on the held-out test set.
The workshop uses NumPy for the bias correction fit. The scikit-learn library is not imported in the workshop notebooks. numpy.polyfit (or equivalently scipy.stats.linregress) is sufficient for this single-predictor linear correction.
import numpy as np

# Fit linear correction on training set
a, b = np.polyfit(dfTrain['GHImod'].values, dfTrain['ghi'].values, 1)

# Apply to both sets
dfTrain['GHIcorr'] = a * dfTrain['GHImod'] + b
dfTest['GHIcorr']  = a * dfTest['GHImod']  + b
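
To see what the fit does in isolation, here is a small self-contained sketch on synthetic data. The arrays, the distortion coefficients (0.85 and -10) and the noise level are made-up illustrative values, not workshop data. Only the np.polyfit call mirrors the code above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measured" GHI in W/m^2 (illustrative values only).
ghi = rng.uniform(50.0, 1000.0, size=500)

# Synthetic model output that reads low: scaled, shifted, plus noise.
ghi_mod = 0.85 * ghi - 10.0 + rng.normal(0.0, 20.0, size=ghi.size)

# Same call as above: regress measured (Y) on modeled (X), degree 1.
a, b = np.polyfit(ghi_mod, ghi, 1)
ghi_corr = a * ghi_mod + b

def rmsd(x, y):
    """Root mean square deviation between two arrays."""
    return np.sqrt(np.mean((x - y) ** 2))

print(f"a = {a:.3f}, b = {b:.1f}")
print(f"mean error before: {np.mean(ghi_mod - ghi):.2f} W/m^2")
print(f"mean error after:  {np.mean(ghi_corr - ghi):.2f} W/m^2")
print(f"RMSD before: {rmsd(ghi_mod, ghi):.2f}, after: {rmsd(ghi_corr, ghi):.2f}")
```

Because np.polyfit with degree 1 is an ordinary least-squares fit with an intercept, the mean error on the fitting data drops to zero up to floating-point rounding, and the in-sample RMSD cannot get worse than the uncorrected one.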

Evaluating the Improvement

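Recompute metrics on both sets to measure the impact of the correction. On the training set, rMBE after correction is zero up to rounding, because a least-squares fit with an intercept leaves residuals with zero mean. The meaningful check is the test set, where the expected result is a bias close to zero and a reduction in rRMSD.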
# --- Training set ---
print('--- Training set ---')
print(f'rRMSD before: {ms.rrmsd(dfTrain.ghi, dfTrain.GHImod):.2f}%')
print(f'rRMSD after:  {ms.rrmsd(dfTrain.ghi, dfTrain.GHIcorr):.2f}%')
print(f'rMBE  before: {ms.rmbe(dfTrain.ghi, dfTrain.GHImod):.2f}%')
print(f'rMBE  after:  {ms.rmbe(dfTrain.ghi, dfTrain.GHIcorr):.2f}%')

# --- Test set ---
print('--- Test set ---')
print(f'rRMSD before: {ms.rrmsd(dfTest.ghi, dfTest.GHImod):.2f}%')
print(f'rRMSD after:  {ms.rrmsd(dfTest.ghi, dfTest.GHIcorr):.2f}%')
print(f'rMBE  before: {ms.rmbe(dfTest.ghi, dfTest.GHImod):.2f}%')
print(f'rMBE  after:  {ms.rmbe(dfTest.ghi, dfTest.GHIcorr):.2f}%')

Compare before/after metrics side by side to communicate results clearly:
| Metric    | Before | After |
|-----------|--------|-------|
| rMBE (%)  | value  | value |
| rMAE (%)  | value  | value |
| rRMSD (%) | value  | value |
| SS4       | value  | value |
Replace value with the numbers produced by your metric calls.
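
One possible way to fill such a table programmatically is sketched below. The two helper functions are hypothetical and use assumed definitions (each metric normalized by the measured mean); in the workshop you would call the helpers.Metrics functions instead. SS4 is left out because its definition is not given on this page. The toy arrays stand in for dfTest.ghi, dfTest.GHImod and dfTest.GHIcorr.

```python
import numpy as np

def relative_metrics(measured, modeled):
    """Return rMBE, rMAE and rRMSD in percent of the measured mean (assumed definitions)."""
    measured = np.asarray(measured, dtype=float)
    error = np.asarray(modeled, dtype=float) - measured
    scale = 100.0 / np.mean(measured)
    return {
        "rMBE (%)": scale * np.mean(error),
        "rMAE (%)": scale * np.mean(np.abs(error)),
        "rRMSD (%)": scale * np.sqrt(np.mean(error ** 2)),
    }

def comparison_table(measured, before, after):
    """Format before/after metrics as aligned plain-text rows."""
    m_before = relative_metrics(measured, before)
    m_after = relative_metrics(measured, after)
    lines = [f"{'Metric':<10} {'Before':>8} {'After':>8}"]
    for name in m_before:
        lines.append(f"{name:<10} {m_before[name]:>8.2f} {m_after[name]:>8.2f}")
    return "\n".join(lines)

# Toy values only: a constant -50 W/m^2 offset before, small alternating errors after.
measured = np.array([100.0, 300.0, 500.0, 700.0])
before = measured - 50.0
after = measured + np.array([5.0, -5.0, 5.0, -5.0])

table = comparison_table(measured, before, after)
print(table)
```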
Linear bias correction is a post-processing step, not a substitute for a well-calibrated model. It corrects a constant scale error (a) and a constant additive offset (b), but cannot fix structural model errors such as systematic misrepresentation of cloud effects or incorrect aerosol assumptions. If those deeper problems exist, the corrected model will still underperform on data that differs significantly from the training period.
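
A quick diagnostic for this limitation is to look at the mean residual within bins of the model output after correction. The sketch below uses synthetic data with a deliberately nonlinear error (a hypothetical model that reads low only at mid-range irradiance) to show that the overall bias can be zero while individual bins remain clearly biased. All values and the five-bin split are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic measured GHI in W/m^2 (illustrative values only).
ghi = rng.uniform(50.0, 1000.0, size=2000)

# Hypothetical nonlinear error: the model reads low at mid-range,
# and is accurate near the lowest and highest values.
ghi_mod = ghi - 0.0004 * (ghi - 50.0) * (1000.0 - ghi)

# Linear correction, fit as in the workshop.
a, b = np.polyfit(ghi_mod, ghi, 1)
ghi_corr = a * ghi_mod + b

# Mean residual overall and in five equal-count bins of model output.
residual = ghi_corr - ghi
overall_bias = residual.mean()
order = np.argsort(ghi_mod)
bin_means = [chunk.mean() for chunk in np.array_split(residual[order], 5)]

print(f"overall mean residual: {overall_bias:.2f} W/m^2")
for i, m in enumerate(bin_means, start=1):
    print(f"bin {i}: mean residual {m:+.1f} W/m^2")
```

If the bin means change sign or grow with irradiance, as they do here, the remaining error is not a simple linear offset and a linear correction alone will not remove it.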
