Overview

The helpers/Metrics.py module provides statistical functions for evaluating GHI (global horizontal irradiance) models against measured data. Metrics fall into two groups:
  • Point-comparison metrics — compare model output to observations value-by-value (MBE, MAE, RMSD and their relative variants).
  • Distribution-comparison metrics — compare the overall statistical distribution of modelled and observed values (KSI, OVER, SS4).

Point Comparison Metrics

mbe(true, pred) — Mean Bias Error

Formula: sum(pred - true) / N

Measures the average systematic offset of predictions relative to observations.
  • Positive value → the model overestimates (positive bias).
  • Negative value → the model underestimates (negative bias).
import helpers.Metrics as ms

bias = ms.mbe(dfTrain.ghi, dfTrain.GHImod)
print(f'MBE = {bias:.2f} W/m²')
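
The sign convention can be checked on a few made-up numbers. The following is a minimal sketch of the formula above in plain numpy; `mbe_sketch` is a hypothetical stand-in written for this example, not the code in helpers/Metrics.py, and the GHI values are invented.

```python
import numpy as np

def mbe_sketch(true, pred):
    """Mean bias error as stated above: sum(pred - true) / N."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sum(pred - true) / true.size

# Invented GHI values in W/m², for illustration only
measured = np.array([200.0, 400.0, 600.0, 800.0])
too_high = measured + 10.0   # model reads 10 W/m² high at every point
too_low = measured - 25.0    # model reads 25 W/m² low at every point

print(mbe_sketch(measured, too_high))  # 10.0  -> overestimation
print(mbe_sketch(measured, too_low))   # -25.0 -> underestimation
```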

rmbe(true, pred) — Relative MBE (%)

Formula: mean(pred - true) / mean(true) × 100

Expresses the mean bias as a percentage of the mean of the observations. Useful for comparing bias across sites or models with different irradiance levels.
relative_bias = ms.rmbe(dfTrain.ghi, dfTrain.GHImod)
print(f'rMBE = {relative_bias:.2f}%')

mae(true, pred) — Mean Absolute Error

Formula: sum(|pred - true|) / N

Average magnitude of errors regardless of direction. Unlike MBE, positive and negative errors do not cancel out.
error = ms.mae(dfTrain.ghi, dfTrain.GHImod)
print(f'MAE = {error:.2f} W/m²')
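
To see why MAE is reported alongside MBE, consider a model whose errors alternate in sign. This sketch applies the two formulas directly with numpy to invented values; it does not call helpers/Metrics.py.

```python
import numpy as np

# Invented measurements in W/m²
measured = np.array([300.0, 500.0, 700.0, 900.0])
# Hypothetical model with errors of +50, -50, +50, -50 W/m²
modelled = measured + np.array([50.0, -50.0, 50.0, -50.0])

errors = modelled - measured
mbe_val = errors.sum() / errors.size          # signed errors cancel
mae_val = np.abs(errors).sum() / errors.size  # magnitudes do not

print(f"MBE = {mbe_val:.1f} W/m²")  # MBE = 0.0 W/m²
print(f"MAE = {mae_val:.1f} W/m²")  # MAE = 50.0 W/m²
```

A zero MBE here hides the fact that every single prediction is off by 50 W/m².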

rmae(true, pred) — Relative MAE (%)

Formula: MAE / mean(true) × 100

Mean absolute error expressed as a percentage of the mean of the observations.
relative_error = ms.rmae(dfTrain.ghi, dfTrain.GHImod)
print(f'rMAE = {relative_error:.2f}%')

rmsd(true, pred) — Root Mean Square Deviation

Formula: sqrt( sum((pred - true)²) / N )

Penalises large errors more heavily than MAE. Sensitive to outliers and large individual deviations.
rmsd_val = ms.rmsd(dfTrain.ghi, dfTrain.GHImod)
print(f'RMSD = {rmsd_val:.2f} W/m²')
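
The outlier sensitivity can be illustrated with two invented error series that share the same MAE. The helper names below are hypothetical and only restate the formulas above in numpy.

```python
import numpy as np

def mae_sketch(errors):
    """Mean of absolute errors."""
    return np.mean(np.abs(errors))

def rmsd_sketch(errors):
    """Square root of the mean squared error."""
    return np.sqrt(np.mean(errors ** 2))

# Two invented error series (pred - true) in W/m² with identical MAE
spread_out = np.array([20.0, 20.0, 20.0, 20.0])   # small error everywhere
one_outlier = np.array([0.0, 0.0, 0.0, 80.0])     # one large deviation

print(mae_sketch(spread_out), rmsd_sketch(spread_out))    # 20.0 20.0
print(mae_sketch(one_outlier), rmsd_sketch(one_outlier))  # 20.0 40.0
```

Concentrating the same total error in a single point doubles the RMSD while leaving the MAE unchanged.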

rrmsd(true, pred) — Relative RMSD (%)

Formula: RMSD / mean(true) × 100

RMSD expressed as a percentage of the mean of the observations.
rrmsd_val = ms.rrmsd(dfTrain.ghi, dfTrain.GHImod)
print(f'rRMSD = {rrmsd_val:.2f}%')
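
The relative variants all divide by mean(true), which is what makes them comparable across sites. As a rough sketch (the function name, site names, and values are invented for this example), the same absolute errors give very different percentages at a low-irradiance and a high-irradiance site:

```python
import numpy as np

def relative_metrics_sketch(true, pred):
    """Relative metrics in %, each normalised by the mean of the observations."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - true
    scale = true.mean()
    return {
        "rMBE": err.mean() / scale * 100,
        "rMAE": np.abs(err).mean() / scale * 100,
        "rRMSD": np.sqrt((err ** 2).mean()) / scale * 100,
    }

# Same absolute errors (W/m²) applied at two invented sites
err = np.array([30.0, -10.0, 30.0, -10.0])
cloudy_site = np.array([150.0, 200.0, 250.0, 200.0])  # mean 200 W/m²
sunny_site = np.array([700.0, 800.0, 900.0, 800.0])   # mean 800 W/m²

cloudy = relative_metrics_sketch(cloudy_site, cloudy_site + err)
sunny = relative_metrics_sketch(sunny_site, sunny_site + err)

for name, result in (("cloudy", cloudy), ("sunny", sunny)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

With a mean four times higher, the sunny site reports relative errors four times smaller for identical absolute errors.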

Distribution Comparison Metrics

ecdf(x) — Empirical Cumulative Distribution Function

Returns: (xs, ys), the sorted values and their cumulative probabilities.

This is a building block used internally by KSI_OVER. Call it directly when you want to plot the CDF of a series.
xs, ys = ms.ecdf(dfTrain.ghi.values)
# xs: sorted GHI values
# ys: corresponding cumulative probabilities [0, 1]
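
For readers unfamiliar with empirical CDFs, one common construction is sketched below. This is an assumption about the general technique, not a copy of the module's code: the actual ecdf may use a different convention for the probabilities (for example, starting at 0 instead of 1/N).

```python
import numpy as np

def ecdf_sketch(x):
    """One common ECDF convention: the i-th sorted value gets probability i/N."""
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.arange(1, xs.size + 1) / xs.size
    return xs, ys

# Invented GHI samples in W/m²
xs, ys = ecdf_sketch([500.0, 100.0, 300.0, 900.0])
print(xs.tolist())  # [100.0, 300.0, 500.0, 900.0]
print(ys.tolist())  # [0.25, 0.5, 0.75, 1.0]
```

Plotting ys against xs as a step curve gives the CDF of the series.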

KSI_OVER(Xval, Xest, CDF=0) — Kolmogorov-Smirnov Integral and OVER

Compares the full empirical CDF of observed values (Xval) against the CDF of model estimates (Xest).
  • KSI: integral of |CDF_val - CDF_est| over the value range. Measures the total distributional difference between observations and model output.
  • OVER: integral of the excess above the critical threshold Vc = 1.63 / sqrt(N). Only areas where the CDF difference exceeds the statistical significance threshold are counted.
Relative metrics:
  • rKSI = KSI / (Vc × (Xmax − Xmin))
  • rOVER = OVER / (Vc × (Xmax − Xmin))
Return values by mode:
| CDF argument | Returns |
| --- | --- |
| 0 (default) | KSI scalar only |
| 1 | KSI, OVER, rKSI, rOVER, xCDF_tot, CDFval_tot, CDFest_tot, Dn, On, Vc |
Use CDF=1 when you need the full CDF arrays for plotting.
# Default: scalar KSI
ksi = ms.KSI_OVER(dfTrain.ghi.values, dfTrain.GHImod.values)
print(f'KSI = {ksi:.4f}')

# Full output for plotting
ksi, over, rksi, rover, x, cdf_val, cdf_est, dn, on, vc = \
    ms.KSI_OVER(dfTrain.ghi.values, dfTrain.GHImod.values, CDF=1)
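
The definitions above can be approximated in a few lines of numpy. The sketch below is only one possible reading and not the module's implementation: the function name, the evaluation grid (`n_grid`), the use of the pooled minimum and maximum for Xmin and Xmax, the choice of N as the number of observations, and the trapezoidal integration are all assumptions made for this example.

```python
import numpy as np

def ksi_over_sketch(x_val, x_est, n_grid=1000):
    """Approximate KSI, OVER, rKSI and rOVER on a regular grid (assumed design)."""
    x_val = np.sort(np.asarray(x_val, dtype=float))
    x_est = np.sort(np.asarray(x_est, dtype=float))
    n = x_val.size

    # Assumption: integrate over the pooled range of both series
    x_min = min(x_val[0], x_est[0])
    x_max = max(x_val[-1], x_est[-1])
    grid = np.linspace(x_min, x_max, n_grid)

    # Empirical CDFs evaluated on the common grid
    cdf_val = np.searchsorted(x_val, grid, side="right") / n
    cdf_est = np.searchsorted(x_est, grid, side="right") / x_est.size

    dn = np.abs(cdf_val - cdf_est)     # CDF difference
    vc = 1.63 / np.sqrt(n)             # critical threshold
    on = np.clip(dn - vc, 0.0, None)   # only the excess above Vc

    def integrate(y):
        # Trapezoidal rule written out by hand
        return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(grid)))

    ksi, over = integrate(dn), integrate(on)
    a_crit = vc * (x_max - x_min)
    return ksi, over, ksi / a_crit, over / a_crit

# Synthetic data: a "model" that is 100 W/m² too high everywhere
rng = np.random.default_rng(0)
measured = rng.uniform(0.0, 1000.0, 500)
ksi_same, over_same, _, _ = ksi_over_sketch(measured, measured)
ksi_shift, over_shift, rksi_shift, rover_shift = ksi_over_sketch(measured, measured + 100.0)

print(ksi_same, over_same)  # 0.0 0.0 for identical distributions
print(ksi_shift)            # close to the 100 W/m² shift
print(over_shift)           # smaller than KSI: only the part above Vc counts
```

For a pure shift, the area between the two CDFs equals the shift itself, which gives a quick sanity check on any KSI implementation.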

SS4(true, pred) — Skill Score

Based on the Beyer/Lorenz SS4 formula. Combines correlation and standard deviation ratio into a single dimensionless score. Formula:
SS4 = ((1 + ρ)⁴) / (4 × (σ_ratio + 1/σ_ratio)²)
Where:
  • ρ = Pearson correlation coefficient between true and pred
  • σ_ratio = σ_est / σ_med (standard deviation of estimates divided by standard deviation of measurements)
Range: 0 to 1; higher is better. A perfect model (ρ = 1 and σ_ratio = 1) scores 1.0.
ss = ms.SS4(dfTrain.ghi.values, dfTrain.GHImod.values)
print(f'SS4 = {ss:.4f}')
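
The formula can be evaluated by hand on a small invented series. The sketch below restates it with numpy; `ss4_sketch` is a hypothetical name and not the function in helpers/Metrics.py. The standard deviation ratio does not depend on whether the population or sample definition is used, since the factor cancels.

```python
import numpy as np

def ss4_sketch(true, pred):
    """SS4 = (1 + rho)^4 / (4 * (sigma_ratio + 1/sigma_ratio)^2)."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rho = np.corrcoef(true, pred)[0, 1]        # Pearson correlation
    sigma_ratio = np.std(pred) / np.std(true)  # estimates over measurements
    return (1 + rho) ** 4 / (4 * (sigma_ratio + 1 / sigma_ratio) ** 2)

# Invented GHI values in W/m²
measured = np.array([100.0, 300.0, 500.0, 700.0, 900.0])
perfect = measured.copy()
# Perfectly correlated, but with half the variability of the measurements
damped = 500.0 + 0.5 * (measured - 500.0)

print(f"{ss4_sketch(measured, perfect):.2f}")  # 1.00
print(f"{ss4_sketch(measured, damped):.2f}")   # 0.64
```

The damped model shows that a correlation of 1 is not enough: with σ_ratio = 0.5 the score drops to 16 / 25 = 0.64.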

All metric functions expect numpy arrays or pandas Series; passing a DataFrame directly will raise an error. Before calling any metric function, select the column as a Series (dfTrain.ghi) or extract its values with .values.
# ✅ Correct — pandas Series
ms.rmbe(dfTrain.ghi, dfTrain.GHImod)

# ✅ Correct — numpy arrays
ms.SS4(dfTrain.ghi.values, dfTrain.GHImod.values)

# ❌ Incorrect — DataFrame
ms.rmbe(dfTrain[['ghi']], dfTrain[['GHImod']])
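
If inputs may arrive as single-column DataFrames, a small guard can normalise them before they reach the metric functions. The helper below, `to_1d`, is hypothetical and not part of helpers/Metrics.py; it is a sketch of one way to do the conversion with pandas and numpy.

```python
import numpy as np
import pandas as pd

def to_1d(data):
    """Hypothetical helper: return a 1-D float array, rejecting multi-column frames."""
    if isinstance(data, pd.DataFrame):
        if data.shape[1] != 1:
            raise ValueError(f"expected a single column, got {data.shape[1]}")
        data = data.iloc[:, 0]  # single-column DataFrame -> Series
    return np.asarray(data, dtype=float).ravel()

# Invented frame using the column names from the examples above
df = pd.DataFrame({"ghi": [100.0, 200.0], "GHImod": [110.0, 190.0]})

print(to_1d(df["ghi"]).shape)    # (2,)  Series input
print(to_1d(df[["ghi"]]).shape)  # (2,)  single-column DataFrame input
```

Passing the whole frame, as in to_1d(df), raises a ValueError instead of silently picking a column.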
