Before applying automated filters, exploratory data analysis (EDA) is the recommended first step. Plotting GHI against the Solar Zenith Angle (SZA) reveals the physical envelope of the data and makes anomalies easy to spot visually.
GHI vs. Solar Zenith Angle scatter plot
Points that fall above the physical envelope, or that appear at SZA > 90° (sun below the horizon), are outliers. SZA is available directly on `df` after the Geo merge step. Any GHI value that lies above the expected physical curve, or that shows up in the red cluster at an unexpected magnitude, is a candidate for removal.
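As a minimal sketch, assume `df` is a pandas DataFrame with hypothetical columns `GHI` (W/m²) and `SZA` (degrees). The envelope below uses the BSRN-style "physically possible" upper limit, 1.5 · S0 · cos(SZA)^1.2 + 100 W/m², with an assumed solar constant S0 = 1361 W/m² and no Earth-Sun distance correction. The night-time tolerance is also an assumption. This is not necessarily the curve used in the workshop.

```python
import numpy as np
import pandas as pd

# Assumed solar constant in W/m^2. The BSRN-style limit uses the value corrected
# for Earth-Sun distance; that correction is ignored in this sketch.
S0 = 1361.0


def physical_upper_bound(sza_deg):
    """BSRN-style 'physically possible' GHI limit: 1.5*S0*mu0**1.2 + 100 W/m^2."""
    mu0 = np.cos(np.radians(np.asarray(sza_deg, dtype=float)))
    mu0 = np.maximum(mu0, 0.0)  # clamp cos(SZA) at zero once the sun is below the horizon
    return 1.5 * S0 * mu0**1.2 + 100.0


def flag_ghi_outliers(df, night_threshold=5.0):
    """Mark candidate outliers without dropping them.

    Assumes hypothetical columns "GHI" (W/m^2) and "SZA" (degrees).
    """
    above_envelope = df["GHI"] > physical_upper_bound(df["SZA"])
    # With SZA > 90 deg the sun is below the horizon, so GHI should be near zero;
    # night_threshold is an assumed tolerance for small sensor offsets.
    night_signal = (df["SZA"] > 90) & (df["GHI"] > night_threshold)
    return above_envelope | night_signal


def plot_ghi_vs_sza(df, flags):
    """Scatter GHI against SZA, highlight flagged points in red, overlay the envelope."""
    import matplotlib.pyplot as plt  # local import: the flagging code runs without matplotlib

    fig, ax = plt.subplots(figsize=(7, 4))
    ax.scatter(df["SZA"], df["GHI"], s=2, color="tab:blue", label="GHI")
    ax.scatter(df.loc[flags, "SZA"], df.loc[flags, "GHI"], s=6, color="red",
               label="candidate outliers")
    sza = np.linspace(0, 100, 500)
    ax.plot(sza, physical_upper_bound(sza), "k--", label="upper envelope")
    ax.axvline(90, color="gray", linewidth=0.8)  # horizon
    ax.set_xlabel("Solar Zenith Angle (°)")
    ax.set_ylabel("GHI (W/m²)")
    ax.legend()
    return fig
```

Typical use is `flags = flag_ghi_outliers(df)` followed by `plot_ghi_vs_sza(df, flags)`. The mask only marks candidates for review; nothing is deleted. Because the envelope adds 100 W/m² even at the horizon, small night-time readings need the separate SZA > 90° check.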
Context matters
The workshop emphasizes that you should always understand what outliers represent in your domain before blindly removing them. A GHI spike might be:

- A sensor fault (hardware malfunction or miscalibration)
- A cloud-edge effect (brief enhancement due to reflections off cloud edges)
- A sensor obstruction (shadow from a nearby object)
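One way to follow that advice is to label candidates instead of deleting them. The sketch below takes a boolean Series of flags (for example, from an envelope check) and applies a hypothetical heuristic. A run shorter than `min_sustained` consecutive flagged samples is labeled `short_spike`, which is worth checking for a cloud-edge effect. A longer run is labeled `sustained`, which is more suggestive of a fault or obstruction. The threshold and labels are illustrative assumptions, not a rule from the workshop.

```python
import pandas as pd


def label_flag_runs(flags, min_sustained=3):
    """Label each sample by the length of the consecutive flagged run it belongs to.

    Hypothetical heuristic: short runs -> "short_spike", long runs -> "sustained".
    """
    flags = flags.astype(bool)
    # A new run id starts whenever the flag value changes from the previous sample
    run_id = flags.ne(flags.shift()).cumsum()
    # Broadcast the size of each run back onto every sample in that run
    run_len = flags.groupby(run_id).transform("size")
    labels = pd.Series("ok", index=flags.index, dtype=object)
    labels[flags & (run_len < min_sustained)] = "short_spike"
    labels[flags & (run_len >= min_sustained)] = "sustained"
    return labels
```

Keeping the labels in a separate column leaves every original reading intact, so each group can be inspected before any rows are removed.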
Domain-specific outlier validation
The principle of domain-aware validation applies across many data types:

| Data type | Example validation | Reason |
|---|---|---|
| Time series | Detect abrupt jumps or physically impossible variations | Avoid false trends or extreme noise |
| Financial data | Filter negative prices or out-of-range daily swings | Guard against market-data or loading errors |
| IoT/sensor data | Remove impossible readings (temp < −100 °C) | Defective sensors or corrupt transmission |
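The rules in the table translate into simple vectorized checks. This is a minimal pandas sketch. The parameters `max_step` and `max_daily_change` are hypothetical thresholds you would set from domain knowledge, while the −100 °C floor comes from the table.

```python
import pandas as pd


def flag_abrupt_jumps(series, max_step):
    """Time series: flag samples that change by more than max_step from the previous one."""
    # The first sample has no predecessor; its NaN diff compares as False
    return series.diff().abs() > max_step


def flag_bad_prices(prices, max_daily_change=0.5):
    """Financial data: flag negative prices and day-over-day relative swings above the limit.

    Assumes one price per day; max_daily_change=0.5 means a 50 % move (hypothetical default).
    """
    relative_change = prices / prices.shift() - 1
    return (prices < 0) | (relative_change.abs() > max_daily_change)


def flag_impossible_temps(temps_c, floor_c=-100.0):
    """IoT/sensor data: flag temperature readings (deg C) below a physically implausible floor."""
    return temps_c < floor_c
```

Each function returns a boolean mask, so results can be combined with `|` and reviewed before any rows are dropped.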