Grid search is an argmax over thousands of CV scores. Even when the true edge is exactly zero, the maximum of N noisy estimates is biased upward by approximately σ · √(2 · ln N), where σ is the standard deviation of a single estimate. The one-standard-error rule (see Training) softens this, but it does not prove the surviving edge is real. The statistical certificate does. It is an independent judge applied to the already-selected configuration. It is never used as an input to selection, because using it to pick configs would make it overfittable and defeat the point.
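The upward bias of an argmax over noise can be checked with a small simulation. The sketch below is illustrative only: `seededRng`, `gaussian`, `meanMaxOfNoise`, and `luckBar` are hypothetical helper names, not part of the library, and the PRNG simply follows the well-known mulberry32 recipe so the result is reproducible.

```typescript
// Small seeded PRNG (mulberry32 recipe) so the simulation is reproducible.
function seededRng(seed: number): () => number {
  let a = seed >>> 0;
  return (): number => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function gaussian(rng: () => number): number {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Average "best score" when every one of nConfigs has a true edge of zero.
function meanMaxOfNoise(nConfigs: number, sigma: number, repeats: number, seed: number): number {
  const rng = seededRng(seed);
  let total = 0;
  for (let r = 0; r < repeats; r++) {
    let best = -Infinity;
    for (let i = 0; i < nConfigs; i++) best = Math.max(best, sigma * gaussian(rng));
    total += best;
  }
  return total / repeats;
}

// Asymptotic size of the bias quoted in the text: sigma * sqrt(2 ln N).
function luckBar(sigma: number, nConfigs: number): number {
  return sigma * Math.sqrt(2 * Math.log(nConfigs));
}
```

For N = 1000 and σ = 1 the asymptotic bar is about 3.72; at finite N the simulated mean of the maximum sits somewhat below it, but still far above the true edge of zero.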
The Five Barriers
All five barriers must pass simultaneously; certified: true requires the edge to survive every one of them. The literature sources are López de Prado (DSR 2014, PBO 2015, minTRL), White (Reality Check 2000), and Hansen (SPA 2005).
| Barrier | Function | Catches | Threshold |
|---|---|---|---|
| DSR (Deflated Sharpe) | deflatedSharpe | Edge doesn’t survive the correction for N trials + skew / kurtosis / length | ≥ 0.95 |
| PBO (CSCV overfit) | probabilityOfBacktestOverfitting | The IS-best configuration is systematically poor OOS | ≤ 0.10 |
| SPA / Reality Check | realityCheckPValue | The whole edge is explainable by data-snooping (stationary bootstrap) | p ≤ 0.05 |
| minTRL | minTrackRecordLength | The sample is physically too small for significance | N ≥ minTRL |
| Nested OOS | (from train) | The unbiased out-of-sample forecast is not positive | > 0 |
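The table reads as a logical AND over five checks. The sketch below shows that conjunction under stated assumptions: the `BarrierMetrics` and `Thresholds` shapes, their field names, and `evaluateBarriers` are hypothetical stand-ins written for illustration, not the library's `certifyStrategy` signature. Only the threshold values and the `certified` / `reasons` field names come from this page.

```typescript
// Hypothetical metric bundle; field names are illustrative, not the library's.
interface BarrierMetrics {
  dsr: number;       // deflated Sharpe probability
  pbo: number;       // probability of backtest overfitting
  spaPValue: number; // Reality Check / SPA p-value
  trades: number;    // actual number of trades
  minTRL: number;    // minimum track record length
  nestedOos: number; // nested out-of-sample estimate
}

interface Thresholds {
  dsrMin: number;
  pboMax: number;
  spaMax: number;
}

// Defaults taken from the table above.
const DEFAULT_THRESHOLDS: Thresholds = { dsrMin: 0.95, pboMax: 0.1, spaMax: 0.05 };

function evaluateBarriers(
  m: BarrierMetrics,
  t: Thresholds = DEFAULT_THRESHOLDS,
): { certified: boolean; reasons: string[] } {
  const reasons: string[] = [];
  // Each check is written as !(pass condition) so that NaN counts as a failure.
  if (!(m.dsr >= t.dsrMin)) reasons.push(`DSR ${m.dsr} is below ${t.dsrMin}`);
  if (!(m.pbo <= t.pboMax)) reasons.push(`PBO ${m.pbo} is above ${t.pboMax}`);
  if (!(m.spaPValue <= t.spaMax)) reasons.push(`SPA p-value ${m.spaPValue} is above ${t.spaMax}`);
  if (!(m.trades >= m.minTRL)) reasons.push(`${m.trades} trades is fewer than minTRL ${m.minTRL}`);
  if (!(m.nestedOos > 0)) reasons.push(`nested OOS ${m.nestedOos} is not positive`);
  return { certified: reasons.length === 0, reasons };
}
```

Passing a second argument such as `{ dsrMin: 0.99, pboMax: 0.05, spaMax: 0.01 }` tightens the bars in this sketch; a single failing check is enough to make `certified` false.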
DSR — Deflated Sharpe Ratio
DSR asks: given that we searched N configurations and observed the selected strategy’s Sharpe SR, what is the probability that the true Sharpe exceeds the expected maximum Sharpe from random search at this sample size?
SR₀ = expectedMaxSharpe(varSR, N) is the “luck bar”: how high a Sharpe you expect to see by chance from N independent trials with variance varSR. The denominator of the DSR statistic corrects for fat tails (high kurtosis) and asymmetry (skewness) in the return distribution. DSR ≥ 0.95 means there is a ≥ 95% probability the edge is real after accounting for the search.
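For orientation, the published Bailey and López de Prado form of the statistic can be sketched as follows. This is an assumption about the general method, not a copy of the library's `deflatedSharpe` or `expectedMaxSharpe`: every function name carrying the `Sketch` suffix is hypothetical, the normal CDF uses a textbook approximation, and its inverse is found by plain bisection. SR is expressed per observation (not annualized), `kurt` is raw kurtosis (3 for a normal distribution), and T is the number of observations.

```typescript
// Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normalCdfSketch(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Inverse CDF by bisection: slow but simple and good enough for a sketch.
function normalInvSketch(p: number): number {
  let lo = -10;
  let hi = 10;
  for (let i = 0; i < 80; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdfSketch(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// "Luck bar" SR0: expected maximum Sharpe among nTrials independent zero-edge trials.
function expectedMaxSharpeSketch(varSR: number, nTrials: number): number {
  const eulerGamma = 0.5772156649;
  return (
    Math.sqrt(varSR) *
    ((1 - eulerGamma) * normalInvSketch(1 - 1 / nTrials) +
      eulerGamma * normalInvSketch(1 - 1 / (nTrials * Math.E)))
  );
}

// DSR = Phi( (SR - SR0) * sqrt(T - 1) / sqrt(1 - skew*SR + ((kurt - 1)/4) * SR^2) )
function deflatedSharpeSketch(sr: number, sr0: number, T: number, skew: number, kurt: number): number {
  const denom = Math.sqrt(1 - skew * sr + ((kurt - 1) / 4) * sr * sr);
  return normalCdfSketch(((sr - sr0) * Math.sqrt(T - 1)) / denom);
}
```

When the observed SR merely equals the luck bar, the sketch returns 0.5: a coin flip, far from the 0.95 needed to pass. Negative skew or high kurtosis enlarges the denominator and pulls the result toward 0.5.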
PBO — Probability of Backtest Overfitting (CSCV)
PBO uses Combinatorially Symmetric Cross-Validation. The performance matrix perf[config][fold] is split into all C(S, S/2) IS/OOS combinations, where S is the number of folds. On each combination the best IS config is identified and its OOS rank is recorded. PBO is the fraction of splits where the IS-best config falls below median OOS performance (logit rank < 0). PBO ≤ 0.10 means the selected configuration generalizes: the IS winner is not systematically a fluke.
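The CSCV procedure described above can be sketched in a few lines. This is a minimal illustration, not the library's `probabilityOfBacktestOverfitting`: `pboSketch` and `combinations` are hypothetical names, per-split scores are plain means over folds, ties in the OOS ranking are counted in the config's favor, and the number of folds is assumed to be even.

```typescript
// All k-element index subsets of 0..n-1.
function combinations(n: number, k: number): number[][] {
  const out: number[][] = [];
  const pick = (start: number, chosen: number[]): void => {
    if (chosen.length === k) {
      out.push(chosen);
      return;
    }
    for (let i = start; i < n; i++) pick(i + 1, [...chosen, i]);
  };
  pick(0, []);
  return out;
}

const average = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// perf[config][fold]: one performance number per configuration and fold.
function pboSketch(perf: number[][]): number {
  const folds = perf[0].length;
  const splits = combinations(folds, folds / 2);
  let belowMedian = 0;
  for (const isFolds of splits) {
    const inSample = new Set(isFolds);
    const isScore = perf.map((row) => average(row.filter((_, f) => inSample.has(f))));
    const oosScore = perf.map((row) => average(row.filter((_, f) => !inSample.has(f))));
    // Configuration that looks best in-sample on this split.
    const best = isScore.indexOf(Math.max(...isScore));
    // Its OOS rank: 1 = worst, perf.length = best.
    const rank = oosScore.filter((s) => s <= oosScore[best]).length;
    const omega = rank / (perf.length + 1); // relative rank in (0, 1)
    const logit = Math.log(omega / (1 - omega));
    if (logit < 0) belowMedian++;
  }
  return belowMedian / splits.length;
}
```

A configuration that dominates in every fold yields a PBO of 0, while configurations whose good folds and bad folds merely trade places push the value up.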
SPA / Reality Check — Stationary Bootstrap
White’s Reality Check and Hansen’s SPA test the null hypothesis “the best of N strategies is no better than a zero benchmark.” The test statistic is V = max_k √T · mean(returns_k). Under H₀, centered returns are bootstrap-resampled using Politis-Romano stationary blocks (preserving autocorrelation). The p-value is the fraction of bootstrap V values that exceed the observed V. p ≤ 0.05 means data-snooping alone is an unlikely explanation for the edge.
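A compact sketch of this bootstrap follows. It is written from the description above rather than from the library's `realityCheckPValue` or `stationaryBootstrapResample`: the names `seededRng`, `stationaryIndices`, and `realityCheckSketch`, along with the `draws`, `meanBlock`, and `seed` parameters and their defaults, are assumptions made for illustration. The PRNG follows the public mulberry32 recipe.

```typescript
// Small seeded PRNG (mulberry32 recipe) for reproducible resampling.
function seededRng(seed: number): () => number {
  let a = seed >>> 0;
  return (): number => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Politis-Romano stationary bootstrap: blocks with geometric lengths, wrapping circularly.
function stationaryIndices(T: number, meanBlock: number, rng: () => number): number[] {
  const idx: number[] = [];
  let pos = Math.floor(rng() * T);
  for (let t = 0; t < T; t++) {
    // With probability 1/meanBlock start a new block, otherwise continue the current one.
    if (t > 0) pos = rng() < 1 / meanBlock ? Math.floor(rng() * T) : (pos + 1) % T;
    idx.push(pos);
  }
  return idx;
}

// returns[k][t]: per-period return of strategy k. Benchmark is zero.
function realityCheckSketch(returns: number[][], draws = 1000, meanBlock = 10, seed = 1): number {
  const T = returns[0].length;
  const rng = seededRng(seed);
  const avg = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;
  const means = returns.map(avg);
  const observed = Math.max(...means.map((m) => Math.sqrt(T) * m));
  // Impose the null hypothesis by removing each strategy's own mean.
  const centered = returns.map((r, k) => r.map((x) => x - means[k]));
  let exceed = 0;
  for (let b = 0; b < draws; b++) {
    // The same resampled time indices are used for every strategy.
    const idx = stationaryIndices(T, meanBlock, rng);
    const vStar = Math.max(...centered.map((r) => Math.sqrt(T) * avg(idx.map((i) => r[i]))));
    if (vStar > observed) exceed++;
  }
  return exceed / draws;
}
```

Using one set of resampled indices across all strategies keeps their cross-correlation intact, which is what lets the maximum over strategies be compared fairly with the observed maximum.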
minTRL — Minimum Track Record Length
minTRL is the minimum number of trades needed for the observed Sharpe to be statistically significant at α = 0.05, corrected for skewness and kurtosis. If actualN < minTRL, the sample is physically too small; any conclusions are premature.
Reading the Certification Object
When certified: false, reasons is populated with a human-readable explanation for each failing barrier.
certified: false is an Honest Refusal
Training still ran. The grid argmax still picked a winner. But the certificate says that winner is a brute-force artifact, not a real edge. The e2e test fit-noise-rejection demonstrates this property: a full fit on a pure random walk does learn a “best” configuration with a positive CV score, and the certificate correctly returns certified: false.
This is the layer reliable cannot provide. reliable: true only means there were enough stable, significant trades in the dataset. It does not see the winner’s curse of the search itself. A dataset with 200 stable trades from a genuinely random price process will pass reliable and fail certified.
reliable vs certified
These two properties answer different questions and both are required:
reliable: true
Data quality — enough trades, the edge was stable across folds, and it was statistically distinguishable from zero within the dataset. Tells you the training data was sound.
certified: true
Edge reality — the selected configuration survives all five barriers against winner’s curse. Tells you the edge is not a brute-force search artifact.
- reliable: false, certified: false: thin data, and the found edge is an artifact. Do not trade.
- reliable: true, certified: false: solid data volume and stability, but the grid search inflated the result. Do not trade.
- reliable: false, certified: true: rare in practice; edge survived statistical tests but data is thin. Trade cautiously, check minTRL.
- reliable: true, certified: true: data is solid and the edge is real. Safe to trade.
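The four combinations collapse into a small gate. The sketch below is one way to encode them in calling code; the `Verdict` type and the `tradingVerdict` function are hypothetical names, not part of the library.

```typescript
type Verdict = "do-not-trade" | "trade-cautiously" | "trade";

// Maps the two flags to the action listed in the four cases above.
function tradingVerdict(reliable: boolean, certified: boolean): Verdict {
  // Without a certificate the edge may be a search artifact, however solid the data.
  if (!certified) return "do-not-trade";
  // Certified but thin data: the rare case that calls for a minTRL check.
  return reliable ? "trade" : "trade-cautiously";
}
```

Note that `certified` decides whether to trade at all, while `reliable` only modulates how much trust a certified result deserves.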
Overriding Thresholds
The default thresholds (DSR ≥ 0.95, PBO ≤ 0.10, SPA p ≤ 0.05) are from the literature. You can override them by calling certifyStrategy directly.
The individual barrier functions are exported as well: deflatedSharpe, probabilityOfBacktestOverfitting, realityCheckPValue, minTrackRecordLength, expectedMaxSharpe. Moment statistics (mean, variance, skewness, kurtosis), normal distribution utilities (normalCdf, normalInv), and the bootstrap primitives (stationaryBootstrapResample, mulberry32) are exported too.
certified alone is blind to repeated fit() calls. Running fit 720 times over a month and trading only when certified: true is itself a search over 720 trials — each certified run can be the outlier among those 720 attempts. A single-fit certificate cannot see this chain. The Meta-Ledger guide explains how to guard against this meta-level overfitting.
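A back-of-the-envelope calculation shows why the chain of runs matters. The sketch below assumes, purely for illustration, that each fit on edge-free data has some small chance of being certified anyway and that runs are independent; neither the per-run rate nor the independence is stated by this page, and `chanceOfFalseCertificate` is a hypothetical helper.

```typescript
// Probability that at least one of `runs` independent fits is certified by luck,
// given a hypothetical per-run false-certification rate.
function chanceOfFalseCertificate(perRunRate: number, runs: number): number {
  return 1 - Math.pow(1 - perRunRate, runs);
}
```

With an assumed 1% per-run rate, 720 runs make at least one spurious certificate close to certain under these assumptions, even though any single certificate looks strict. Real runs on overlapping data are correlated, so the true figure would differ, but the direction of the effect is the same.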