A compelling in-sample backtest is not enough to trust a strategy with real capital. Overfitting (fitting the model’s parameters, or the rule tree’s thresholds, to the noise in historical data rather than genuine signal) is the single largest failure mode in quantitative research. The Validation Engine enforces a two-stage statistical gate between the Research Layer and the Portfolio Construction Layer: Walk-Forward Analysis measures out-of-sample performance stability across rolling time windows, and Combinatorial Purged Cross-Validation (CPCV) provides a full distribution of OOS paths and computes the Probability of Backtest Overfitting. A strategy must pass configured thresholds on both before its status can move to validated and then promoted.
Why In-Sample Backtests Aren’t Enough
Overfitting
A model or rule tree optimised on a fixed historical window can memorise idiosyncratic patterns that will not repeat. In-sample Sharpe ratios routinely overstate live performance by 2–5×.
Look-Ahead Bias
Even without intentional peeking, overlapping forward-return labels and serial correlation between adjacent bars can leak test information into training — inflating CV scores without a purging step.
Walk-Forward Analysis
Walk-Forward Analysis simulates how a strategy would have been deployed and periodically retrained in a live environment. Instead of training once on all history, the engine creates multiple sequential folds and measures performance only on each fold’s out-of-sample window.
Window Types
Rolling
Fixed-size train window slides forward. Each fold trains on the same number of bars. Prevents over-representation of early market regimes.
Expanding
Train window grows from the dataset start. Each fold uses all available history up to that point — mimicking the natural data accumulation of live operation.
Anchored
Identical to expanding but the start anchor is fixed explicitly. Useful when you want to control the exact lookback origin regardless of dataset length.
Walk-Forward Configuration
WalkForwardConfig parameters:
| Parameter | Default | Description |
|---|---|---|
| method | "rolling" | Window type: rolling, expanding, or anchored |
| n_splits | 5 | Number of folds to generate |
| test_size | 0.2 | Fraction of total dataset length per test fold |
| min_train_size | 0.3 | Minimum train fraction (rolling/expanding only) |
| gap_bars | 0 | Bars of gap between train end and test start to prevent leakage |
| refit | true | Re-train the model from scratch on each fold’s train set |
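To make the parameters concrete, here is a minimal sketch of how fold boundaries could be derived from them. It is not the engine's implementation. The dataclass is a stand-in that copies the table's field names and defaults, `walk_forward_splits` is a hypothetical helper, and the layout rule (test windows spaced evenly between the first feasible start and the end of the dataset, overlapping if necessary) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class WalkForwardConfig:
    # Field names and defaults copied from the table; the class is a stand-in.
    method: str = "rolling"
    n_splits: int = 5
    test_size: float = 0.2
    min_train_size: float = 0.3
    gap_bars: int = 0
    refit: bool = True  # not used here: only fold boundaries are sketched


def walk_forward_splits(
    n_bars: int, cfg: WalkForwardConfig
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in chronological order."""
    test_len = round(n_bars * cfg.test_size)
    train_len = round(n_bars * cfg.min_train_size)
    first_test_start = train_len + cfg.gap_bars
    last_test_start = n_bars - test_len
    if test_len < 1 or train_len < 1 or last_test_start < first_test_start:
        raise ValueError("dataset too short for this configuration")

    # Assumed layout: test windows are spaced evenly and may overlap.
    step = (last_test_start - first_test_start) / max(cfg.n_splits - 1, 1)
    for i in range(cfg.n_splits):
        test_start = first_test_start + round(i * step)
        train_end = test_start - cfg.gap_bars  # leave gap_bars unused bars
        if cfg.method == "rolling":
            train_start = train_end - train_len  # fixed-size window
        elif cfg.method in ("expanding", "anchored"):
            train_start = 0  # anchor fixed at the dataset start in this sketch
        else:
            raise ValueError(f"unknown method: {cfg.method!r}")
        yield range(train_start, train_end), range(test_start, test_start + test_len)
```

With 1,000 bars and the defaults, this yields five folds whose training windows are each 300 bars long; switching `method` to `"expanding"` keeps the same test windows but lets each training window start at bar 0.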
Per-Fold Results
For each fold, the engine records both in-sample (IS) CV metrics and out-of-sample (OOS) metrics.
Walk-Forward Aggregate
The engine aggregates OOS metrics across all folds. Derived fields include:
- is_sharpe_stable: true if std_oos_sharpe / |mean_oos_sharpe| < 0.5. A high coefficient of variation suggests regime-dependent performance that will be unreliable live.
- is_profitable_oos: true if mean_oos_sharpe > 0. The strategy must have positive risk-adjusted OOS returns on average.
- overfitting_score: IS directional accuracy / |OOS Sharpe|. Values much greater than 1.0 indicate the model is fitting training noise.
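The aggregate flags can be reproduced from per-fold numbers with a few lines of arithmetic. The sketch below is illustrative only: `aggregate_walk_forward` is a hypothetical helper, the use of the sample standard deviation is an assumption, and so is the choice to divide the mean IS directional accuracy by the absolute mean OOS Sharpe for `overfitting_score`.

```python
from math import inf
from statistics import mean, stdev


def aggregate_walk_forward(oos_sharpes, is_directional_accuracies):
    """Summarise per-fold results into the aggregate fields described above."""
    mean_oos = mean(oos_sharpes)
    # Sample standard deviation (assumption); zero when only one fold exists.
    std_oos = stdev(oos_sharpes) if len(oos_sharpes) > 1 else 0.0
    denom = abs(mean_oos)
    # Coefficient of variation of the OOS Sharpe across folds.
    cv = std_oos / denom if denom > 0 else inf
    overfitting = mean(is_directional_accuracies) / denom if denom > 0 else inf
    return {
        "mean_oos_sharpe": mean_oos,
        "std_oos_sharpe": std_oos,
        "is_sharpe_stable": cv < 0.5,
        "is_profitable_oos": mean_oos > 0,
        "overfitting_score": overfitting,
    }
```

A mean OOS Sharpe of exactly zero makes both ratios undefined, so this sketch maps that case to infinity, which fails the stability check and any finite overfitting threshold.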
Combinatorial Purged Cross-Validation (CPCV)
Walk-forward produces one OOS path: a single sequence of test folds. CPCV, as described by Marcos López de Prado in Advances in Financial Machine Learning (Chapter 12), generates all C(n, k) combinations of k test folds from n total folds, producing many distinct OOS paths. The distribution of Sharpe ratios across these paths enables two additional statistics:
- Probability of Backtest Overfitting (PBO) — the fraction of CPCV paths whose OOS Sharpe is below the median IS Sharpe. A PBO above 0.5 means more than half the paths underperformed in OOS.
- Deflated Sharpe Ratio (DSR) — the OOS Sharpe adjusted downward for the number of trials (paths) evaluated, analogous to the Bonferroni correction in hypothesis testing.
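The PBO definition given here reduces to a simple count. The following sketch implements that definition literally (the share of OOS Sharpe values below the median IS Sharpe). The function name and input shapes are hypothetical, and this is not the logit-based estimator from the original PBO literature.

```python
from statistics import median


def probability_of_backtest_overfitting(is_sharpes, oos_sharpes):
    """Fraction of OOS path Sharpes that fall below the median IS Sharpe."""
    if not is_sharpes or not oos_sharpes:
        raise ValueError("both Sharpe lists must be non-empty")
    threshold = median(is_sharpes)
    n_below = sum(1 for sharpe in oos_sharpes if sharpe < threshold)
    return n_below / len(oos_sharpes)
```

For example, with IS Sharpes of 1.0, 1.2, and 1.4 (median 1.2) and OOS path Sharpes of 0.5, 1.3, 1.1, and 1.5, two of the four paths fall below the median, giving a PBO of 0.5.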
Why CPCV Catches What Walk-Forward Misses
CPCV applies two additional safeguards against label leakage that walk-forward does not enforce:
Purging
When the target y[t] is a target_horizon-bar forward return, the label at bar t overlaps with bars t+1 … t+horizon-1. Any training sample whose label window touches the test window is purged (removed from training).
Embargo
Bars immediately following the test window are embargoed from training to eliminate momentum/autocorrelation leakage. The embargo covers embargo_pct × n_samples bars after the end of each test fold.
CPCV Configuration
CPCVConfig parameters:
| Parameter | Default | Description |
|---|---|---|
| n_splits | 6 | Total folds n. Number of CPCV paths = C(n, k) |
| n_test_splits | 2 | Test folds per combination k. Must be < n_splits |
| embargo_pct | 0.01 | Fraction of total samples to embargo after each test fold |
| purge | true | Enable label-overlap purging |
| target_horizon | 1 | Forward-return horizon used for purge calculation |
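With n_splits=6 and n_test_splits=2, the engine generates C(6,2) = 15 distinct train/test splits and therefore 15 OOS performance paths, one per split. Because the splits share training data, these paths are correlated rather than statistically independent.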
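The split generation, purging, and embargo described in this section can be sketched together. This is an illustration under stated assumptions, not the engine's code: `cpcv_splits` is a hypothetical function whose keyword names mirror the CPCVConfig table, folds are assumed to be contiguous and of near-equal size, and the purge follows the text's convention that the label at bar t spans bars t through t+horizon-1.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def cpcv_splits(
    n_samples: int,
    n_splits: int = 6,
    n_test_splits: int = 2,
    embargo_pct: float = 0.01,
    purge: bool = True,
    target_horizon: int = 1,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for every combination of test folds."""
    if not 0 < n_test_splits < n_splits:
        raise ValueError("n_test_splits must be between 1 and n_splits - 1")
    # Contiguous folds of near-equal size (assumption).
    bounds = [round(i * n_samples / n_splits) for i in range(n_splits + 1)]
    folds = [(bounds[i], bounds[i + 1]) for i in range(n_splits)]
    embargo = round(n_samples * embargo_pct)

    for test_folds in combinations(range(n_splits), n_test_splits):
        test_idx, dropped = set(), set()
        for f in test_folds:
            start, end = folds[f]
            test_idx.update(range(start, end))
            if purge:
                # Purge bars whose label window (t .. t+horizon-1) reaches
                # the first bar of this test fold.
                dropped.update(range(max(0, start - target_horizon + 1), start))
            # Embargo the bars that immediately follow the test fold.
            dropped.update(range(end, min(n_samples, end + embargo)))
        train = [i for i in range(n_samples)
                 if i not in test_idx and i not in dropped]
        yield train, sorted(test_idx)
```

With 600 samples and the default six folds, `list(cpcv_splits(600, target_horizon=3))` contains 15 splits. A stricter variant that also treats the closing price at t+horizon as part of the label would purge one additional bar before each test fold.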
Validation Gates
Both the Walk-Forward and CPCV engines enforce configurable promotion gates. All gates must pass for the strategy status to advance to validated.
Walk-Forward Gates
| Gate | Config Field | Default | Pass Condition |
|---|---|---|---|
| OOS Sharpe | min_sharpe | 0.3 | mean_oos_sharpe ≥ min_sharpe |
| OOS Max Drawdown | max_drawdown | 0.25 | mean_oos_max_drawdown ≤ max_drawdown |
| Profitable Folds | profitable_fold_ratio | 0.5 | n_profitable_folds / n_folds ≥ ratio |
| Overfitting Score | max_overfit_ratio | 3.0 | overfitting_score ≤ max_overfit_ratio |
CPCV Gates
| Gate | Config Field | Default | Pass Condition |
|---|---|---|---|
| Mean OOS Sharpe | min_sharpe | 0.2 | mean_oos_sharpe ≥ min_sharpe |
| PBO | max_pbo | 0.6 | pbo ≤ max_pbo |
| Deflated Sharpe | min_deflated_sharpe | 0.1 | deflated_sharpe ≥ min_deflated_sharpe |
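The two gate tables translate directly into threshold comparisons. The sketch below encodes the documented config fields, defaults, and pass conditions. The function name, the metric keys for the inputs, and the shape of the returned per-gate dictionary (including the `margin` field) are assumptions made for illustration.

```python
# (config field, metric key, default threshold, comparison), from the tables.
WALK_FORWARD_GATES = [
    ("min_sharpe", "mean_oos_sharpe", 0.3, ">="),
    ("max_drawdown", "mean_oos_max_drawdown", 0.25, "<="),
    ("profitable_fold_ratio", "profitable_fold_ratio", 0.5, ">="),
    ("max_overfit_ratio", "overfitting_score", 3.0, "<="),
]
CPCV_GATES = [
    ("min_sharpe", "mean_oos_sharpe", 0.2, ">="),
    ("max_pbo", "pbo", 0.6, "<="),
    ("min_deflated_sharpe", "deflated_sharpe", 0.1, ">="),
]


def evaluate_gates(metrics, gates, overrides=None):
    """Return (passed, per-gate details); every gate must pass."""
    overrides = overrides or {}
    results = {}
    for field, metric_key, default, comparison in gates:
        threshold = overrides.get(field, default)
        value = metrics[metric_key]
        if comparison == ">=":
            ok, margin = value >= threshold, value - threshold
        else:
            ok, margin = value <= threshold, threshold - value
        # A negative margin is the amount by which the gate was missed.
        results[field] = {"value": value, "threshold": threshold,
                          "passed": ok, "margin": margin}
    return all(r["passed"] for r in results.values()), results
```

Here `profitable_fold_ratio` in the metrics input stands for n_profitable_folds / n_folds, computed by the caller.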
ValidationResult
The response from both validation endpoints shares a common structure.
When passed is true, the engine automatically transitions Strategy.status to validated.
When passed is false, gate_results tells you exactly which thresholds were missed and by how much — enabling targeted remediation (e.g. relaxing signal thresholds, adding features, or increasing the training window).
MLflow Integration
Every validation run is logged to MLflow as a standalone experiment:
- Parameters: plugin_key, n_splits, test_size, gap_bars, refit, model hyperparameters
- Metrics: all aggregate OOS metrics, per-gate pass/fail as floats (1.0/0.0), overfitting_score, validation_passed
- Artifacts: per-fold metrics as a JSON file
The mlflow_run_id in the response links directly to the MLflow experiment UI for drill-down exploration.
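A rough sketch of what such logging could look like with the public MLflow tracking API follows. The helper names, the input dictionary shapes, the `gate_<field>_passed` metric naming, the run name, and the `fold_metrics.json` artifact path are all assumptions; only the `mlflow` calls themselves (`start_run`, `log_params`, `log_metrics`, `log_dict`) are real library functions.

```python
def build_mlflow_payload(config, model_params, aggregate, gate_results, passed):
    """Assemble the params and metrics dictionaries for one validation run."""
    params = {key: config[key] for key in
              ("plugin_key", "n_splits", "test_size", "gap_bars", "refit")}
    params.update(model_params)  # model hyperparameters
    # MLflow metrics must be numeric, so booleans become 1.0 / 0.0.
    metrics = {name: float(value) for name, value in aggregate.items()}
    for gate, outcome in gate_results.items():
        metrics[f"gate_{gate}_passed"] = 1.0 if outcome["passed"] else 0.0
    metrics["validation_passed"] = 1.0 if passed else 0.0
    return params, metrics


def log_validation_run(params, metrics, fold_metrics):
    """Log one run and return its run ID (requires the mlflow package)."""
    import mlflow  # imported lazily so the payload builder works without it

    with mlflow.start_run(run_name="validation") as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # Per-fold metrics are stored as a JSON artifact.
        mlflow.log_dict({"folds": fold_metrics}, "fold_metrics.json")
        return run.info.run_id
```

Keeping the payload construction separate from the logging call makes the mapping from validation output to MLflow fields easy to inspect without a tracking server.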
Async Validation
Validation runs can be long for multi-year datasets with many folds. Pass async_mode: true to dispatch the run to a Celery worker.
The request returns a task_id immediately. Poll GET /api/validation/tasks/{task_id} for status and the final ValidationResult on completion.
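A client-side polling loop for this endpoint might look like the sketch below. Only the endpoint path comes from this page. The response fields (`status`, `result`, `error`) and the status values, which borrow Celery's `SUCCESS` and `FAILURE` state names, are assumptions, and `fetch` is a hypothetical callable that performs the GET request and returns the decoded JSON body.

```python
import time
from typing import Any, Callable, Dict


def wait_for_validation(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the task endpoint until the run finishes, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = fetch(f"/api/validation/tasks/{task_id}")
        status = body.get("status")  # field name is an assumption
        if status == "SUCCESS":
            return body["result"]  # assumed to hold the ValidationResult
        if status == "FAILURE":
            raise RuntimeError(f"validation task {task_id} failed: {body.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"validation task {task_id} did not finish in time")
        sleep(poll_seconds)
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP client and lets it be exercised without network access.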