A compelling in-sample backtest is not enough to trust a strategy with real capital. Overfitting — fitting the model’s parameters (or the rule tree’s thresholds) to the noise in historical data rather than genuine signal — is the single largest failure mode in quantitative research. The Validation Engine enforces a two-stage statistical gate between the Research Layer and the Portfolio Construction Layer: Walk-Forward Analysis measures out-of-sample performance stability across rolling time windows, and Combinatorial Purged Cross-Validation (CPCV) provides a full distribution of OOS paths and computes the Probability of Backtest Overfitting. A strategy must pass configured thresholds on both before its status can move to validated and then promoted.

Why In-Sample Backtests Aren’t Enough

Overfitting

A model or rule tree optimised on a fixed historical window can memorise idiosyncratic patterns that will not repeat. In-sample Sharpe ratios routinely overstate live performance by 2–5×.

Look-Ahead Bias

Even without intentional peeking, overlapping forward-return labels and serial correlation between adjacent bars can leak test information into training. Unless a purging step removes the overlap, cross-validation scores are inflated.

Any strategy with an in-sample Sharpe above 2.0 should be treated with scepticism until walk-forward OOS Sharpe is confirmed. High IS/OOS ratios (the overfitting_score) are the primary signal of data snooping.

Walk-Forward Analysis

Walk-Forward Analysis simulates how a strategy would have been deployed and periodically retrained in a live environment. Instead of training once on all history, the engine creates multiple sequential folds and measures performance only on each fold’s out-of-sample window.

Window Types

Rolling

Fixed-size train window slides forward. Each fold trains on the same number of bars. Prevents over-representation of early market regimes.
[====TRAIN====][TEST] →
    [====TRAIN====][TEST] →

Expanding

Train window grows from the dataset start. Each fold uses all available history up to that point — mimicking the natural data accumulation of live operation.
[=TRAIN=][TEST]
[==TRAIN==][TEST]
[===TRAIN===][TEST]

Anchored

Identical to expanding but the start anchor is fixed explicitly. Useful when you want to control the exact lookback origin regardless of dataset length.
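The three window types can be illustrated with a small index generator. This is a minimal sketch, not the engine's implementation: the function name make_folds, its bar-count arguments (the API itself takes fractions), and the anchor parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

Fold = Tuple[range, range]  # (train bar indices, test bar indices)


def make_folds(n_bars: int, n_splits: int, train_bars: int, test_bars: int,
               method: str = "rolling", gap_bars: int = 0,
               anchor: int = 0) -> List[Fold]:
    """Hypothetical helper: build sequential (train, test) index ranges.

    rolling             -> fixed-length train window that slides forward
    expanding/anchored  -> train window starts at `anchor` and grows
    """
    if method not in {"rolling", "expanding", "anchored"}:
        raise ValueError(f"unknown method: {method}")
    folds: List[Fold] = []
    for i in range(n_splits):
        train_end = train_bars + i * test_bars       # exclusive
        test_start = train_end + gap_bars            # skip the gap
        test_end = test_start + test_bars            # exclusive
        if test_end > n_bars:
            break  # not enough data left for a full test window
        if method == "rolling":
            train_start = train_end - train_bars
        else:
            train_start = anchor
        folds.append((range(train_start, train_end),
                      range(test_start, test_end)))
    return folds


rolling = make_folds(1000, n_splits=3, train_bars=500, test_bars=100, gap_bars=5)
expanding = make_folds(1000, n_splits=3, train_bars=500, test_bars=100,
                       method="expanding", gap_bars=5)
```

In the rolling case every train range has 500 bars, while in the expanding case the train range keeps its start and grows by one test window per fold.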

Walk-Forward Configuration

// POST /api/validation/walk-forward
{
  "strategy_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "plugin_key": "ml.xgboost",
  "model_params": { "max_depth": 6, "n_estimators": 200, "learning_rate": 0.05 },
  "feature_ids": ["f1000000-0000-0000-0000-000000000001"],
  "symbol": "AAPL",
  "timeframe": "1d",
  "start_date": "2018-01-01T00:00:00Z",
  "end_date": "2024-01-01T00:00:00Z",
  "target_horizon": 1,
  "wf_config": {
    "method": "rolling",
    "n_splits": 5,
    "test_size": 0.2,
    "min_train_size": 0.3,
    "gap_bars": 5,
    "refit": true
  },
  "validation_config": {
    "min_sharpe": 0.3,
    "max_drawdown": 0.25,
    "profitable_fold_ratio": 0.5,
    "max_overfit_ratio": 3.0
  }
}
WalkForwardConfig parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| method | "rolling" | Window type: rolling, expanding, or anchored |
| n_splits | 5 | Number of folds to generate |
| test_size | 0.2 | Fraction of total dataset length per test fold |
| min_train_size | 0.3 | Minimum train fraction (rolling/expanding only) |
| gap_bars | 0 | Bars of gap between train end and test start to prevent leakage |
| refit | true | Re-train the model from scratch on each fold's train set |
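A client can mirror these defaults locally and catch obviously invalid values before sending a request. The sketch below is an assumption about how such a client-side helper might look; the class and its validation rules are hypothetical and are not part of the documented API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WalkForwardConfig:
    """Hypothetical client-side mirror of the wf_config block."""
    method: str = "rolling"
    n_splits: int = 5
    test_size: float = 0.2
    min_train_size: float = 0.3
    gap_bars: int = 0
    refit: bool = True

    def __post_init__(self) -> None:
        # Assumed sanity checks; the server may enforce different rules.
        if self.method not in {"rolling", "expanding", "anchored"}:
            raise ValueError(f"unknown method: {self.method}")
        if self.n_splits < 1:
            raise ValueError("n_splits must be at least 1")
        if not 0 < self.test_size < 1:
            raise ValueError("test_size must be a fraction in (0, 1)")
        if not 0 < self.min_train_size < 1:
            raise ValueError("min_train_size must be a fraction in (0, 1)")
        if self.gap_bars < 0:
            raise ValueError("gap_bars must be non-negative")


# Build the wf_config portion of the request body shown above.
wf_config = asdict(WalkForwardConfig(gap_bars=5))
```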

Per-Fold Results

For each fold the engine records both in-sample (IS) CV metrics and out-of-sample (OOS) metrics:
{
  "fold_idx": 2,
  "train_start": 0,
  "train_end": 756,
  "test_start": 756,
  "test_end": 1008,
  "is_metrics": {
    "mean_directional_accuracy": 0.58,
    "mean_mse": 0.0041
  },
  "oos_metrics": {
    "total_return": 0.087,
    "perf_cagr": 0.054,
    "perf_sharpe_ratio": 0.81,
    "perf_sortino_ratio": 1.12,
    "risk_max_drawdown": 0.091,
    "risk_volatility_annualised": 0.127
  }
}

Walk-Forward Aggregate

The engine aggregates OOS metrics across all folds:
{
  "mean_oos_sharpe": 0.74,
  "std_oos_sharpe": 0.22,
  "min_oos_sharpe": 0.41,
  "mean_oos_return": 0.064,
  "mean_oos_max_drawdown": 0.103,
  "worst_oos_drawdown": 0.184,
  "n_profitable_folds": 4,
  "n_folds": 5
}
Two stability flags and an overfitting score are also computed:
  • is_sharpe_stable: true if std_oos_sharpe / |mean_oos_sharpe| < 0.5. A high coefficient of variation suggests regime-dependent performance that will be unreliable live.
  • is_profitable_oos: true if mean_oos_sharpe > 0. The strategy must have positive risk-adjusted OOS returns on average.
  • overfitting_score: IS directional accuracy / |OOS Sharpe|. Values much greater than 1.0 indicate the model is fitting training noise.
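The aggregate fields and the two stability flags follow directly from the per-fold OOS values. The sketch below assumes a sample standard deviation and counts a fold as profitable when its total return is positive; neither choice is confirmed by the engine, and the per-fold inputs are illustrative numbers only.

```python
from statistics import mean, stdev
from typing import Dict, List


def aggregate_oos(oos_sharpes: List[float],
                  oos_returns: List[float]) -> Dict[str, object]:
    """Hypothetical aggregation of per-fold OOS metrics."""
    mean_sharpe = mean(oos_sharpes)
    std_sharpe = stdev(oos_sharpes)  # sample std (assumed); needs >= 2 folds
    # Coefficient of variation; treat a zero mean as maximally unstable.
    cv = std_sharpe / abs(mean_sharpe) if mean_sharpe != 0 else float("inf")
    return {
        "mean_oos_sharpe": mean_sharpe,
        "std_oos_sharpe": std_sharpe,
        "min_oos_sharpe": min(oos_sharpes),
        "n_profitable_folds": sum(1 for r in oos_returns if r > 0),
        "n_folds": len(oos_sharpes),
        "is_sharpe_stable": cv < 0.5,
        "is_profitable_oos": mean_sharpe > 0,
    }


agg = aggregate_oos(
    oos_sharpes=[0.41, 0.81, 0.95, 0.62, 0.91],
    oos_returns=[0.05, 0.087, 0.11, -0.02, 0.093],
)
```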

Combinatorial Purged Cross-Validation (CPCV)

Walk-forward produces one OOS path: a single sequence of test folds. CPCV, as described by Marcos López de Prado in Advances in Financial Machine Learning (Chapter 12), generates all C(n, k) combinations of k test folds from n total folds, producing many distinct OOS paths. The distribution of Sharpe ratios across these paths enables two additional statistics:
  • Probability of Backtest Overfitting (PBO): the fraction of CPCV paths whose OOS Sharpe is below the median IS Sharpe. A PBO above 0.5 means more than half the paths fell short of the median in-sample Sharpe when evaluated out of sample.
  • Deflated Sharpe Ratio (DSR): the OOS Sharpe adjusted downward for the number of trials (paths) evaluated, analogous to the Bonferroni correction in hypothesis testing.
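The PBO definition in the first bullet reduces to a simple count. The sketch below implements exactly that wording, assuming one IS and one OOS Sharpe per path; the function name is hypothetical, and other PBO estimators in the literature (for example rank-based ones) would give different numbers.

```python
from statistics import median
from typing import Sequence


def pbo_vs_median_is(is_sharpes: Sequence[float],
                     oos_sharpes: Sequence[float]) -> float:
    """Fraction of paths whose OOS Sharpe is below the median IS Sharpe."""
    if not oos_sharpes:
        raise ValueError("need at least one OOS path")
    threshold = median(is_sharpes)
    n_below = sum(1 for s in oos_sharpes if s < threshold)
    return n_below / len(oos_sharpes)


# Illustrative values only: the median IS Sharpe is 1.0, and two of the
# four OOS paths (0.5 and 0.9) fall below it.
pbo = pbo_vs_median_is([1.0, 1.2, 0.8], [0.5, 1.1, 0.9, 1.3])
```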

Why CPCV Catches What Walk-Forward Misses

CPCV applies two additional safeguards against label leakage that walk-forward does not enforce:

Purging

When the target y[t] is a target_horizon-bar forward return, the label at bar t overlaps with bars t+1 … t+horizon-1, where horizon is target_horizon. Any training sample whose label window touches the test window is purged (removed from training).

Embargo

Bars immediately following the test window are embargoed from training to eliminate momentum/autocorrelation leakage. The embargo length is embargo_pct × n_samples bars after each test fold end.
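For a single test window, purging and the embargo can be sketched as index arithmetic. This is an illustration under stated assumptions: the test window is half-open [test_start, test_end), the label of bar t spans t … t+horizon-1 as described above, and the embargo length is rounded to the nearest bar. The function name and rounding rule are hypothetical.

```python
from typing import List


def purged_train_indices(n_samples: int, test_start: int, test_end: int,
                         target_horizon: int = 1,
                         embargo_pct: float = 0.01) -> List[int]:
    """Train indices left after purging and embargo around one test window."""
    embargo_bars = round(embargo_pct * n_samples)
    # Purge: a sample i before the test window is dropped when its label
    # window i .. i + horizon - 1 reaches test_start.
    purge_from = max(0, test_start - (target_horizon - 1))
    # Embargo: skip the bars immediately after the test window.
    embargo_until = min(n_samples, test_end + embargo_bars)
    return list(range(0, purge_from)) + list(range(embargo_until, n_samples))


train = purged_train_indices(n_samples=1000, test_start=400, test_end=500,
                             target_horizon=5, embargo_pct=0.01)
```

Here bars 396 to 399 are purged and bars 500 to 509 are embargoed. With several test folds in one CPCV split, the same rule would be applied to each test fold and the surviving train indices intersected.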

CPCV Configuration

// POST /api/validation/cpcv
{
  "strategy_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "plugin_key": "ml.xgboost",
  "model_params": { "max_depth": 6, "n_estimators": 200 },
  "feature_ids": ["f1000000-0000-0000-0000-000000000001"],
  "symbol": "AAPL",
  "timeframe": "1d",
  "start_date": "2018-01-01T00:00:00Z",
  "end_date": "2024-01-01T00:00:00Z",
  "cpcv_config": {
    "n_splits": 6,
    "n_test_splits": 2,
    "embargo_pct": 0.01,
    "purge": true,
    "target_horizon": 1
  },
  "max_pbo": 0.6,
  "min_deflated_sharpe": 0.1,
  "min_sharpe": 0.2
}
CPCVConfig parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| n_splits | 6 | Total folds n. Number of CPCV paths = C(n, k) |
| n_test_splits | 2 | Test folds per combination k. Must be < n_splits |
| embargo_pct | 0.01 | Fraction of total samples to embargo after each test fold |
| purge | true | Enable label-overlap purging |
| target_horizon | 1 | Forward-return horizon used for purge calculation |

With n_splits=6 and n_test_splits=2 the engine generates C(6,2) = 15 distinct train/test splits, producing 15 OOS performance paths. Because the splits share folds, these paths are correlated rather than statistically independent.
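The split count for n_splits=6 and n_test_splits=2 can be checked by enumerating the fold combinations. This is a minimal sketch using the standard library; the function name cpcv_splits is hypothetical, and purging and embargo are left out for brevity.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple

Split = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (train folds, test folds)


def cpcv_splits(n_splits: int = 6, n_test_splits: int = 2) -> List[Split]:
    """Enumerate every way to choose the test folds; the rest is training."""
    if not 0 < n_test_splits < n_splits:
        raise ValueError("n_test_splits must be in (0, n_splits)")
    folds = range(n_splits)
    splits: List[Split] = []
    for test_folds in combinations(folds, n_test_splits):
        train_folds = tuple(f for f in folds if f not in test_folds)
        splits.append((train_folds, test_folds))
    return splits


splits = cpcv_splits(6, 2)
n_expected = comb(6, 2)
# How many splits use fold 0 as a test fold.
fold0_as_test = sum(1 for _, test in splits if 0 in test)
```

Each fold appears in the test set of several splits, which is why the resulting OOS results overlap.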

Validation Gates

Both the Walk-Forward and CPCV engines enforce configurable promotion gates. All gates must pass for the strategy status to advance to validated.

Walk-Forward Gates

| Gate | Config Field | Default | Pass Condition |
| --- | --- | --- | --- |
| OOS Sharpe | min_sharpe | 0.3 | mean_oos_sharpe ≥ min_sharpe |
| OOS Max Drawdown | max_drawdown | 0.25 | mean_oos_max_drawdown ≤ max_drawdown |
| Profitable Folds | profitable_fold_ratio | 0.5 | n_profitable_folds / n_folds ≥ ratio |
| Overfitting Score | max_overfit_ratio | 3.0 | overfitting_score ≤ max_overfit_ratio |
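The walk-forward pass conditions can be expressed as a small evaluation function. The sketch below is an assumption about how the gates might be applied; the function name is hypothetical, and the result keys follow the gate_results object documented in the ValidationResult section.

```python
from typing import Dict, Optional

DEFAULT_GATES = {
    "min_sharpe": 0.3,
    "max_drawdown": 0.25,
    "profitable_fold_ratio": 0.5,
    "max_overfit_ratio": 3.0,
}


def evaluate_walk_forward_gates(aggregate: Dict[str, float],
                                config: Optional[Dict[str, float]] = None) -> dict:
    """Hypothetical gate check: every gate must pass for overall success."""
    cfg = {**DEFAULT_GATES, **(config or {})}
    fold_ratio = aggregate["n_profitable_folds"] / aggregate["n_folds"]
    # (observed value, threshold, True when the value must be >= threshold)
    checks = {
        "oos_sharpe": (aggregate["mean_oos_sharpe"], cfg["min_sharpe"], True),
        "oos_max_drawdown": (aggregate["mean_oos_max_drawdown"], cfg["max_drawdown"], False),
        "profitable_folds": (fold_ratio, cfg["profitable_fold_ratio"], True),
        "overfit_ratio": (aggregate["overfitting_score"], cfg["max_overfit_ratio"], False),
    }
    gate_results = {
        name: {
            "passed": value >= threshold if at_least else value <= threshold,
            "value": value,
            "threshold": threshold,
        }
        for name, (value, threshold, at_least) in checks.items()
    }
    passed = all(g["passed"] for g in gate_results.values())
    return {"passed": passed, "gate_results": gate_results}


result = evaluate_walk_forward_gates({
    "mean_oos_sharpe": 0.74,
    "mean_oos_max_drawdown": 0.103,
    "n_profitable_folds": 4,
    "n_folds": 5,
    "overfitting_score": 1.82,
})
```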

CPCV Gates

| Gate | Config Field | Default | Pass Condition |
| --- | --- | --- | --- |
| Mean OOS Sharpe | min_sharpe | 0.2 | mean_oos_sharpe ≥ min_sharpe |
| PBO | max_pbo | 0.6 | pbo ≤ max_pbo |
| Deflated Sharpe | min_deflated_sharpe | 0.1 | deflated_sharpe ≥ min_deflated_sharpe |

ValidationResult

The response from both validation endpoints shares a common structure:
{
  "passed": true,
  "gate_results": {
    "oos_sharpe": {
      "passed": true,
      "value": 0.74,
      "threshold": 0.3
    },
    "oos_max_drawdown": {
      "passed": true,
      "value": 0.103,
      "threshold": 0.25
    },
    "profitable_folds": {
      "passed": true,
      "value": 0.8,
      "threshold": 0.5
    },
    "overfit_ratio": {
      "passed": true,
      "value": 1.82,
      "threshold": 3.0
    }
  },
  "aggregate": {
    "mean_oos_sharpe": 0.74,
    "std_oos_sharpe": 0.22,
    "n_profitable_folds": 4,
    "n_folds": 5,
    "overfitting_score": 1.82
  },
  "mlflow_run_id": "abc123def456",
  "fold_details": [ { "...": "per-fold IS and OOS metrics" } ]
}
When passed is true, the engine automatically transitions Strategy.status to validated. When passed is false, gate_results tells you exactly which thresholds were missed and by how much — enabling targeted remediation (e.g. relaxing signal thresholds, adding features, or increasing the training window).

MLflow Integration

Every validation run is logged to MLflow as a standalone experiment:
  • Parameters: plugin_key, n_splits, test_size, gap_bars, refit, model hyperparameters
  • Metrics: all aggregate OOS metrics, per-gate pass/fail as floats (1.0 / 0.0), overfitting_score, validation_passed
  • Artifacts: per-fold metrics as a JSON file
The mlflow_run_id in the response links directly to the MLflow experiment UI for drill-down exploration.

Async Validation

Validation runs can be long for multi-year datasets with many folds. Pass async_mode: true to dispatch to a Celery worker:
POST /api/validation/walk-forward?async=true
The endpoint returns a task_id immediately. Poll GET /api/validation/tasks/{task_id} for status and the final ValidationResult on completion.
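A polling loop for the task endpoint might look like the sketch below. The shape of the task response is not documented here, so the "status" and "result" keys and the Celery-style state names are assumptions, as is the function name. The HTTP call is injected as fetch_status so the loop stays independent of any particular client library.

```python
import time
from typing import Any, Callable, Dict


def wait_for_validation(fetch_status: Callable[[str], Dict[str, Any]],
                        task_id: str,
                        interval_s: float = 2.0,
                        timeout_s: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll until the task finishes, fails, or the timeout elapses.

    fetch_status(task_id) is expected to call
    GET /api/validation/tasks/{task_id} and return the decoded JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        payload = fetch_status(task_id)
        state = payload.get("status")  # assumed field name
        if state == "SUCCESS":
            return payload["result"]   # assumed to hold the ValidationResult
        if state in {"FAILURE", "REVOKED"}:
            raise RuntimeError(f"validation task {task_id} ended in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"validation task {task_id} did not finish in time")
        sleep(interval_s)


# Usage with a stubbed fetcher standing in for the real HTTP request.
_responses = iter([
    {"status": "PENDING"},
    {"status": "STARTED"},
    {"status": "SUCCESS", "result": {"passed": True}},
])
outcome = wait_for_validation(lambda _task_id: next(_responses), "demo-task",
                              sleep=lambda _seconds: None)
```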

API Reference

For the complete endpoint reference — including per-fold result retrieval, gate configuration overrides, and CPCV path export — see the Validation API.
