Strategy Validation: Walk-Forward and CPCV Analysis

The Validation API sits at the gate between research and production: a strategy must pass rigorous out-of-sample validation before it can be promoted to live trading. Two complementary methods are exposed. Walk-forward analysis retrains the model and re-runs the signal pipeline on successive out-of-sample windows (rolling, expanding, or anchored), measuring whether performance holds up across folds and whether in-sample fit is inflated relative to out-of-sample results. Combinatorial Purged Cross-Validation (CPCV) goes further by generating all C(n, k) combinations of test folds, applying label-overlap purging and serial-correlation embargo to eliminate data leakage, and producing a distribution of OOS paths rather than a single estimate — enabling rigorous overfitting detection via the Probability of Backtest Overfitting (PBO) and the Deflated Sharpe Ratio. Every run is automatically logged to MLflow and linked to an Experiment row so the full history is retrievable by strategy.

Walk-Forward Analysis

Run Walk-Forward Validation

POST /api/v1/validation/walk-forward

Assembles the feature/target dataset for the specified strategy, splits it into train and test folds using the configured window method, trains the model plugin on each fold’s training window, generates OOS predictions, constructs a synthetic equity curve from the prediction sign, and computes out-of-sample metrics for every fold. The engine then evaluates four promotion gates and marks the strategy as validated in the database if all gates pass. The full result is logged to MLflow.

Returns 422 Unprocessable Entity if the dataset assembled from feature_ids, symbol, and the date range is too small to generate even one valid walk-forward split. Increase the date range or reduce n_splits / test_size.

Request Body

strategy_id

uuid

required

UUID of the strategy being validated. Must exist in the database.

plugin_key

string

required

Key of the model plugin to use (e.g. "gradient_boosting", "logistic_regression"). The key must be registered in the model plugin registry.

model_params

dict

default:"{}"

Hyperparameter dict passed directly to the plugin constructor. Use the same params you would pass to BacktestRunConfig.signal_plugin_params for the corresponding signal.

feature_ids

list[uuid]

required

List of feature UUIDs to include as model inputs. The dataset assembler fetches and aligns all listed features before splitting.

symbol

string

required

Ticker or instrument identifier used for market data alignment (e.g. "AAPL", "BTC-USD").

timeframe

string

default:"1d"

Bar size. Must match the resolution of the features (e.g. "1d", "1h").

start_date

datetime

required

Inclusive start of the dataset window. ISO-8601 format: "2018-01-01T00:00:00".

end_date

datetime

required

Exclusive end of the dataset window. ISO-8601 format: "2024-01-01T00:00:00".

target_horizon

int

default:"1"

Forward-return horizon in bars. A value of 1 means the target is the next bar’s return. Larger values produce multi-period labels.

bars_per_year

int

default:"252"

Used for annualising Sharpe and other metrics. 252 for daily, 52 for weekly, 12 for monthly.

config

ValidationConfigSchema

Optional validation and walk-forward configuration. All nested fields have defaults.

ValidationConfigSchema fields

wf

WalkForwardConfigSchema

Walk-forward splitting configuration.

WalkForwardConfigSchema fields

method

string

default:"rolling"

Window method. One of:

"rolling" — fixed-size training window slides forward each fold.
"expanding" — training window grows from the dataset start.
"anchored" — same as expanding but the anchor is explicitly t=0.

n_splits

int

default:"5"

Number of walk-forward folds to generate.

test_size

float

default:"0.2"

Fraction of the total dataset length dedicated to each test fold. For example, 0.2 on a 500-bar dataset gives 100-bar test windows.

min_train_size

float

default:"0.3"

Minimum fraction of the total dataset required in each training window. Folds that cannot satisfy this are skipped.

gap_bars

int

default:"0"

Number of bars to skip between the training end and test start. Useful for preventing leakage from short-term autocorrelation.

refit

bool

default:"true"

When true, the model is retrained on each fold’s training window. When false, the model trained on the first fold is reused on all subsequent folds.

min_sharpe

float

default:"0.3"

Promotion gate: mean OOS Sharpe across all folds must be ≥ this value.

max_drawdown

float

default:"0.25"

Promotion gate: mean OOS maximum drawdown across all folds must be ≤ this value (as a positive fraction, e.g. 0.25 = 25%).

profitable_fold_ratio

float

default:"0.5"

Promotion gate: fraction of folds with positive OOS total return must be ≥ this value. 0.5 means at least half the folds must be profitable.

max_overfit_ratio

float

default:"3.0"

Promotion gate: the overfitting score (IS Sharpe proxy / OOS Sharpe) must be ≤ this value. Values much greater than 1 indicate the strategy performs substantially better in-sample than out-of-sample.
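The splitting parameters above interact, so a small sketch may help. The following is a minimal illustration and not the engine's actual splitter: the function name walk_forward_splits is hypothetical, and it assumes that test windows are laid back-to-back ending at the last bar and that the rolling training window has length min_train_size × dataset length. Neither layout detail is stated in this reference.

```python
from typing import List, Tuple

# (train_start, train_end, test_start, test_end); all ends are exclusive row indices
Split = Tuple[int, int, int, int]


def walk_forward_splits(
    n_bars: int,
    method: str = "rolling",
    n_splits: int = 5,
    test_size: float = 0.2,
    min_train_size: float = 0.3,
    gap_bars: int = 0,
) -> List[Split]:
    """Hypothetical sketch of walk-forward split generation."""
    test_len = round(n_bars * test_size)
    min_train_len = round(n_bars * min_train_size)
    if test_len < 1:
        raise ValueError("test_size yields an empty test window")

    splits: List[Split] = []
    # Assumption: test windows are contiguous and the last one ends at the final bar.
    for i in range(n_splits):
        test_end = n_bars - i * test_len
        test_start = test_end - test_len
        train_end = test_start - gap_bars  # skip gap_bars between train and test
        if test_start < 0 or train_end < min_train_len:
            continue  # fold cannot satisfy min_train_size, so it is skipped
        if method == "rolling":
            # Assumption: the fixed rolling window equals the minimum train length.
            train_start = train_end - min_train_len
        elif method in ("expanding", "anchored"):
            train_start = 0  # training always starts at the first bar
        else:
            raise ValueError(f"unknown method: {method}")
        splits.append((train_start, train_end, test_start, test_end))

    if not splits:
        # Mirrors the situation in which the API answers 422.
        raise ValueError("dataset too small for a single valid split")
    splits.reverse()  # chronological order, earliest fold first
    return splits
```

Under these assumptions, a 500-bar dataset with the default settings yields three valid folds with 100-bar test windows; the two earliest candidate folds are skipped because fewer than 150 training bars precede them.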

Response — `WalkForwardResponse`

strategy_id

uuid

UUID of the validated strategy.

passed

bool

true if all four promotion gates passed. The strategy’s database status is set to "validated" when this is true.

gate_results

dict

Map of gate name to result object. Keys: "oos_sharpe", "oos_max_drawdown", "profitable_folds", "overfit_ratio".

Gate result object fields

passed

bool

Whether this individual gate passed.

value

float

The observed value for this gate (e.g. mean OOS Sharpe).

threshold

float

The threshold configured for this gate.

aggregate

dict

Aggregate OOS statistics across all folds. Common keys:

Key	Description
`mean_oos_sharpe`	Mean Sharpe ratio across OOS folds
`std_oos_sharpe`	Standard deviation of Sharpe across folds
`min_oos_sharpe`	Worst single-fold Sharpe
`mean_oos_return`	Mean total return across OOS folds
`std_oos_return`	Standard deviation of returns across folds
`mean_oos_max_drawdown`	Mean maximum drawdown across OOS folds
`worst_oos_drawdown`	Worst single-fold drawdown
`n_profitable_folds`	Number of folds with positive total return
`n_folds`	Total number of folds evaluated

folds

list[WalkForwardFoldSchema]

Per-fold detail. One entry per fold that generated valid OOS metrics.

WalkForwardFoldSchema fields

fold_idx

int

Zero-based fold index.

train_start

int

Row index of the training window start (inclusive).

train_end

int

Row index of the training window end (exclusive).

test_start

int

Row index of the test window start (inclusive).

test_end

int

Row index of the test window end (exclusive).

oos_metrics

dict

Full flat metrics dict from compute_metrics on the OOS equity curve. Keys follow the same perf_*, risk_*, trade_* prefixing as backtest metrics.

is_metrics

dict

In-sample cross-validation summary metrics from training on this fold’s training window (e.g. mean_directional_accuracy).

is_sharpe_stable

bool

true when the coefficient of variation of OOS Sharpe ratios across folds is below 0.5 (std / |mean| < 0.5). Unstable Sharpe suggests the strategy is sensitive to the chosen evaluation window.

is_profitable_oos

bool

true when the mean OOS Sharpe is positive.

overfitting_score

float

Ratio of mean IS directional accuracy to mean OOS Sharpe. Values near 1 indicate balanced in/out-of-sample performance. Values substantially above 1 indicate overfitting.

mlflow_run_id

string | null

ID of the MLflow run where all fold metrics and aggregate statistics were logged. Use this to navigate directly to the run in the MLflow UI.
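The gate and summary fields described above can be reproduced from per-fold values. The sketch below is an illustration under stated assumptions, not the engine's code: evaluate_gates is a hypothetical name, the sample standard deviation is assumed for the stability check, and a non-positive mean OOS Sharpe is assumed to give an infinite overfitting score.

```python
import math
import statistics
from typing import Dict, List


def evaluate_gates(
    oos_sharpes: List[float],
    oos_drawdowns: List[float],
    oos_returns: List[float],
    is_proxies: List[float],
    min_sharpe: float = 0.3,
    max_drawdown: float = 0.25,
    profitable_fold_ratio: float = 0.5,
    max_overfit_ratio: float = 3.0,
) -> Dict:
    """Hypothetical re-computation of the four walk-forward promotion gates."""
    mean_sharpe = statistics.fmean(oos_sharpes)
    mean_dd = statistics.fmean(oos_drawdowns)
    profitable = sum(r > 0 for r in oos_returns) / len(oos_returns)
    # Overfitting score: mean IS proxy divided by mean OOS Sharpe.
    # Assumption: a non-positive mean OOS Sharpe is treated as infinitely overfit.
    overfit = statistics.fmean(is_proxies) / mean_sharpe if mean_sharpe > 0 else math.inf

    def gate(value: float, threshold: float, passed: bool) -> Dict:
        return {"passed": passed, "value": value, "threshold": threshold}

    gates = {
        "oos_sharpe": gate(mean_sharpe, min_sharpe, mean_sharpe >= min_sharpe),
        "oos_max_drawdown": gate(mean_dd, max_drawdown, mean_dd <= max_drawdown),
        "profitable_folds": gate(profitable, profitable_fold_ratio,
                                 profitable >= profitable_fold_ratio),
        "overfit_ratio": gate(overfit, max_overfit_ratio, overfit <= max_overfit_ratio),
    }
    # Assumption: sample standard deviation; the reference does not say which one is used.
    std = statistics.stdev(oos_sharpes) if len(oos_sharpes) > 1 else 0.0
    return {
        "passed": all(g["passed"] for g in gates.values()),
        "gate_results": gates,
        "is_sharpe_stable": mean_sharpe != 0 and std / abs(mean_sharpe) < 0.5,
        "is_profitable_oos": mean_sharpe > 0,
        "overfitting_score": overfit,
    }
```

Keeping each gate as a passed/value/threshold triple mirrors the gate result object, so a failed run shows which threshold was missed and by how much.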

curl -X POST https://api.example.com/api/v1/validation/walk-forward \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "a1b2c3d4-0000-0000-0000-000000000001",
    "plugin_key": "gradient_boosting",
    "model_params": {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.05},
    "feature_ids": [
      "f1000000-0000-0000-0000-000000000001",
      "f1000000-0000-0000-0000-000000000002"
    ],
    "symbol": "AAPL",
    "timeframe": "1d",
    "start_date": "2018-01-01T00:00:00",
    "end_date": "2024-01-01T00:00:00",
    "target_horizon": 1,
    "bars_per_year": 252,
    "config": {
      "wf": {
        "method": "rolling",
        "n_splits": 5,
        "test_size": 0.2,
        "min_train_size": 0.3,
        "gap_bars": 5,
        "refit": true
      },
      "min_sharpe": 0.4,
      "max_drawdown": 0.20,
      "profitable_fold_ratio": 0.6,
      "max_overfit_ratio": 2.5
    }
  }'

{
  "strategy_id": "a1b2c3d4-0000-0000-0000-000000000001",
  "passed": true,
  "gate_results": {
    "oos_sharpe":       {"passed": true,  "value": 0.82, "threshold": 0.40},
    "oos_max_drawdown": {"passed": true,  "value": 0.14, "threshold": 0.20},
    "profitable_folds": {"passed": true,  "value": 0.80, "threshold": 0.60},
    "overfit_ratio":    {"passed": true,  "value": 1.43, "threshold": 2.50}
  },
  "aggregate": {
    "mean_oos_sharpe": 0.82,
    "std_oos_sharpe": 0.21,
    "min_oos_sharpe": 0.48,
    "mean_oos_return": 0.112,
    "mean_oos_max_drawdown": 0.14,
    "n_profitable_folds": 4,
    "n_folds": 5
  },
  "folds": [
    {
      "fold_idx": 0,
      "train_start": 0,   "train_end": 302,
      "test_start": 302,  "test_end": 453,
      "oos_metrics": {"perf_sharpe_ratio": 0.91, "total_return": 0.13, "risk_max_drawdown": 0.11},
      "is_metrics":  {"mean_directional_accuracy": 0.63}
    }
  ],
  "is_sharpe_stable": true,
  "is_profitable_oos": true,
  "overfitting_score": 1.43,
  "mlflow_run_id": "9f3b2a1c0d4e5f67890abcdef1234567"
}
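For callers working from Python rather than curl, the same request can be assembled with the standard library. This is a sketch only: the helper names are hypothetical, the host is the placeholder from the curl example, and no authentication header is shown because the example above does not use one.

```python
import json
import urllib.request
from typing import Dict, List

BASE_URL = "https://api.example.com"  # placeholder host taken from the curl example


def build_walk_forward_request(payload: Dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a walk-forward run."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/validation/walk-forward",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_walk_forward(payload: Dict, timeout: float = 300.0) -> Dict:
    """Send the request and decode the JSON body of the response."""
    request = build_walk_forward_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def failed_gates(response: Dict) -> List[str]:
    """Return the names of the promotion gates that did not pass."""
    return [name for name, gate in response["gate_results"].items() if not gate["passed"]]
```

A caller might pass the JSON body shown above to run_walk_forward and then inspect failed_gates(result) before deciding whether to rerun with a longer date range.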

Dispatch Walk-Forward Async

POST /api/v1/validation/walk-forward/async

Accepts the same WalkForwardRequest body as the synchronous endpoint but immediately dispatches it as a Celery task and returns a task_id. Use this when the dataset is large or n_splits is high enough that synchronous execution would time out.

Request Body: identical to POST /api/v1/validation/walk-forward.

Response

{
  "task_id": "b3c4d5e6-0000-0000-0000-000000000088",
  "status": "PENDING"
}

Poll GET /api/v1/tasks/{task_id} to monitor progress. When the task state transitions to SUCCESS, retrieve the full WalkForwardResponse from the task result payload.
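A polling loop for the dispatched task could look like the sketch below. The shape of the task endpoint's response is not documented here, so the "status" and "result" keys are assumptions, as is the helper name wait_for_task; FAILURE and REVOKED are standard Celery terminal states. The HTTP call is injected as a function so the loop itself stays testable.

```python
import time
from typing import Any, Callable, Dict

# Standard Celery states that will never transition to SUCCESS.
TERMINAL_FAILURES = {"FAILURE", "REVOKED"}


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a task until it succeeds, fails, or the timeout elapses.

    `fetch` is expected to perform GET /api/v1/tasks/{task_id} and return the
    decoded JSON. Assumption: that JSON carries "status" and "result" keys.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        status = task.get("status")
        if status == "SUCCESS":
            return task.get("result")  # assumed location of the response payload
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"task {task_id} ended in state {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)
```

Treating every non-terminal state the same way keeps the loop tolerant of intermediate Celery states such as STARTED or RETRY.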

Combinatorial Purged Cross-Validation (CPCV)

Probability of Backtest Overfitting (PBO): introduced by Bailey, Borwein, López de Prado, and Zhu and discussed in López de Prado's Advances in Financial Machine Learning. As computed here, PBO is the fraction of OOS performance paths that underperform the median in-sample performance. A PBO near 0 means most OOS paths outperform IS, which is the opposite of overfitting. A PBO near 1 means the strategy almost always looks worse OOS than IS, a strong signal of overfitted parameters. The default gate rejects strategies with PBO > 0.6.

Deflated Sharpe Ratio (DSR): the best observed Sharpe ratio across multiple trials is an upwardly biased estimate because of selection bias, since the best result is the one that naturally gets picked. The DSR adjusts the best OOS Sharpe for the number of paths evaluated and the skewness/kurtosis of the Sharpe distribution, producing a more conservative estimate. The default gate requires DSR ≥ 0.1.
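The PBO definition used in this reference reduces to a simple count. The sketch below follows only that simplified definition (OOS Sharpe compared against the median IS proxy); the published CSCV procedure uses a rank-based logit instead, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def probability_of_backtest_overfitting(
    oos_sharpes: Sequence[float],
    is_proxies: Sequence[float],
) -> float:
    """Fraction of OOS paths whose Sharpe falls below the median IS proxy.

    Sketch of the simplified definition in this reference, not the
    rank-logit formulation from the literature.
    """
    if not oos_sharpes or len(oos_sharpes) != len(is_proxies):
        raise ValueError("need one IS proxy per OOS path, and at least one path")
    median_is = statistics.median(is_proxies)
    underperforming = sum(1 for sharpe in oos_sharpes if sharpe < median_is)
    return underperforming / len(oos_sharpes)


# Example: two of the four paths fall below the median IS proxy of 0.6.
pbo = probability_of_backtest_overfitting(
    oos_sharpes=[0.9, 0.5, 0.7, 0.4],
    is_proxies=[0.60, 0.62, 0.58, 0.60],
)
gate_passed = pbo <= 0.6  # default max_pbo gate
```

Because the comparison uses the median IS proxy, a single unusually strong in-sample fit does not move the threshold much.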

Run CPCV Validation

POST /api/v1/validation/cpcv

Runs Combinatorial Purged Cross-Validation on the assembled feature/target dataset. For each of the C(n_splits, n_test_splits) fold combinations, the engine trains the model plugin on the remaining folds (after purging and embargo), predicts on the test folds, and computes OOS metrics. The resulting distribution of OOS performance paths is used to compute PBO and the Deflated Sharpe Ratio. Three promotion gates are evaluated: minimum mean OOS Sharpe, maximum PBO, and minimum Deflated Sharpe.

Returns 422 Unprocessable Entity if the dataset is too small to generate splits of sufficient size (at least 10 training samples and 5 test samples per combination). Increase the date range or reduce n_splits.

Request Body

strategy_id

uuid

required

UUID of the strategy being validated.

plugin_key

string

required

Model plugin key (e.g. "gradient_boosting", "logistic_regression").

model_params

dict

default:"{}"

Hyperparameter dict passed to the plugin constructor.

feature_ids

list[uuid]

required

Feature UUIDs included as model inputs.

symbol

string

required

Ticker or instrument identifier for market data alignment.

timeframe

string

default:"1d"

Bar size matching the feature resolution.

start_date

datetime

required

Inclusive start of the dataset window. ISO-8601: "2018-01-01T00:00:00".

end_date

datetime

required

Exclusive end of the dataset window. ISO-8601: "2024-01-01T00:00:00".

target_horizon

int

default:"1"

Forward-return horizon in bars. Used by the purging step to remove training samples whose label window overlaps with the test window.

bars_per_year

int

default:"252"

Annualisation factor for Sharpe and other metrics.

cpcv

CPCVConfigSchema

CPCV splitting configuration.

CPCVConfigSchema fields

n_splits

int

default:"6"

Total number of folds (n in C(n, k)). More folds produce more paths but require a larger dataset and longer runtime.

n_test_splits

int

default:"2"

Number of folds held out as test in each combination (k in C(n, k)). Must be strictly less than n_splits. For n_splits=6, n_test_splits=2 this produces C(6, 2) = 15 paths.

embargo_pct

float

default:"0.01"

Fraction of total samples to embargo after each test fold. Removes training samples immediately following a test window to prevent serial-correlation leakage. 0.01 embargoes approximately 1% of the dataset (≈ 2.5 days per year for daily data).

purge

bool

default:"true"

When true, training samples whose forward-label window overlaps the test window are removed. Specifically, any training bar t that precedes the test window and satisfies t + target_horizon > min(test_indices) is purged.

max_pbo

float

default:"0.6"

Promotion gate: computed PBO must be ≤ this value. Values above 0.6 indicate unacceptable overfitting probability.

min_deflated_sharpe

float

default:"0.1"

Promotion gate: Deflated Sharpe Ratio must be ≥ this value. Ensures the best observed OOS Sharpe survives selection-bias adjustment.

min_sharpe

float

default:"0.2"

Promotion gate: mean OOS Sharpe across all CPCV paths must be ≥ this value.
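How the splitting, purging, and embargo parameters combine can be sketched as follows. This is an illustration rather than the engine's implementation: cpcv_splits is a hypothetical name, folds are assumed to be contiguous and of near-equal size, and the purge inequality is applied at the start of each test block, whereas the field description states it only for min(test_indices).

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def cpcv_splits(
    n_samples: int,
    n_splits: int = 6,
    n_test_splits: int = 2,
    embargo_pct: float = 0.01,
    purge: bool = True,
    target_horizon: int = 1,
) -> Iterator[Tuple[Tuple[int, ...], List[int], List[int]]]:
    """Yield (test_folds, train_indices, test_indices) for each combination."""
    if not 0 < n_test_splits < n_splits:
        raise ValueError("n_test_splits must be in [1, n_splits)")
    # Assumption: contiguous folds of near-equal size.
    bounds = [round(i * n_samples / n_splits) for i in range(n_splits + 1)]
    embargo = int(n_samples * embargo_pct)

    for combo in combinations(range(n_splits), n_test_splits):
        test = [i for fold in combo for i in range(bounds[fold], bounds[fold + 1])]
        excluded = set(test)
        for fold in combo:
            start, end = bounds[fold], bounds[fold + 1]
            if purge:
                # Bars t before the block with t + target_horizon > start.
                excluded.update(range(max(0, start - target_horizon + 1), start))
            # Embargo: drop the bars immediately following the test block.
            excluded.update(range(end, min(n_samples, end + embargo)))
        train = [i for i in range(n_samples) if i not in excluded]
        if len(train) < 10 or len(test) < 5:
            continue  # too little data for this combination, so it is skipped
        yield combo, train, test
```

With 600 samples and the default configuration this produces C(6, 2) = 15 combinations, each with 200 test samples and a 6-bar embargo after every test block.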

Response

strategy_id

string

UUID string of the validated strategy.

passed

bool

true if all three CPCV promotion gates passed. The strategy’s status is set to "validated" in the database when this is true.

gate_results

dict

Map of gate name to result. Keys: "mean_oos_sharpe", "pbo", "deflated_sharpe".

Gate result object fields

passed

bool

Whether this individual gate passed.

value

float

The observed value (e.g. computed PBO).

threshold

float

The configured gate threshold.

mlflow_run_id

string | null

ID of the MLflow run where fold path metrics and aggregate stats were logged.

config

dict

Echo of the CPCVConfig used: n_splits, n_test_splits, embargo_pct, purge, target_horizon.

n_paths

int

Number of OOS performance paths evaluated. Equal to C(n_splits, n_test_splits) minus any paths skipped due to insufficient data.

pbo

float

Probability of Backtest Overfitting (0.0–1.0). Fraction of OOS paths whose Sharpe underperforms the median IS Sharpe proxy. Lower is better.

deflated_sharpe

float

Deflated Sharpe Ratio. Best observed OOS Sharpe adjusted for selection bias across all paths. Higher is better; negative values indicate strong evidence of overfitting.

aggregate

dict

Aggregate statistics across all OOS paths.

Aggregate keys

Key	Description
`mean_oos_sharpe`	Mean Sharpe ratio across all OOS paths
`std_oos_sharpe`	Standard deviation of OOS Sharpe
`median_oos_sharpe`	Median OOS Sharpe
`min_oos_sharpe`	Worst single-path OOS Sharpe
`max_oos_sharpe`	Best single-path OOS Sharpe
`mean_oos_return`	Mean total return across paths
`std_oos_return`	Standard deviation of OOS returns
`mean_oos_drawdown`	Mean maximum drawdown across paths
`worst_oos_drawdown`	Worst single-path drawdown
`n_paths`	Number of paths evaluated
`pct_profitable_paths`	Fraction of paths with positive total return

paths

list[dict]

Per-path detail. One entry per C(n, k) combination evaluated.

Path fields

combination_idx

int

Zero-based combination index.

test_folds

list[int]

Indices of the folds used as the test set in this combination.

oos_sharpe

float

OOS Sharpe ratio for this path.

oos_return

float

OOS total return for this path.

oos_max_drawdown

float

OOS maximum drawdown for this path.

oos_directional_accuracy

float

Fraction of bars where the predicted direction matched the actual direction.

is_sharpe_proxy

float

In-sample directional accuracy used as the IS Sharpe proxy for PBO computation.

n_train

int

Number of training samples after purging and embargo.

n_test

int

Number of test samples in this path.
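The aggregate keys listed earlier can be recomputed client-side from the paths list, for example to cross-check a response. The sketch below is hypothetical (aggregate_paths is not part of the API) and assumes the sample standard deviation and drawdowns reported as positive fractions, so that the worst drawdown is the maximum.

```python
import statistics
from typing import Dict, List


def aggregate_paths(paths: List[Dict]) -> Dict[str, float]:
    """Recompute aggregate statistics from per-path entries."""
    sharpes = [p["oos_sharpe"] for p in paths]
    returns = [p["oos_return"] for p in paths]
    drawdowns = [p["oos_max_drawdown"] for p in paths]

    def spread(values: List[float]) -> float:
        # Assumption: sample standard deviation; undefined for a single path.
        return statistics.stdev(values) if len(values) > 1 else 0.0

    return {
        "mean_oos_sharpe": statistics.fmean(sharpes),
        "std_oos_sharpe": spread(sharpes),
        "median_oos_sharpe": statistics.median(sharpes),
        "min_oos_sharpe": min(sharpes),
        "max_oos_sharpe": max(sharpes),
        "mean_oos_return": statistics.fmean(returns),
        "std_oos_return": spread(returns),
        "mean_oos_drawdown": statistics.fmean(drawdowns),
        # Drawdowns are positive fractions, so the worst one is the largest.
        "worst_oos_drawdown": max(drawdowns),
        "n_paths": len(paths),
        "pct_profitable_paths": sum(r > 0 for r in returns) / len(returns),
    }
```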

curl -X POST https://api.example.com/api/v1/validation/cpcv \
  -H "Content-Type: application/json" \
  -d '{
    "strategy_id": "a1b2c3d4-0000-0000-0000-000000000001",
    "plugin_key": "gradient_boosting",
    "model_params": {"n_estimators": 200, "max_depth": 4},
    "feature_ids": [
      "f1000000-0000-0000-0000-000000000001",
      "f1000000-0000-0000-0000-000000000002"
    ],
    "symbol": "AAPL",
    "timeframe": "1d",
    "start_date": "2016-01-01T00:00:00",
    "end_date": "2024-01-01T00:00:00",
    "target_horizon": 1,
    "bars_per_year": 252,
    "cpcv": {
      "n_splits": 6,
      "n_test_splits": 2,
      "embargo_pct": 0.01,
      "purge": true
    },
    "max_pbo": 0.5,
    "min_deflated_sharpe": 0.15,
    "min_sharpe": 0.25
  }'

{
  "strategy_id": "a1b2c3d4-0000-0000-0000-000000000001",
  "passed": true,
  "gate_results": {
    "mean_oos_sharpe":  {"passed": true, "value": 0.71, "threshold": 0.25},
    "pbo":             {"passed": true, "value": 0.33, "threshold": 0.50},
    "deflated_sharpe": {"passed": true, "value": 0.42, "threshold": 0.15}
  },
  "mlflow_run_id": "a4b5c6d7e8f9012345678abcdef09876",
  "config": {
    "n_splits": 6, "n_test_splits": 2,
    "embargo_pct": 0.01, "purge": true, "target_horizon": 1
  },
  "n_paths": 15,
  "pbo": 0.33,
  "deflated_sharpe": 0.42,
  "aggregate": {
    "mean_oos_sharpe":      0.71,
    "std_oos_sharpe":       0.18,
    "median_oos_sharpe":    0.69,
    "min_oos_sharpe":       0.37,
    "max_oos_sharpe":       1.04,
    "mean_oos_return":      0.093,
    "std_oos_return":       0.042,
    "mean_oos_drawdown":    0.11,
    "worst_oos_drawdown":   0.19,
    "n_paths":              15,
    "pct_profitable_paths": 0.87
  },
  "paths": [
    {
      "combination_idx": 0,
      "test_folds": [0, 1],
      "oos_sharpe": 0.91,
      "oos_return": 0.12,
      "oos_max_drawdown": 0.09,
      "oos_directional_accuracy": 0.58,
      "is_sharpe_proxy": 0.62,
      "n_train": 1256,
      "n_test": 402
    }
  ]
}

Dispatch CPCV Async

POST /api/v1/validation/cpcv/async

Accepts the same CPCVRequest body as the synchronous CPCV endpoint but dispatches immediately as a Celery task. Use for large datasets where C(n_splits, n_test_splits) produces many paths and synchronous execution would exceed request timeouts.

Request Body: identical to POST /api/v1/validation/cpcv.

Response

{
  "task_id": "e5f6a7b8-0000-0000-0000-000000000055",
  "status": "PENDING"
}

Poll GET /api/v1/tasks/{task_id} for task state. On SUCCESS, the full CPCV response (including paths, pbo, deflated_sharpe, and gate_results) is available in the task result payload.

Validation History

Get Strategy Validation History

GET /api/v1/validation/strategies/{strategy_id}

Returns every validation experiment row associated with a strategy, ordered by creation time. Each entry represents one walk-forward or CPCV run. Where an MLflow run ID is present, the entry is enriched with the run_type tag from MLflow (e.g. "rolling", "expanding", "cpcv").

Path Parameters

strategy_id

uuid

required

UUID of the strategy whose validation history to retrieve.

Response

strategy_id

string

UUID string of the strategy.

history

list[dict]

Ordered list of validation experiment records. Most recent runs appear last.

History entry fields

experiment_id

string

UUID of the Experiment row in the database.

mlflow_run_id

string | null

Linked MLflow run ID. Navigate to {MLFLOW_TRACKING_URI}/#/experiments?run_id={mlflow_run_id} to inspect fold metrics and artefacts.

parameters

dict

Parameters logged at the time of the run — includes plugin_key, n_splits, test_size, gap_bars, refit, and any model hyperparameters.

metrics

dict

Aggregate metrics logged for this run (e.g. mean_oos_sharpe, validation_passed, overfitting_score, pbo, deflated_sharpe).

created_at

string

ISO-8601 timestamp of when the experiment row was created.

run_type

string

MLflow run_type tag if available. Common values: "rolling", "expanding", "anchored", "cpcv". Falls back to "unknown" if the tag is absent or the MLflow run is not retrievable.

curl "https://api.example.com/api/v1/validation/strategies/a1b2c3d4-0000-0000-0000-000000000001"

{
  "strategy_id": "a1b2c3d4-0000-0000-0000-000000000001",
  "history": [
    {
      "experiment_id": "e1000000-0000-0000-0000-000000000001",
      "mlflow_run_id": "9f3b2a1c0d4e5f67890abcdef1234567",
      "parameters": {
        "plugin_key": "gradient_boosting",
        "n_splits": 5,
        "test_size": 0.2,
        "min_train_size": 0.3,
        "gap_bars": 5,
        "refit": true,
        "n_estimators": 200,
        "max_depth": 4
      },
      "metrics": {
        "mean_oos_sharpe": 0.82,
        "validation_passed": 1.0,
        "overfitting_score": 1.43
      },
      "created_at": "2024-03-15T09:12:44.000000",
      "run_type": "rolling"
    },
    {
      "experiment_id": "e2000000-0000-0000-0000-000000000002",
      "mlflow_run_id": "a4b5c6d7e8f9012345678abcdef09876",
      "parameters": {
        "plugin_key": "gradient_boosting",
        "n_splits": 6,
        "n_test_splits": 2,
        "embargo_pct": 0.01,
        "purge": true,
        "n_estimators": 200,
        "max_depth": 4
      },
      "metrics": {
        "mean_oos_sharpe": 0.71,
        "pbo": 0.33,
        "deflated_sharpe": 0.42,
        "validation_passed": 1.0
      },
      "created_at": "2024-03-15T11:04:22.000000",
      "run_type": "cpcv"
    }
  ]
}
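Because the history is ordered oldest to newest, picking the most recent run of a given kind is a short filter. The helper below is a hypothetical client-side sketch; it assumes validation_passed is logged as 1.0 for a passing run, as in the example response above.

```python
from typing import Dict, List, Optional


def latest_run(
    history: List[Dict],
    run_type: Optional[str] = None,
    passed_only: bool = False,
) -> Optional[Dict]:
    """Return the most recent history entry matching the filters, or None."""
    matches = [
        entry for entry in history
        if (run_type is None or entry.get("run_type") == run_type)
        # Assumption: a passing run logs validation_passed == 1.0.
        and (not passed_only or entry.get("metrics", {}).get("validation_passed") == 1.0)
    ]
    # The endpoint returns entries oldest first, so the last match is the newest.
    return matches[-1] if matches else None
```

For instance, latest_run(response["history"], run_type="cpcv", passed_only=True) would return the second entry of the example response.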
