The Models API manages ML model definitions and exposes the full training pipeline. A model definition records the model_type, family, and parameters. The training endpoints run time-series cross-validation with rolling or expanding window splits, Optuna hyperparameter search with a configurable trial budget, and multi-candidate AutoML evaluation that produces a ranked leaderboard. Trained artifacts are stored in S3/MinIO, and each run is tracked in MLflow through mlflow_run_id. Model families span statistical (arima, garch), machine learning (xgboost, lightgbm, catboost, random_forest), deep learning (lstm), and ensemble methods.

Model CRUD

Create a Model Definition

POST /api/v1/models
Registers a new model definition. The model is not trained at this point — use the training endpoints to fit it against a feature dataset.

Request Body

name
string
required
Human-readable name, e.g. "XGBoost v1". Maximum 255 characters.
model_type
string
required
Algorithm identifier, e.g. "xgboost", "lstm", "arima". This is the sub-type within the family.
family
string
required
Model family. One of: "statistical", "machine_learning", "deep_learning", "ensemble".
parameters
object
Initial hyperparameters passed to the model plugin constructor, e.g. {"n_estimators": 100, "max_depth": 5}. Defaults to {}.

Response — ModelRead

id
string (UUID)
Unique model identifier.
name
string
Model name.
model_type
string
Algorithm sub-type.
family
string
Model family.
parameters
object
Stored hyperparameters.
version
integer
Optimistic-lock version counter, incremented on each update.
mlflow_run_id
string | null
MLflow run ID linked after training. null before first train.
artifact_uri
string | null
S3 URI to the serialized model artifact. null before first train.
metrics
object
Latest CV metric summary dict stored after training. Empty before first train.
created_at
string (datetime)
ISO 8601 creation timestamp.
updated_at
string (datetime)
ISO 8601 last-updated timestamp.
curl -X POST http://localhost:8000/api/v1/models \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "XGBoost v1",
    "model_type": "xgboost",
    "family": "machine_learning",
    "parameters": {"n_estimators": 100, "max_depth": 5}
  }'
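The constraints in the field descriptions above can be checked on the client before the request is sent. The following is a minimal sketch of that idea; `build_model_payload` is a hypothetical helper, not part of the platform, and the server may apply further validation that is not documented here.

```python
from typing import Any, Dict, Optional

# Values taken from the field descriptions above.
ALLOWED_FAMILIES = {"statistical", "machine_learning", "deep_learning", "ensemble"}
MAX_NAME_LENGTH = 255


def build_model_payload(
    name: str,
    model_type: str,
    family: str,
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body for POST /api/v1/models (hypothetical client-side helper)."""
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be 1-{MAX_NAME_LENGTH} characters")
    if not model_type:
        raise ValueError("model_type is required")
    if family not in ALLOWED_FAMILIES:
        raise ValueError(f"family must be one of {sorted(ALLOWED_FAMILIES)}")
    # The docs state that parameters defaults to {}; mirror that here.
    return {
        "name": name,
        "model_type": model_type,
        "family": family,
        "parameters": dict(parameters or {}),
    }
```

The returned dict can be serialized with `json.dumps` and used as the `-d` body of the curl call above.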

List Model Definitions

GET /api/v1/models
Returns a paginated list of all model definitions.

Query Parameters

skip
integer
Number of records to skip. Default 0.
limit
integer
Maximum records to return. Default 100.
Response: Array of ModelRead objects.
curl "http://localhost:8000/api/v1/models?skip=0&limit=50"
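To read every model rather than one page, a client can advance skip by limit until a short page comes back. The sketch below assumes a page with fewer than limit records marks the end of the list, which the docs do not state explicitly. `fetch_page` is a hypothetical callable that would wrap the GET request shown above.

```python
from typing import Callable, Dict, Iterator, List

Page = List[Dict]


def iter_models(fetch_page: Callable[[int, int], Page], limit: int = 100) -> Iterator[Dict]:
    """Yield every model by walking skip/limit pages until a short page is returned."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    skip = 0
    while True:
        page = fetch_page(skip, limit)  # e.g. GET /api/v1/models?skip=...&limit=...
        yield from page
        # Assumption: a page shorter than `limit` (including an empty one) is the last page.
        if len(page) < limit:
            break
        skip += limit
```

Passing the fetch function in as an argument keeps the paging logic independent of any particular HTTP client.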

Get a Model Definition

GET /api/v1/models/{model_id}
Fetches a single model definition by UUID.
model_id
string (UUID)
required
UUID of the model to retrieve.
Response: ModelRead object. Errors: 404 Model not found.
curl http://localhost:8000/api/v1/models/7c9e6679-7425-40de-944b-e07fc1f90ae7

Update a Model Definition

PATCH /api/v1/models/{model_id}
Partially updates a model definition. All fields are optional; only supplied fields are changed.

Request Body (all optional)

name
string
Updated name.
model_type
string
Updated algorithm type.
family
string
Updated model family.
parameters
object
Replacement hyperparameter dict (full replace, not merge).
mlflow_run_id
string
Link or update the MLflow run ID.
artifact_uri
string
Override the S3 artifact URI.
metrics
object
Override the stored metrics dict.
Response: Updated ModelRead object. Errors: 404 Model not found.
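The update semantics described above (only supplied fields change, parameters is replaced rather than merged, version increments on each update) can be illustrated with a small local simulation. This is a sketch of the documented behavior, not the server's implementation, and `apply_patch` is a hypothetical name.

```python
from typing import Any, Dict


def apply_patch(model: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `model` with `patch` applied as the PATCH docs describe."""
    updated = dict(model)
    # Supplied fields overwrite stored values wholesale. In particular,
    # `parameters` is replaced in full, so keys missing from the patch are dropped.
    updated.update(patch)
    # `version` is documented as a counter incremented on each update.
    updated["version"] = model.get("version", 0) + 1
    return updated


stored = {"name": "XGBoost v1", "parameters": {"n_estimators": 100, "max_depth": 5}, "version": 1}
patched = apply_patch(stored, {"parameters": {"max_depth": 8}})
# patched["parameters"] is {"max_depth": 8}; n_estimators is gone because the dict was replaced.
```

To change one hyperparameter while keeping the rest, read the current parameters first, modify the copy, and send the whole dict back.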

Delete a Model Definition

DELETE /api/v1/models/{model_id}
Permanently removes a model definition from the database.
Deleting a model definition does not remove the MLflow run or the S3 artifact. Remove those separately through MLflow or your object-store management tooling if you need to reclaim storage.
Response: 204 No Content. Errors: 404 Model not found.
curl -X DELETE http://localhost:8000/api/v1/models/7c9e6679-7425-40de-944b-e07fc1f90ae7

Training

Train with Time-Series Cross-Validation

POST /api/v1/models/{model_id}/train
Assembles training data from one or more FeatureDataset records, aligns them to a forward-return target at target_horizon bars, runs time-series cross-validation, serializes the fitted model to S3, and records the MLflow run. The call is synchronous and blocks until training completes.
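The alignment step can be sketched as follows. This is an illustration under stated assumptions, not the platform's code: the forward return is taken to be the simple return close[t + horizon] / close[t] - 1, warm-up rows are represented by None feature values, and `align_forward_return` is a hypothetical name. The 30-row minimum is the threshold quoted in this endpoint's 422 error.

```python
from typing import List, Optional, Sequence, Tuple

MIN_ALIGNED_ROWS = 30  # threshold quoted in the 422 error for this endpoint


def align_forward_return(
    closes: Sequence[float],
    features: Sequence[Sequence[Optional[float]]],
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the feature row at bar t with the return from bar t to bar t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    if len(closes) != len(features):
        raise ValueError("closes and features must have the same length")
    X: List[List[float]] = []
    y: List[float] = []
    # The last `horizon` bars have no future price, so they cannot be labeled.
    for t in range(len(closes) - horizon):
        row = features[t]
        # Skip rows still inside a feature warm-up period (None values here).
        if any(v is None for v in row):
            continue
        X.append([float(v) for v in row])
        # Assumption: simple (not log) forward return.
        y.append(closes[t + horizon] / closes[t] - 1.0)
    if len(X) < MIN_ALIGNED_ROWS:
        raise ValueError(f"only {len(X)} aligned rows; need at least {MIN_ALIGNED_ROWS}")
    return X, y
```

Because each target looks horizon bars ahead, the usable row count is always smaller than the raw bar count, which is why n_train_rows in the response can be lower than the number of bars in the requested date range.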

Path Parameter

model_id
string (UUID)
required
UUID of the model definition to train.

Request Body — TrainRequest

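dataset
DatasetSpec
required
Feature dataset specification. The request example for this endpoint uses the fields feature_ids, symbol, timeframe, start_date, end_date, and target_horizon.
cv
CVConfig
Cross-validation configuration. The request example for this endpoint uses the fields method, n_splits, test_size, and min_train_size.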

Response — TrainResponse

model_id
string (UUID)
UUID of the trained model (same as path parameter).
artifact_uri
string | null
S3 path to the serialized model artifact written after training.
cv_metrics
object
Fold-level and aggregate cross-validation metrics, including per-fold MSE, MAE, and directional accuracy, plus summary statistics (mean, std).
n_train_rows
integer
Total number of aligned rows in the assembled training frame (after feature warm-up period and target shift).
feature_columns
array of strings
Ordered list of column names in the feature matrix X, useful for reproducing the exact input schema.
Errors:
  • 404 Not Found: "Model not found" if the model does not exist, or "Unknown feature id(s)" if any feature ID is missing.
  • 422 Unprocessable Entity: fewer than 30 aligned rows remain after feature warm-up and target alignment.
curl -X POST http://localhost:8000/api/v1/models/MODEL_UUID/train \
  -H 'Content-Type: application/json' \
  -d '{
    "dataset": {
      "feature_ids": ["FEATURE_UUID_1", "FEATURE_UUID_2"],
      "symbol": "AAPL",
      "timeframe": "1d",
      "start_date": "2021-01-01T00:00:00",
      "end_date": "2024-01-01T00:00:00",
      "target_horizon": 1
    },
    "cv": {
      "method": "rolling",
      "n_splits": 5,
      "test_size": 0.15,
      "min_train_size": 0.2
    }
  }'
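To make the cv block concrete, here is one plausible reading of how rolling and expanding splits could be laid out. Everything about the layout is an assumption: test_size and min_train_size are treated as fractions of the total row count, test windows are placed back to back at the end of the data, and the rolling training window has a fixed length equal to min_train_size. The platform's actual splitter may differ, and `time_series_splits` is a hypothetical name.

```python
from typing import List, Tuple

Split = Tuple[range, range]  # (train indices, test indices)


def time_series_splits(
    n_rows: int,
    method: str = "rolling",
    n_splits: int = 5,
    test_size: float = 0.15,
    min_train_size: float = 0.2,
) -> List[Split]:
    """Build ordered train/test index ranges with no look-ahead (illustrative layout)."""
    if method not in ("rolling", "expanding"):
        raise ValueError("method must be 'rolling' or 'expanding'")
    # Assumption: both sizes are fractions of the total number of rows.
    test_len = round(n_rows * test_size)
    min_train = round(n_rows * min_train_size)
    if test_len < 1 or min_train < 1:
        raise ValueError("not enough rows for the requested fractions")
    first_test_start = n_rows - n_splits * test_len
    if first_test_start < min_train:
        raise ValueError("n_splits * test_size + min_train_size exceeds the data")
    splits: List[Split] = []
    for k in range(n_splits):
        test_start = first_test_start + k * test_len
        if method == "expanding":
            train_start = 0  # training window grows with every fold
        else:
            train_start = test_start - min_train  # fixed-length window slides forward
        splits.append((range(train_start, test_start), range(test_start, test_start + test_len)))
    return splits
```

With 100 rows and the settings from the curl example, the first rolling fold trains on rows 5 to 24 and tests on rows 25 to 39, and every training range ends exactly where its test range begins.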

Train Asynchronously

POST /api/v1/models/{model_id}/train/async
Dispatches the same training job as a Celery background task and returns immediately. Poll the task status via GET /api/v1/tasks/{task_id}.
Request Body: Identical to POST /api/v1/models/{model_id}/train.
Response:
{
  "task_id": "8c78e9d3-cf2e-4a65-a9ea-19a456c2abfe",
  "status": "PENDING"
}
Errors: 404 Model not found.
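A polling loop for the returned task_id might look like the sketch below. It assumes the task status endpoint returns a JSON object with a status field that uses Celery's standard state names; only "PENDING" is confirmed by the response above. `get_status` is a hypothetical callable that would wrap GET /api/v1/tasks/{task_id}.

```python
import time
from typing import Callable, Dict

# Celery's built-in states that mean the task will not change again.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    get_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reaches a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        payload = get_status(task_id)
        # Assumption: the payload carries the Celery state under "status".
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)
```

The sleep function is injectable so the loop can be exercised in tests without waiting.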

Hyperparameter Tuning

Run an Optuna Tuning Study

POST /api/v1/models/tune
Runs a synchronous Optuna hyperparameter search over the specified param_space. Each trial trains the plugin with a sampled configuration and evaluates it under the same CV protocol as the training endpoint. Returns the best parameters found within n_trials.

Request Body — TuneRequest

plugin_key
string
required
Plugin to tune, e.g. "ml.xgboost". Does not need to be associated with an existing model definition.
dataset
DatasetSpec
required
Same DatasetSpec structure as the training endpoint.
param_space
object
required
Dictionary mapping parameter names to ParamSpec objects.
n_trials
integer
Number of Optuna trials to run. Default 30.
cv
CVConfig
Cross-validation configuration (same structure as training). Default: rolling, 5 splits, test_size 0.15.
direction
string
Optimization direction: "minimize" or "maximize". Default "minimize". Use "maximize" when the chosen metric is one where higher is better, such as directional accuracy.
metric
string
Metric to optimize. Common values: "mean_mse", "mean_mae", "directional_accuracy". Default "mean_mse".

Response — TuneResponse

best_params
object
Hyperparameter dict from the best-scoring trial.
best_score
float
Metric value achieved by the best trial.
n_trials
integer
Number of trials actually completed (may be fewer than requested if early stopping occurs).
Example param_space payload for XGBoost:
{
  "n_estimators": {"type": "int", "low": 50, "high": 500},
  "max_depth": {"type": "int", "low": 3, "high": 10},
  "learning_rate": {"type": "float", "low": 0.01, "high": 0.3, "log": true},
  "subsample": {"type": "float", "low": 0.5, "high": 1.0},
  "colsample_bytree": {"type": "float", "low": 0.5, "high": 1.0}
}
curl -X POST http://localhost:8000/api/v1/models/tune \
  -H 'Content-Type: application/json' \
  -d '{
    "plugin_key": "ml.xgboost",
    "dataset": {
      "feature_ids": ["FEATURE_UUID"],
      "symbol": "AAPL",
      "timeframe": "1d",
      "start_date": "2021-01-01T00:00:00",
      "end_date": "2024-01-01T00:00:00",
      "target_horizon": 1
    },
    "param_space": {
      "n_estimators": {"type": "int", "low": 50, "high": 500},
      "max_depth": {"type": "int", "low": 3, "high": 10},
      "learning_rate": {"type": "float", "low": 0.01, "high": 0.3, "log": true}
    },
    "n_trials": 50,
    "cv": {"method": "rolling", "n_splits": 5, "test_size": 0.15, "min_train_size": 0.2},
    "direction": "minimize",
    "metric": "mean_mse"
  }'
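The ParamSpec entries map naturally onto Optuna's trial API. The sketch below shows one way a param_space dict could be turned into a sampled configuration using `suggest_int`, `suggest_float`, and `suggest_categorical`, which are real methods on `optuna.trial.Trial`. The function `sample_params` is hypothetical, and the platform's tuner may translate specs differently.

```python
from typing import Any, Dict


def sample_params(trial: Any, param_space: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Draw one configuration from `param_space` using an Optuna-style trial object."""
    params: Dict[str, Any] = {}
    for name, spec in param_space.items():
        kind = spec["type"]
        if kind == "int":
            params[name] = trial.suggest_int(
                name, spec["low"], spec["high"], log=spec.get("log", False)
            )
        elif kind == "float":
            params[name] = trial.suggest_float(
                name, spec["low"], spec["high"], log=spec.get("log", False)
            )
        elif kind == "categorical":
            params[name] = trial.suggest_categorical(name, spec["choices"])
        else:
            raise ValueError(f"unsupported ParamSpec type: {kind!r}")
    return params
```

Inside an Optuna objective this would be called as `sample_params(trial, param_space)`, with the returned dict passed to the model plugin before cross-validation. A "log": true entry makes Optuna sample on a log scale, which suits parameters such as learning_rate that span orders of magnitude.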

Run Tuning Asynchronously

POST /api/v1/models/tune/async
Dispatches the Optuna study as a Celery background task. Returns immediately with a task_id.
Request Body: Identical to POST /api/v1/models/tune.
Response:
{
  "task_id": "3f91c8b2-12de-4a3c-b55f-7e0291d3b4f1",
  "status": "PENDING"
}

AutoML

Run AutoML Leaderboard

POST /api/v1/models/automl
Evaluates a set of candidate plugin configurations under the same CV protocol and ranks them by the chosen metric. Use it to identify a promising algorithm family and coarse hyperparameter settings before fine-tuning with Optuna.

Request Body — AutoMLRequest

dataset
DatasetSpec
required
Same DatasetSpec structure as the training endpoint.
candidates
object
required
Mapping of plugin_key → fixed params dict. Each key–value pair is one candidate to evaluate, e.g. {"ml.xgboost": {"max_depth": 6}, "ml.lightgbm": {"num_leaves": 64}}.
cv
CVConfig
Cross-validation configuration shared across all candidates.
metric
string
Ranking metric. Default "mean_mse".

Response — AutoMLResponse

leaderboard
array of AutoMLCandidateResponse
One entry per candidate, each with plugin_key, params, score, and metrics. The leaderboard is sorted by score according to the chosen metric direction (ascending for error metrics).
curl -X POST http://localhost:8000/api/v1/models/automl \
  -H 'Content-Type: application/json' \
  -d '{
    "dataset": {
      "feature_ids": ["FEATURE_UUID"],
      "symbol": "AAPL",
      "timeframe": "1d",
      "start_date": "2022-01-01T00:00:00",
      "end_date": "2024-01-01T00:00:00",
      "target_horizon": 1
    },
    "candidates": {
      "ml.xgboost": {"n_estimators": 200, "max_depth": 6},
      "ml.lightgbm": {"n_estimators": 200, "num_leaves": 64},
      "ml.catboost": {"iterations": 200, "depth": 6}
    },
    "cv": {
      "method": "rolling",
      "n_splits": 5,
      "test_size": 0.15,
      "min_train_size": 0.2
    },
    "metric": "mean_mse"
  }'
Example response:
{
  "leaderboard": [
    {
      "plugin_key": "ml.lightgbm",
      "params": {"n_estimators": 200, "num_leaves": 64},
      "score": 0.00187,
      "metrics": {"mean_mse": 0.00187, "mean_mae": 0.0312, "mean_directional_accuracy": 0.561}
    },
    {
      "plugin_key": "ml.xgboost",
      "params": {"n_estimators": 200, "max_depth": 6},
      "score": 0.00214,
      "metrics": {"mean_mse": 0.00214, "mean_mae": 0.0341, "mean_directional_accuracy": 0.548}
    },
    {
      "plugin_key": "ml.catboost",
      "params": {"iterations": 200, "depth": 6},
      "score": 0.00231,
      "metrics": {"mean_mse": 0.00231, "mean_mae": 0.0358, "mean_directional_accuracy": 0.537}
    }
  ]
}
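The ranking step can be reproduced on the client, for example to re-sort a leaderboard by a different metric from the metrics dict. This is a minimal sketch; `rank_candidates` and its direction argument are hypothetical, since AutoMLRequest itself exposes only metric.

```python
from typing import Any, Dict, List


def rank_candidates(
    results: List[Dict[str, Any]],
    metric: str = "mean_mse",
    direction: str = "minimize",
) -> List[Dict[str, Any]]:
    """Return copies of `results` with `score` set from `metric`, best candidate first."""
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    ranked = [dict(r, score=r["metrics"][metric]) for r in results]
    # Error metrics sort ascending; accuracy-style metrics sort descending.
    ranked.sort(key=lambda r: r["score"], reverse=(direction == "maximize"))
    return ranked
```

Calling it on the example response with metric="mean_directional_accuracy" and direction="maximize" keeps ml.lightgbm in first place, because it has both the lowest error and the highest directional accuracy in that example.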

Plugin Discovery

List Available Model Plugins

GET /api/v1/models/plugins/available
Returns all plugin keys registered in the Model Plugin Registry.
Response:
{
  "plugins": [
    "ml.xgboost",
    "ml.lightgbm",
    "ml.catboost",
    "ml.random_forest",
    "dl.lstm"
  ]
}
curl http://localhost:8000/api/v1/models/plugins/available
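One use for this listing is checking AutoML candidates or a tuning plugin_key before submitting a long-running job. The helper below is a hypothetical client-side sketch. How the server itself responds to an unregistered key is not documented here.

```python
from typing import Dict, Iterable, List


def unknown_plugin_keys(candidates: Dict[str, dict], available: Iterable[str]) -> List[str]:
    """Return candidate keys missing from the registry listing, sorted for stable output."""
    return sorted(set(candidates) - set(available))
```

An empty result means every candidate key appears in the plugins array returned by this endpoint.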

Get Default Hyperparameter Search Spaces

GET /api/v1/models/plugins/search-spaces
Returns the default Optuna search spaces for each registered plugin, as defined in the platform’s search_spaces.py. The Model Builder UI reads these to pre-populate the tuning configuration form, ensuring the frontend and tuner always use the same bounds.
These are the platform-default search spaces. To override a parameter range or add new parameters, supply them in TuneRequest.param_space. The tuner uses your supplied spec in full, so include every parameter you want searched.
Response:
{
  "ml.xgboost": {
    "max_depth":        {"type": "int",   "low": 3,     "high": 10},
    "learning_rate":    {"type": "float", "low": 0.005, "high": 0.3,  "log": true},
    "n_estimators":     {"type": "int",   "low": 100,   "high": 800},
    "subsample":        {"type": "float", "low": 0.5,   "high": 1.0},
    "colsample_bytree": {"type": "float", "low": 0.5,   "high": 1.0},
    "min_child_weight": {"type": "int",   "low": 1,     "high": 10}
  },
  "ml.lightgbm": {
    "num_leaves":         {"type": "int",   "low": 16,    "high": 256, "log": true},
    "learning_rate":      {"type": "float", "low": 0.005, "high": 0.3, "log": true},
    "n_estimators":       {"type": "int",   "low": 100,   "high": 800},
    "min_child_samples":  {"type": "int",   "low": 5,     "high": 100},
    "subsample":          {"type": "float", "low": 0.5,   "high": 1.0}
  },
  "ml.catboost": {
    "depth":         {"type": "int",   "low": 3,     "high": 10},
    "learning_rate": {"type": "float", "low": 0.005, "high": 0.3,  "log": true},
    "iterations":    {"type": "int",   "low": 100,   "high": 800},
    "l2_leaf_reg":   {"type": "float", "low": 1.0,   "high": 10.0, "log": true}
  },
  "ml.random_forest": {
    "n_estimators":      {"type": "int", "low": 100, "high": 600},
    "max_depth":         {"type": "int", "low": 3,   "high": 20},
    "min_samples_leaf":  {"type": "int", "low": 1,   "high": 20}
  },
  "dl.lstm": {
    "hidden_size": {"type": "categorical", "choices": ["32", "64", "128", "256"]},
    "num_layers":  {"type": "int",   "low": 1,    "high": 3},
    "dropout":     {"type": "float", "low": 0.0,  "high": 0.5},
    "lr":          {"type": "float", "low": 1e-4, "high": 1e-2, "log": true},
    "seq_len":     {"type": "int",   "low": 10,   "high": 60},
    "epochs":      {"type": "int",   "low": 10,   "high": 50}
  }
}
curl http://localhost:8000/api/v1/models/plugins/search-spaces
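Because the tuner uses the supplied param_space in full, a client that wants to adjust only a few ranges can start from the defaults returned here and merge its overrides before calling the tune endpoint. The sketch below assumes that workflow. `build_param_space` is a hypothetical helper, and treating a None override as "drop this parameter" is a convention of this example only.

```python
import copy
from typing import Any, Dict, Optional

ParamSpace = Dict[str, Dict[str, Any]]


def build_param_space(
    defaults: ParamSpace,
    overrides: Dict[str, Optional[Dict[str, Any]]],
) -> ParamSpace:
    """Merge per-parameter overrides into a copy of a plugin's default search space."""
    space = copy.deepcopy(defaults)
    for name, spec in overrides.items():
        if spec is None:
            # Example-only convention: None removes the parameter from the search.
            space.pop(name, None)
        else:
            # Each ParamSpec is replaced wholesale, not merged field by field.
            space[name] = copy.deepcopy(spec)
    return space
```

For instance, passing the "ml.xgboost" entry from the response above as defaults, with an override for max_depth, yields a complete param_space that narrows one range and leaves the rest at the platform values.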
