Hedge Fund Backend: Quant Research Platform

The Hedge Fund Backend is a production-grade, asynchronous research platform built for quantitative researchers and algorithmic traders who need a disciplined, reproducible toolchain — from raw market data all the way to strategy promotion. It is designed as the Research Layer of a broader multi-service quant stack: a standalone FastAPI service that integrates with an upstream Market Data Layer, a downstream Portfolio Layer, and MLflow for experiment governance. Whether you are building signal libraries, training gradient-boosted ensembles, running walk-forward validation, or orchestrating an AI research agent, every workflow is exposed as a clean REST API backed by async SQLAlchemy, Celery workers, S3-native artifact storage, and a plugin registry that makes adding new models or engines a matter of dropping in a single module.

What’s Implemented

The platform has been built across a series of focused engineering phases, each adding a self-contained layer of capability on top of the ones before it.

Phase 1: Domain Models + CRUD + Plugin Architecture

Defines the five core domain models (Strategy, Feature, MLModel, Experiment, Backtest) as async SQLAlchemy 2.0 ORM classes with UUID primary keys, timestamps, and versioning. Full REST CRUD for each resource is generated from a generic CRUDRepository to eliminate per-resource boilerplate. The plugin system (app/plugins/) exposes BaseFeature, BaseModel, BaseSignalGenerator, and BaseBacktestEngine abstract interfaces (all defined in app/plugins/base.py) plus a PluginRegistry (app/plugins/registry.py). New compute plugins can be registered by adding a single module with no changes to core code.

Phase 2: Feature Engine + Feature Store

Introduces FeatureDataset, a model that tracks every generated instance of a feature (per symbol, timeframe, and date range). A SHA-256 content hash over plugin key, params, and source fingerprint provides deterministic versioning: identical inputs reuse existing results, while changed source data triggers a new version without overwriting history. Computed features are stored as Parquet files in S3/MinIO, with metadata lineage in Postgres and a Redis cache fronting repeated reads within a session.

Phase 3: Model Training Engine

Adds time-series cross-validation splitters (rolling and expanding window, strictly ordered with no leakage), a dataset assembler that joins Feature Store outputs into a training matrix, and an Optuna-based hyperparameter tuner. An AutoML mode ranks multiple model plugins on the same CV metric and returns a sorted leaderboard. Final fits persist model artifacts to S3/MinIO and update the model row in Postgres. Supported frameworks: XGBoost, LightGBM, CatBoost, Random Forest, and LSTM (PyTorch).
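To make the difference between the rolling and expanding windows from Phase 3 concrete, here is a minimal sketch of a chronological splitter. The function name `time_series_splits` and its parameters are hypothetical and are not taken from the platform's code; the sketch only illustrates the "strictly ordered, no leakage" property, where every training index precedes every test index.

```python
from typing import Iterator, List, Optional, Tuple

Split = Tuple[List[int], List[int]]


def time_series_splits(
    n_samples: int,
    n_folds: int,
    test_size: int,
    train_size: Optional[int] = None,
) -> Iterator[Split]:
    """Yield (train_idx, test_idx) pairs in chronological order.

    train_size=None gives an expanding window (training always starts at 0).
    train_size=k gives a rolling window (the k samples right before the test block).
    Hypothetical helper, not the platform's actual splitter.
    """
    # Test blocks are laid out back to back at the end of the series.
    first_test_start = n_samples - n_folds * test_size
    min_train = train_size if train_size is not None else 1
    if first_test_start < min_train:
        raise ValueError("not enough samples for the requested folds")

    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        train_start = 0 if train_size is None else test_start - train_size
        train_idx = list(range(train_start, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


expanding = list(time_series_splits(n_samples=10, n_folds=3, test_size=2))
rolling = list(time_series_splits(n_samples=10, n_folds=3, test_size=2, train_size=3))
```

With 10 samples, the expanding variant trains on indices 0-3, then 0-5, then 0-7, while the rolling variant always trains on the 3 samples immediately before each test block.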
Phase 5: Backtest Engine (vectorbt + Backtrader)

Provides two production-ready backtest adapters: VectorBTAdapter for fast vectorized simulation and BacktraderAdapter for event-driven simulation with commission and slippage. Both adapters produce a normalized BacktestResult container (equity curve, trade list, drawdown series) and compute an identical set of metrics: CAGR, Sharpe, Sortino, Calmar, Max Drawdown, VaR/CVaR, Win Rate, Profit Factor, Expectancy, and Turnover. A parameter sweep endpoint creates and executes N backtest rows in the Celery worker pool and returns a ranked leaderboard.

Phase 6: MLflow Experiment Tracking

Integrates MLflow for run metadata, metrics, parameters, and tags across training, tuning, and backtest experiments. Artifacts are stored to s3://research-artifacts/mlflow so that every experiment run is fully reproducible and auditable from the MLflow UI.

Phase 7/8: Walk-Forward + CPCV Validation

Implements purged, embargoed Combinatorial Purged Cross-Validation (CPCV) and rolling Walk-Forward validation, the statistically rigorous methods required for strategy performance evaluation without look-ahead bias or leakage.

Phase 9: News Sentiment + FinBERT

Adds a news ingestion pipeline backed by the HuggingFace transformers library and FinBERT for per-article sentiment scoring. Sentiment features can be fed directly into feature pipelines alongside price-based signals.

Phase 10: AI Research Agents

An agent framework (app/agents/) that orchestrates multi-step research tasks: hypothesis generation, feature selection, model evaluation, and backtest interpretation. It is driven by a language model and operates against the platform’s own APIs.

Phase 11: Strategy Promotion

Provides a promotion router (app/api/strategies/promotion_router.py) that forwards validated strategies to the downstream Portfolio Layer, closing the loop from research to live deployment.

Key Capabilities

Plugin Architecture

Register new feature plugins, model backends, signal generators, and backtest engines by dropping a single module — no core code changes required.
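As a rough illustration of the drop-in registration pattern, the sketch below uses a decorator-based registry. The classes here are simplified stand-ins that reuse the names BaseFeature and PluginRegistry; the `key` attribute and the `register`, `create`, and `compute` methods are assumptions made for the example and may not match the real interfaces in app/plugins/.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseFeature(ABC):
    """Simplified stand-in for the abstract feature interface (assumed shape)."""

    key: str = ""  # hypothetical unique plugin identifier

    @abstractmethod
    def compute(self, prices: List[float]) -> List[float]:
        """Turn a price series into a feature series."""


class PluginRegistry:
    """Maps plugin keys to classes so core code never imports plugins directly."""

    def __init__(self) -> None:
        self._features: Dict[str, Type[BaseFeature]] = {}

    def register(self, cls: Type[BaseFeature]) -> Type[BaseFeature]:
        # Usable as a class decorator; rejects missing or duplicate keys.
        if not cls.key:
            raise ValueError("plugin must define a non-empty key")
        if cls.key in self._features:
            raise ValueError(f"duplicate plugin key: {cls.key}")
        self._features[cls.key] = cls
        return cls

    def create(self, key: str, **params) -> BaseFeature:
        return self._features[key](**params)


registry = PluginRegistry()


# Everything below is what a new plugin module would contain in this sketch.
@registry.register
class SimpleReturn(BaseFeature):
    key = "simple_return"

    def compute(self, prices: List[float]) -> List[float]:
        # One-period simple return between consecutive prices.
        return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


feature = registry.create("simple_return")
values = feature.compute([100.0, 110.0, 99.0])
```

The design point is that the core code only ever looks plugins up by key, so adding a module that registers itself is enough to make a new feature available.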

Feature Store

Deterministically versioned Parquet features on S3/MinIO with Postgres lineage metadata and a Redis read cache for session-level performance.
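Deterministic versioning of this kind can be sketched as a SHA-256 digest over a canonical serialization of the inputs. The function name `feature_content_hash`, the JSON canonicalization, and the example fingerprint strings are assumptions for illustration; the platform's actual hashing scheme is not shown in this document.

```python
import hashlib
import json
from typing import Any, Dict


def feature_content_hash(
    plugin_key: str,
    params: Dict[str, Any],
    source_fingerprint: str,
) -> str:
    """Return a hex SHA-256 digest that is stable for identical inputs.

    Hypothetical helper: sort_keys makes the digest independent of dict
    ordering, so logically equal params always map to the same version.
    """
    payload = json.dumps(
        {
            "plugin_key": plugin_key,
            "params": params,
            "source_fingerprint": source_fingerprint,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same inputs (in a different key order) reuse the version; new source data does not.
v1 = feature_content_hash("rsi", {"window": 14, "source": "close"}, "fp-2024-01")
v2 = feature_content_hash("rsi", {"source": "close", "window": 14}, "fp-2024-01")
v3 = feature_content_hash("rsi", {"window": 14, "source": "close"}, "fp-2024-02")
```

Here `v1` and `v2` are equal, so an existing dataset would be reused, while `v3` differs because the source fingerprint changed.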

ML Model Training

Train XGBoost, LightGBM, CatBoost, Random Forest, and LSTM models with rolling CV, Optuna tuning, and AutoML leaderboards — all async via Celery.

Backtesting Engine

Vectorized (vectorbt) and event-driven (Backtrader) backtest adapters with a unified metrics surface, equity curve download, and parameter sweep.
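A unified metrics surface means both adapters reduce their results to the same numbers computed from the same inputs. The sketch below derives three of the listed metrics from an equity curve using common textbook definitions; the function name `equity_metrics`, the zero risk-free rate, and the 252-period annualization default are assumptions, and the platform's exact formulas may differ.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def equity_metrics(equity: List[float], periods_per_year: int = 252) -> Dict[str, float]:
    """Compute CAGR, annualized Sharpe (risk-free rate assumed 0), and max drawdown.

    Hypothetical helper for illustration only.
    """
    if len(equity) < 3:
        raise ValueError("need at least three equity points")

    # Per-period simple returns.
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]

    # CAGR: total growth annualized over the elapsed number of years.
    years = len(returns) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0

    # Sharpe: mean over sample standard deviation, scaled to annual terms.
    vol = stdev(returns)
    sharpe = mean(returns) / vol * math.sqrt(periods_per_year) if vol > 0 else float("nan")

    # Max drawdown: worst decline from a running peak, as a negative fraction.
    peak = equity[0]
    max_drawdown = 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)

    return {"cagr": cagr, "sharpe": sharpe, "max_drawdown": max_drawdown}


# Three periods treated as one "year" to keep the arithmetic easy to follow.
metrics = equity_metrics([100.0, 110.0, 99.0, 121.0], periods_per_year=3)
```

In this toy curve the account grows from 100 to 121 over one year (CAGR of 21%) and its worst peak-to-trough decline is the drop from 110 to 99 (a 10% drawdown).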

Strategy Validation

Walk-Forward and Combinatorial Purged Cross-Validation (CPCV) to rigorously evaluate out-of-sample strategy performance without look-ahead bias.
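The combinatorial part of CPCV can be sketched as follows: the series is cut into contiguous groups, every combination of test groups forms one split, and training samples near each test block are dropped. This is a simplified illustration in which purging and the embargo are fixed sample counts; a full implementation would purge based on overlapping label horizons. The function name `cpcv_splits` and its parameters are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def cpcv_splits(
    n_samples: int,
    n_groups: int,
    n_test_groups: int,
    purge: int = 0,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for every combination of test groups.

    Simplified sketch: `purge` drops that many training samples before each
    test block, `embargo` drops that many after it.
    """
    # Contiguous, chronologically ordered group boundaries [start, stop).
    bounds = [
        (g * n_samples // n_groups, (g + 1) * n_samples // n_groups)
        for g in range(n_groups)
    ]
    for test_groups in combinations(range(n_groups), n_test_groups):
        test = set()
        excluded = set()
        for g in test_groups:
            start, stop = bounds[g]
            test.update(range(start, stop))
            # Purge: training samples just before the test block.
            excluded.update(range(max(0, start - purge), start))
            # Embargo: training samples just after the test block.
            excluded.update(range(stop, min(n_samples, stop + embargo)))
        train = [i for i in range(n_samples) if i not in test and i not in excluded]
        yield train, sorted(test)


splits = list(cpcv_splits(n_samples=12, n_groups=4, n_test_groups=2, purge=1, embargo=1))
```

With 4 groups and 2 test groups per split, this yields 6 splits (4 choose 2). In the first split the test set is indices 0-5 and index 6 is embargoed, so training uses indices 7-11 only.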

AI Research Agents

LLM-orchestrated agents that automate hypothesis generation, feature selection, model benchmarking, and backtest interpretation end-to-end.

Tech Stack

The platform is built on open-source libraries chosen for performance, correctness, and ecosystem maturity:

Layer	Technology
Web framework	FastAPI 0.115, Uvicorn
Data layer	SQLAlchemy 2.0 async, asyncpg, Alembic
Validation	Pydantic v2, pydantic-settings
Task queue	Celery 5.4 + Redis broker/backend
Object storage	boto3 / MinIO (S3-compatible)
Experiment tracking	MLflow 2.16
Hyperparameter tuning	Optuna 4.0
ML models	XGBoost, LightGBM, CatBoost, scikit-learn, PyTorch (LSTM)
Backtesting	vectorbt 0.26, Backtrader 1.9
NLP / Sentiment	HuggingFace Transformers, FinBERT, SentencePiece
Feature engineering	pandas-ta, tsfresh
Event bus	kafka-python-ng (Kafka / NATS / noop)
HTTP client	httpx

Once the server is running, visit http://localhost:8000/docs for the fully interactive Swagger UI, which documents every endpoint with request/response schemas and a built-in request builder.
