The Hedge Fund Backend is designed as a single-responsibility Research Layer service within a larger microservice architecture. It owns the full lifecycle of quantitative strategy research — from feature computation and model training through backtesting and validation — and delegates market data ingestion to an upstream service and live portfolio execution to a downstream one. Internally the platform is structured as a series of concentric layers: a thin FastAPI routing layer, a set of stateless engine pipelines that implement the business logic, a plugin registry that decouples compute from infrastructure, and a durable storage tier spread across PostgreSQL, S3/MinIO, Redis, and MLflow. Understanding this separation makes it straightforward to extend any single layer without touching the others.
Project Layout
The entire application lives under app/, which is partitioned by responsibility rather than by feature.
The interactive Swagger UI is available at /docs once the server is running. It documents every registered endpoint with full request/response schemas and a built-in request builder.
Request Lifecycle
Every synchronous API call follows a consistent path through the application.
Async Task Flow
Compute-heavy operations (model training, hyperparameter tuning, feature generation, backtest execution, and parameter sweeps) are dispatched to the Celery worker pool so that API calls return immediately with a task_id rather than blocking for minutes.
The GET /api/v1/tasks/{task_id} endpoint is generic: it works for training, tuning, feature generation, backtest, and sweep tasks without any per-task router code.
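Because every task type shares one status endpoint, a client can use a single polling helper. The following is a minimal sketch, not the project's client code: the response shape (a JSON object with a `status` field) and the `wait_for_task` helper are assumptions, and the terminal states are Celery's built-in state names. The HTTP call is injected as a callable so the sketch stays independent of any particular HTTP library.

```python
import time
from typing import Callable

# Celery's built-in terminal state names; whether the API exposes them
# verbatim under a "status" field is an assumption of this sketch.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a task until it reaches a terminal state or the timeout expires.

    fetch_status is expected to wrap GET /api/v1/tasks/{task_id} and return
    the decoded JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {payload.get('status')!r}")
        sleep(interval)
```

The same helper works for a training task or a backtest task, since only the task_id differs.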
Storage Layers
The platform uses four complementary storage technologies, each selected for what it does best.
PostgreSQL: relational domain state
All domain model rows live in Postgres: strategies, feature definitions, FeatureDataset version records, ML model metadata, experiment runs, and backtest configuration and metrics. SQLAlchemy 2.0’s async engine (asyncpg driver) means no blocking I/O on the main event loop. Alembic manages all schema migrations.
| Table | Purpose |
|---|---|
| strategies | Strategy definitions + universe + timeframe |
| features | Feature plugin configurations |
| feature_datasets | Per-symbol/timeframe/date-range versioned instances |
| ml_models | Model plugin configs, status, S3 artifact path |
| experiments | Experiment containers linking runs to strategies |
| backtests | Backtest configs, engine choice, JSONB metrics |
S3/MinIO: object storage
Binary payloads are split across two buckets:
- feature-store: computed feature arrays as Parquet files, keyed by the SHA-256 content hash for deduplication.
- research-artifacts: trained model files (XGBoost .ubj, LightGBM .txt, PyTorch .pt), equity curves, trade lists, drawdown series, and MLflow artifact directories.
The app/core/storage.py boto3 wrapper is the single access point for all S3 operations across engines.
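Content-hash keying can be illustrated with a short sketch. It assumes the hash is taken over the serialized Parquet bytes and that keys follow a `<prefix>/<digest>.parquet` layout; both are assumptions, as are the helper names. A plain dict stands in for the bucket.

```python
import hashlib
from typing import Dict, Tuple


def feature_object_key(parquet_bytes: bytes, prefix: str = "features") -> str:
    """Derive an object key from the SHA-256 digest of the payload.

    The "<prefix>/<digest>.parquet" layout is hypothetical.
    """
    digest = hashlib.sha256(parquet_bytes).hexdigest()
    return f"{prefix}/{digest}.parquet"


def put_if_absent(bucket: Dict[str, bytes], parquet_bytes: bytes) -> Tuple[str, bool]:
    """Store the payload unless an identical one already exists.

    Returns the key and whether a write actually happened. The dict is a
    stand-in for the feature-store bucket.
    """
    key = feature_object_key(parquet_bytes)
    if key in bucket:
        return key, False  # identical content already stored: deduplicated
    bucket[key] = parquet_bytes
    return key, True
```

Identical payloads always map to the same key, so recomputing an unchanged feature set costs no additional storage.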
Redis: cache and message broker
Redis serves two roles simultaneously:
- Feature read cache (app/core/cache.py): recently accessed feature Parquet payloads are cached in Redis so that repeated calls within a research session skip the S3 round-trip.
- Celery broker + result backend: three separate Redis databases are used: db 0 for the application cache, db 1 for the Celery task broker queue, and db 2 for the Celery result backend.
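The database split can be captured in one small settings object. This is a sketch only: the `RedisLayout` class, its property names, and the default host and port are assumptions, while the database numbers come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedisLayout:
    """Hypothetical helper that derives one URL per Redis role."""

    host: str = "localhost"  # assumed default
    port: int = 6379         # Redis' standard port

    def url(self, db: int) -> str:
        return f"redis://{self.host}:{self.port}/{db}"

    @property
    def cache_url(self) -> str:
        return self.url(0)  # db 0: application cache

    @property
    def broker_url(self) -> str:
        return self.url(1)  # db 1: Celery task broker queue

    @property
    def result_backend_url(self) -> str:
        return self.url(2)  # db 2: Celery result backend
```

Keeping the roles in separate databases means the application cache can be flushed without touching queued tasks or stored task results.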
MLflow: experiment tracking
MLflow artifacts are written to s3://research-artifacts/mlflow via the S3 artifact store. The GET /api/v1/experiments/compare endpoint diffs metrics across up to 10 runs and highlights the best performer per metric.
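The "best performer per metric" idea can be sketched as follows. This is not the endpoint's implementation: the `compare_runs` function, the input shape, and the set of metrics treated as lower-is-better are assumptions. Only the 10-run limit comes from the description above.

```python
from typing import Dict

MAX_RUNS = 10  # limit stated for the compare endpoint

# Assumption: these metrics improve as they get smaller; all others are
# treated as higher-is-better.
LOWER_IS_BETTER = {"max_drawdown", "rmse"}


def compare_runs(runs: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return, for each metric, the id of the run with the best value."""
    if len(runs) > MAX_RUNS:
        raise ValueError(f"at most {MAX_RUNS} runs can be compared")
    metrics = sorted({name for values in runs.values() for name in values})
    best: Dict[str, str] = {}
    for metric in metrics:
        # Only runs that actually logged this metric take part.
        candidates = {rid: vals[metric] for rid, vals in runs.items() if metric in vals}
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(candidates, key=candidates.get)
    return best
```

For two runs where `run_b` has the higher Sharpe ratio and `run_a` the smaller drawdown, the result names `run_b` for `sharpe` and `run_a` for `max_drawdown`.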
Event Bus
On application startup, a Kafka consumer is registered to listen for events published by the upstream Market Data Layer. The dispatch handler in app/events/handlers.py routes each inbound event to the appropriate engine call. For example, a market.datasetcreated event can trigger an automatic feature regeneration for any strategy that depends on the affected dataset.
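A routing table keyed by event type is one common way to build such a dispatcher. The sketch below is illustrative and does not reproduce app/events/handlers.py: the `on` decorator, the event envelope fields (`type`, `dataset_id`), and the handler body are all assumptions.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

# Maps an event type to the handlers registered for it.
_HANDLERS: Dict[str, List[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one event type."""
    def decorator(fn: Handler) -> Handler:
        _HANDLERS.setdefault(event_type, []).append(fn)
        return fn
    return decorator


def dispatch(event: dict) -> int:
    """Route an inbound event to its handlers; return how many ran."""
    handlers = _HANDLERS.get(event.get("type", ""), [])
    for handler in handlers:
        handler(event)
    return len(handlers)


# Stand-in for an engine call: records which datasets need regeneration.
regeneration_queue: List[str] = []


@on("market.datasetcreated")
def regenerate_features(event: dict) -> None:
    regeneration_queue.append(event["dataset_id"])
```

Events with no registered handler are ignored, so the upstream service can add new event types without breaking the consumer.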
The event backend is configurable via the EVENT_BACKEND environment variable:
| Value | Behaviour |
|---|---|
| kafka | Real Kafka consumer/producer via kafka-python-ng |
| nats | NATS JetStream consumer/producer |
| noop | No-op stub: all publish/consume calls are silent (safe for local dev without Kafka) |
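Backend selection of this kind usually reduces to validating the variable and returning a matching implementation. The following sketch assumes hypothetical names (`select_backend`, `build_event_bus`, `NoopEventBus`) and implements only the no-op backend; the Kafka and NATS branches are deliberately left unimplemented rather than guessed at.

```python
import os
from typing import Callable, Mapping, Optional

SUPPORTED_BACKENDS = ("kafka", "nats", "noop")


class NoopEventBus:
    """Accepts every call and does nothing, mirroring the noop row above."""

    def publish(self, topic: str, payload: dict) -> None:
        return None

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        return None


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Read and validate EVENT_BACKEND from the given environment mapping."""
    env = os.environ if env is None else env
    name = env.get("EVENT_BACKEND", "").strip().lower()
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"EVENT_BACKEND must be one of {SUPPORTED_BACKENDS}, got {name!r}")
    return name


def build_event_bus(name: str) -> NoopEventBus:
    """Return an event bus for the validated backend name."""
    if name == "noop":
        return NoopEventBus()
    # Real Kafka/NATS clients are outside the scope of this sketch.
    raise NotImplementedError(f"{name} backend is not part of this sketch")
```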
Set EVENT_BACKEND=noop in your local .env if you do not want to run a Kafka broker during development. The API will function normally; only the reactive event-driven feature regeneration will be disabled.
External Dependencies
The Research Layer integrates with two external services at runtime.
Market Data Layer (MARKET_DATA_URL)
The FeaturePipeline fetches OHLCV price data and raw news articles from the Market Data Layer via app/engines/feature_engine/market_data_client.py (an httpx-based async HTTP client). The base URL is configured via the MARKET_DATA_URL environment variable (default: http://localhost:8001). All feature computation is gated on this client — if the Market Data Layer is unavailable, feature generation tasks will fail gracefully and mark the FeatureDataset row as FAILED.
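The fail-gracefully behaviour can be sketched as a task that catches the client error and records it on the dataset row instead of letting it propagate. Everything here is illustrative: the record class, the `MarketDataUnavailable` exception, and the PENDING and READY status names are assumptions (only FAILED is named above), and the fetch function is injected in place of the real httpx client.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, List, Optional


class DatasetStatus(str, Enum):
    PENDING = "PENDING"  # assumed name
    READY = "READY"      # assumed name
    FAILED = "FAILED"    # named in the text above


class MarketDataUnavailable(Exception):
    """Hypothetical error raised when the upstream service cannot be reached."""


@dataclass
class FeatureDatasetRecord:
    """In-memory stand-in for a FeatureDataset row."""

    symbol: str
    status: DatasetStatus = DatasetStatus.PENDING
    error: Optional[str] = None


async def generate_features(
    record: FeatureDatasetRecord,
    fetch_ohlcv: Callable[[str], Awaitable[List[dict]]],
) -> FeatureDatasetRecord:
    """Fetch price bars and update the record's status accordingly."""
    try:
        await fetch_ohlcv(record.symbol)
    except MarketDataUnavailable as exc:
        # Record the failure on the row rather than crashing the worker.
        record.status = DatasetStatus.FAILED
        record.error = str(exc)
        return record
    # Feature computation on the fetched bars would happen here.
    record.status = DatasetStatus.READY
    return record
```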
Portfolio Layer (strategy promotion)
When a strategy passes validation, it can be promoted to the Portfolio Layer via POST /api/v1/strategies/{id}/promote. The promotion router (app/api/strategies/promotion_router.py) serialises the strategy configuration and forwards it to the Portfolio Layer’s ingestion endpoint, closing the loop from research to live execution.
Plugin Architecture
The plugin system is the extensibility backbone of the platform. Every compute-heavy operation (feature calculation, model inference, signal generation, backtest simulation) is implemented as a plugin that satisfies a typed abstract base class. The PluginRegistry maps string keys to plugin classes, and engines resolve the correct plugin at runtime using these keys.
To add a plugin, implement the relevant abstract base class from app/plugins/base.py, place the module in the relevant sub-package under app/plugins/, and call registry.register("<your_key>", YourPluginClass). No changes to routers, engines, or any other core module are required.
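The registry pattern described here can be sketched in a few lines. Only the `registry.register(key, cls)` call shape comes from the text above; the `FeaturePlugin` base class, its `compute` method, the `resolve` method, and the example plugin are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class FeaturePlugin(ABC):
    """Hypothetical base class; the real ones live in app/plugins/base.py."""

    @abstractmethod
    def compute(self, closes: List[float]) -> List[float]:
        """Turn a series of closing prices into a feature series."""


class PluginRegistry:
    """Maps string keys to plugin classes."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Type[FeaturePlugin]] = {}

    def register(self, key: str, plugin_cls: Type[FeaturePlugin]) -> None:
        if key in self._plugins:
            raise ValueError(f"plugin key already registered: {key!r}")
        if not issubclass(plugin_cls, FeaturePlugin):
            raise TypeError(f"{plugin_cls.__name__} does not extend FeaturePlugin")
        self._plugins[key] = plugin_cls

    def resolve(self, key: str) -> Type[FeaturePlugin]:
        try:
            return self._plugins[key]
        except KeyError:
            raise KeyError(f"unknown plugin key: {key!r}") from None


registry = PluginRegistry()


class SimpleReturns(FeaturePlugin):
    """Example plugin: period-over-period simple returns."""

    def compute(self, closes: List[float]) -> List[float]:
        return [later / earlier - 1.0 for earlier, later in zip(closes, closes[1:])]


registry.register("simple_returns", SimpleReturns)
```

An engine that receives the key "simple_returns" in a request would call `registry.resolve("simple_returns")()` and invoke `compute` without importing the plugin module directly, which is what keeps routers and engines unchanged when plugins are added.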