Features are the foundational inputs to every ML model in the Hedge Fund Backend. Rather than re-computing indicators on every training or backtest run, the platform separates a feature definition (the plugin key and parameters) from a feature dataset (the actual computed values for a specific symbol, timeframe, and date range). The Feature Engine manages this split: it generates datasets on demand, content-addresses them with a SHA-256 hash, persists them as Parquet files in S3/MinIO, and records metadata in Postgres, so the same computation is never run twice unless the underlying data changes.
Feature Definitions
A `Feature` row is a definition: it tells the engine which plugin to run and with what parameters. It does not store the actual time-series values.
Definition Fields
| Field | Type | Description |
|---|---|---|
| `name` | string | Human-readable label for the feature |
| `type` | string | Feature family; see Feature Types below |
| `plugin_key` | string | Registry key that resolves to a `BaseFeature` subclass |
| `parameters` | object | Plugin-specific hyperparameters (e.g. `{"length": 14}`) |
| `storage_uri` | string \| null | S3/MinIO key prefix where generated datasets are stored |
| `version` | integer | Optimistic-lock version; increments on every update |
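To make the relationship between `plugin_key`, `parameters`, and a `BaseFeature` subclass concrete, here is a minimal sketch of how a registry lookup might work. Only `BaseFeature`, `plugin_key`, and `parameters` come from this page. The `REGISTRY` dict, the `register` decorator, the `compute` method, and the `demo.rolling_mean` plugin are hypothetical names chosen for illustration and are not the platform's actual API.

```python
from typing import Callable, Dict, List, Optional, Type


class BaseFeature:
    """Hypothetical base class: a plugin is constructed from its parameters."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def compute(self, closes: List[float]) -> Dict[str, List[Optional[float]]]:
        raise NotImplementedError


# Hypothetical registry mapping plugin_key -> BaseFeature subclass.
REGISTRY: Dict[str, Type[BaseFeature]] = {}


def register(plugin_key: str) -> Callable[[Type[BaseFeature]], Type[BaseFeature]]:
    def decorator(cls: Type[BaseFeature]) -> Type[BaseFeature]:
        REGISTRY[plugin_key] = cls
        return cls
    return decorator


@register("demo.rolling_mean")  # illustrative key, not a built-in plugin
class RollingMean(BaseFeature):
    def compute(self, closes):
        length = self.parameters.get("length", 14)
        out = []
        for i in range(len(closes)):
            if i + 1 < length:
                out.append(None)  # not enough history yet
            else:
                window = closes[i + 1 - length : i + 1]
                out.append(sum(window) / length)
        # Column name mirrors the "<name>_{length}" pattern used on this page.
        return {f"mean_{length}": out}


# A definition carries only the key and parameters, never the values.
definition = {
    "name": "3-bar mean",
    "type": "technical",
    "plugin_key": "demo.rolling_mean",
    "parameters": {"length": 3},
}

plugin = REGISTRY[definition["plugin_key"]](**definition["parameters"])
result = plugin.compute([1.0, 2.0, 3.0, 4.0])
print(result)  # {'mean_3': [None, None, 2.0, 3.0]}
```

Keeping the definition as plain data means the same row can be replayed against any symbol or date range without touching the plugin code.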
Feature Types
The `type` field categorises the data source the plugin consumes:
technical
Computed from OHLCV price/volume data: RSI, ATR, Bollinger Bands, MACD, etc. Implemented via `pandas-ta`.
statistical
Derived from statistical transforms of price history: autocorrelations, rolling moments, structural breaks. Implemented via `tsfresh`.
automated
Auto-extracted by tsfresh's `extract_features` across hundreds of time-series statistics at once.
news
Sentiment scores and entity counts from news feeds — bullish/bearish polarity per symbol per day.
fundamental
Balance-sheet and income-statement ratios (P/E, P/B, ROE, etc.) from quarterly filings.
macro
Macroeconomic indicators — yield curve slope, VIX level, PMI, CPI surprise.
Feature Generation Pipeline
When you call `POST /api/features/{id}/generate`, the Feature Engine runs the following pipeline:
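Based on the behaviour described on this page (fingerprint the source data, derive the content hash, look for an existing dataset, otherwise compute and persist), the flow might be sketched roughly as follows. This is an illustration under stated assumptions rather than the engine's code: the dictionaries stand in for Postgres and S3/MinIO, `toy_plugin` is a placeholder computation, JSON-encoded bars stand in for the raw OHLCV bytes, and the storage key layout is invented.

```python
import hashlib
import json

# In-memory stand-ins (assumptions): one dict for the feature_datasets
# table and one for the S3/MinIO bucket.
feature_datasets = {}
object_store = {}
stats = {"compute_calls": 0}


def toy_plugin(bars, length):
    """Placeholder computation: close-to-close change over `length` bars."""
    stats["compute_calls"] += 1
    closes = [bar["close"] for bar in bars]
    return [
        None if i < length else closes[i] - closes[i - length]
        for i in range(len(closes))
    ]


def generate(feature, symbol, timeframe, bars):
    # 1. Fingerprint the source data slice.
    raw = json.dumps(bars, sort_keys=True).encode("utf-8")
    source_fingerprint = hashlib.sha256(raw).hexdigest()

    # 2. Derive the content hash from the definition, request, and fingerprint.
    payload = json.dumps(
        {
            "plugin_key": feature["plugin_key"],
            "parameters": feature["parameters"],
            "symbol": symbol,
            "timeframe": timeframe,
            "source_fingerprint": source_fingerprint,
        },
        sort_keys=True,
    ).encode("utf-8")
    version_hash = hashlib.sha256(payload).hexdigest()

    # 3. Cache lookup: an existing row means no recomputation.
    key = (feature["id"], symbol, timeframe, version_hash)
    if key in feature_datasets:
        return feature_datasets[key], True

    # 4. Compute, persist the values, and record the metadata row.
    values = toy_plugin(bars, **feature["parameters"])
    uri = f"features/{feature['id']}/{symbol}/{timeframe}/{version_hash}.parquet"
    object_store[uri] = values  # a real engine would write a Parquet file here
    row = {"version_hash": version_hash, "storage_uri": uri}
    feature_datasets[key] = row
    return row, False


feature = {"id": 1, "plugin_key": "demo.change", "parameters": {"length": 1}}
bars = [{"close": 10.0}, {"close": 11.5}, {"close": 11.0}]  # made-up sample bars

first, hit_first = generate(feature, "AAPL", "1d", bars)
second, hit_second = generate(feature, "AAPL", "1d", bars)
print(hit_first, hit_second, stats["compute_calls"])  # False True 1
```

The second call returns the stored row without invoking the plugin, which is the deduplication behaviour the rest of this page relies on.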
Content-Hash Versioning
The most important property of the Feature Store is content-addressed deduplication. The `version_hash` for a dataset is a SHA-256 digest of all its inputs, including the `source_fingerprint` of the underlying market data.
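The exact list of hashed fields is an assumption in the sketch below: it supposes the digest covers the plugin key, parameters, symbol, timeframe, date range, and the `source_fingerprint` (a SHA-256 of the raw OHLCV bytes). The function names and the sample CSV rows are made up for illustration. The point is that canonical serialisation makes the hash deterministic, and any change to the source bytes changes it.

```python
import hashlib
import json


def source_fingerprint(ohlcv_bytes: bytes) -> str:
    """SHA-256 of the raw OHLCV bytes fed to the plugin."""
    return hashlib.sha256(ohlcv_bytes).hexdigest()


def version_hash(plugin_key, parameters, symbol, timeframe, start, end, fingerprint):
    # Canonical JSON (sorted keys, fixed separators) so that dict ordering
    # never changes the digest. The field list itself is an assumption.
    payload = json.dumps(
        {
            "plugin_key": plugin_key,
            "parameters": parameters,
            "symbol": symbol,
            "timeframe": timeframe,
            "start": start,
            "end": end,
            "source_fingerprint": fingerprint,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up sample rows: date, open, high, low, close, volume.
original = b"2024-01-02,185.0,186.5,184.2,185.9,1000\n"
revised = b"2024-01-02,185.0,186.5,184.2,185.7,1000\n"  # corrected close

args = ("technical.rsi", {"length": 14}, "AAPL", "1d", "2024-01-01", "2024-06-30")
h1 = version_hash(*args, source_fingerprint(original))
h2 = version_hash(*args, source_fingerprint(original))
h3 = version_hash(*args, source_fingerprint(revised))

print(h1 == h2)  # True: identical inputs give an identical hash
print(h1 == h3)  # False: revised source data gives a new hash
```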
Reproducibility
Re-running the identical pipeline always produces the same hash, and therefore the same dataset, with no ambiguity.
Cache Hits
If the hash already exists in `feature_datasets`, the engine skips computation entirely and returns the stored dataset.
Automatic Invalidation
If upstream market data is revised (corporate actions, late-arriving prints), the `source_fingerprint` changes, producing a new hash and triggering automatic regeneration.
The `source_fingerprint` is a SHA-256 of the raw OHLCV bytes fed to the plugin. This means a backfill or data vendor correction automatically invalidates cached features without any manual intervention.
FeatureDataset
A `FeatureDataset` row is one generated instance of a feature definition. Multiple datasets can exist for the same feature definition (different symbols, different date ranges, or different source data versions).
The `feature_datasets` table has a composite index on `(feature_id, symbol, timeframe, version_hash)` to make cache lookups sub-millisecond even at scale.
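As a rough illustration of why the composite index matters, the sketch below builds a similar table in an in-memory SQLite database and checks that the cache lookup is answered from the index. SQLite is used here only as a self-contained stand-in for Postgres. The column types, the index name `ix_feature_datasets_lookup`, and the sample row are assumptions, and the sketch demonstrates index usage, not the latency figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE feature_datasets (
        id INTEGER PRIMARY KEY,
        feature_id INTEGER NOT NULL,
        symbol TEXT NOT NULL,
        timeframe TEXT NOT NULL,
        version_hash TEXT NOT NULL,
        storage_uri TEXT
    )
    """
)
# Composite index matching the cache-lookup predicate column for column.
conn.execute(
    "CREATE INDEX ix_feature_datasets_lookup "
    "ON feature_datasets (feature_id, symbol, timeframe, version_hash)"
)
conn.execute(
    "INSERT INTO feature_datasets "
    "(feature_id, symbol, timeframe, version_hash, storage_uri) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, "AAPL", "1d", "abc123", "features/1/AAPL/1d/abc123.parquet"),
)

lookup = (
    "SELECT storage_uri FROM feature_datasets "
    "WHERE feature_id = ? AND symbol = ? AND timeframe = ? AND version_hash = ?"
)
params = (1, "AAPL", "1d", "abc123")

row = conn.execute(lookup, params).fetchone()
plan = conn.execute("EXPLAIN QUERY PLAN " + lookup, params).fetchall()
# The human-readable plan description is the last column of each plan row.
plan_detail = " ".join(str(r[-1]) for r in plan)

print(row[0])       # storage URI of the cached dataset
print(plan_detail)  # should mention ix_feature_datasets_lookup
```

Because every column in the `WHERE` clause is an equality match on a leading index column, the planner can seek directly to the matching entry instead of scanning the table.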
Built-in Plugins
technical.rsi
RSI (Relative Strength Index). Parameter: `length` (default 14). Output column: `rsi_{length}`.
technical.atr
ATR (Average True Range). Parameter: `length` (default 14). Output column: `atr_{length}`.
statistical.tsfresh
tsfresh. Extracts a configurable subset of the tsfresh feature library (autocorrelations, entropy, linear trend coefficients, etc.).
news.sentiment
News Sentiment. Aggregates intraday news polarity scores into a daily bullish/bearish score per symbol. Output columns: `sentiment_score`, `sentiment_count`.
Regeneration
To force recomputation regardless of cache state (for example, after changing plugin parameters or fixing a data source), issue the regeneration request with a `FeatureGenerateRequest` body. The engine computes a new `source_fingerprint` from the current market data slice, derives a new `version_hash`, and writes a new `FeatureDataset` row pointing at a freshly generated Parquet file. The old dataset row is retained for historical reproducibility.
Redis Caching
In addition to the Parquet + Postgres persistence layer, frequently accessed feature datasets are cached in Redis. On a cache hit the engine deserialises the dataset directly from Redis without an S3 round-trip. Cache entries carry a configurable TTL and are evicted when a new `FeatureDataset` version is written for the same `(feature_id, symbol, timeframe)` tuple.
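The TTL and eviction rules can be sketched with a small in-memory class standing in for Redis. Everything here is an assumption made for illustration: the `DatasetCache` class, its method names, the one-entry-per-tuple layout, and the injected clock are not the platform's implementation. With a real Redis client the expiry would more likely be delegated to Redis itself, for example through the `EX` option of `SET`.

```python
import time


class DatasetCache:
    """Tiny in-memory stand-in for Redis with a per-entry TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # (feature_id, symbol, timeframe) -> (expires_at, version_hash, payload)
        self._store = {}

    def put(self, feature_id, symbol, timeframe, version_hash, payload):
        # Writing a new version replaces (evicts) the previous entry
        # for the same (feature_id, symbol, timeframe) tuple.
        key = (feature_id, symbol, timeframe)
        self._store[key] = (self.clock() + self.ttl, version_hash, payload)

    def get(self, feature_id, symbol, timeframe, version_hash):
        key = (feature_id, symbol, timeframe)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, cached_hash, payload = entry
        if self.clock() >= expires_at:
            del self._store[key]  # TTL elapsed: drop the entry
            return None
        if cached_hash != version_hash:
            return None  # a different version is cached for this tuple
        return payload


# A fake clock keeps the example deterministic.
now = [0.0]
cache = DatasetCache(ttl_seconds=60, clock=lambda: now[0])

cache.put(1, "AAPL", "1d", "hash_v1", b"parquet-bytes-v1")
hit = cache.get(1, "AAPL", "1d", "hash_v1")

cache.put(1, "AAPL", "1d", "hash_v2", b"parquet-bytes-v2")  # new version written
stale = cache.get(1, "AAPL", "1d", "hash_v1")

now[0] = 61.0  # advance past the TTL
expired = cache.get(1, "AAPL", "1d", "hash_v2")

print(hit, stale, expired)  # b'parquet-bytes-v1' None None
```

On a miss the caller would fall back to the Parquet file in S3/MinIO and repopulate the cache.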