
The ResearchManagerAgent is the orchestration backbone of the Hedge Fund Backend's quantitative research system. When you submit a natural language query, the manager parses it for ticker symbols, timeframe hints, and date ranges, then dispatches six specialist agents in sequence, each building on the outputs of the agents before it. Every agent writes its results back into a shared AgentContext that flows through the entire pipeline, accumulating feature IDs, model artefacts, backtest runs, and governance flags as research progresses.
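The flow described above can be pictured as a manager loop over stateless agents that read from and write to one shared context. This is a minimal sketch, not the backend's code. Only the AgentContext field names come from the session snapshot documented on this page. The AgentResult dataclass, the function-style agents, and the "CRITICAL" flag prefix are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentContext:
    """Shared research state threaded through every agent (fields mirror the session snapshot)."""
    session_id: str
    symbols: list[str] = field(default_factory=list)
    timeframe: str = "1d"
    strategy_id: Optional[str] = None
    feature_ids: list[str] = field(default_factory=list)
    model_id: Optional[str] = None
    best_model_plugin: Optional[str] = None
    best_model_params: dict = field(default_factory=dict)
    backtest_ids: list[str] = field(default_factory=list)
    governance_flags: list[str] = field(default_factory=list)
    validation_passed: bool = False


@dataclass
class AgentResult:
    """Hypothetical per-agent result; ctx_updates holds the fields to write back into the context."""
    role: str
    success: bool
    summary: str
    ctx_updates: dict = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)


# Agents keep no state of their own, so each one is modelled as a plain function of the context.
Agent = Callable[[AgentContext], AgentResult]


def run_pipeline(ctx: AgentContext, agents: list[Agent]) -> dict[str, AgentResult]:
    """Run agents in order, merging each result into ctx; stop on any CRITICAL flag."""
    results: dict[str, AgentResult] = {}
    for agent in agents:
        result = agent(ctx)
        results[result.role] = result
        # Write outputs back so later agents build on earlier ones.
        for key, value in result.ctx_updates.items():
            setattr(ctx, key, value)
        if any(flag.startswith("CRITICAL") for flag in ctx.governance_flags):
            break
    return results
```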

Agent Pipeline

The six specialist agents are always executed in the following order. Each agent is stateless; all mutable research state lives in AgentContext and is threaded through by the manager.
| # | Agent | Role value | Responsibility |
| --- | --- | --- | --- |
| 1 | FeatureDiscoveryAgent | feature_discovery | Parses the query for hints (rsi, sentiment, momentum, etc.), selects candidate feature plugins from the registry, runs a lightweight Random Forest + SHAP pass to prune low-signal features, and persists the winning Feature rows to the database. |
| 2 | ModelDiscoveryAgent | model_discovery | Runs an AutoML leaderboard across ml.xgboost, ml.lightgbm, and ml.random_forest (or a model hinted in the query). Scores each candidate with 3-fold CV, selects the best plugin_key, and persists an MLModel row. |
| 3 | HyperparameterAgent | hyperparameter | Runs an Optuna study (20 trials by default) against the winning model using the plugin's registered search space. Updates the MLModel row with the tuned parameters and writes best_model_params back into the context. |
| 4 | BacktestAgent | backtest | Trains the tuned model on the full assembled dataset, then creates and executes a Backtest record with the vectorbt engine, using default capital of $100,000, 0.05% commission, and 0.05% slippage. |
| 5 | ValidationAgent | validation | Runs a 5-fold rolling walk-forward analysis. If walk-forward passes, proceeds to Combinatorial Purged Cross-Validation (CPCV, 6 splits / 2 test splits) to compute the Probability of Backtest Overfitting (PBO) and the deflated Sharpe ratio. |
| 6 | GovernanceAgent | governance | Checks for overfitting (IS/OOS Sharpe ratio, PBO), data leakage (future-peeking feature keys, look-ahead bias heuristics), parameter instability (learning rate bounds, tree depth, fold Sharpe variance), and minimum sample size. The pipeline halts immediately if any CRITICAL flag is raised. |
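To make the halting rule concrete, here is a sketch of how checks like the GovernanceAgent's could turn research statistics into severity-prefixed flags. Every threshold, the leakage keyword list, the ResearchStats container, and the choice of which checks are CRITICAL versus WARNING are hypothetical; the table above names the checks but not their limits.

```python
from dataclasses import dataclass

# Hypothetical limits; the real GovernanceAgent thresholds are not documented here.
MAX_IS_OOS_SHARPE_RATIO = 2.0  # in-sample Sharpe may be at most 2x out-of-sample Sharpe
MAX_PBO = 0.5                  # probability of backtest overfitting
MIN_SAMPLES = 252              # roughly one year of daily bars
LEAKY_KEY_HINTS = ("future", "next_", "fwd_")  # substrings suggesting look-ahead features


@dataclass
class ResearchStats:
    is_sharpe: float
    oos_sharpe: float
    pbo: float
    n_samples: int
    feature_keys: list[str]


def governance_flags(stats: ResearchStats) -> list[str]:
    """Return severity-prefixed flags; the severities here are illustrative."""
    flags: list[str] = []
    # Overfitting: in-sample performance far above out-of-sample performance.
    if stats.oos_sharpe <= 0:
        flags.append(f"WARNING: non-positive out-of-sample Sharpe ({stats.oos_sharpe:.2f})")
    else:
        ratio = stats.is_sharpe / stats.oos_sharpe
        if ratio > MAX_IS_OOS_SHARPE_RATIO:
            flags.append(f"WARNING: IS/OOS Sharpe ratio {ratio:.2f} exceeds {MAX_IS_OOS_SHARPE_RATIO}")
    if stats.pbo > MAX_PBO:
        flags.append(f"CRITICAL: PBO {stats.pbo:.2f} exceeds {MAX_PBO}")
    # Leakage: feature keys that look like they peek at future data.
    leaky = [k for k in stats.feature_keys if any(h in k.lower() for h in LEAKY_KEY_HINTS)]
    if leaky:
        flags.append(f"CRITICAL: possible look-ahead leakage in {', '.join(leaky)}")
    if stats.n_samples < MIN_SAMPLES:
        flags.append(f"WARNING: only {stats.n_samples} samples (minimum {MIN_SAMPLES})")
    return flags


def should_halt(flags: list[str]) -> bool:
    """The pipeline stops if any flag is CRITICAL."""
    return any(flag.startswith("CRITICAL") for flag in flags)
```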

Query Parsing

The manager applies a lightweight regex parser to every incoming query string before agents run. It extracts:
  • Symbols: uppercase words of 1–5 letters that look like tickers (e.g. AAPL, TSLA). Common stop-words and indicator names (RSI, MACD, ATR) are filtered out, and at most five symbols are captured.
  • Timeframe: "weekly" or "1w" maps to "1w", "hourly" or "1h" maps to "1h"; otherwise the timeframe defaults to "1d".
  • Date range: if start_date/end_date are omitted from the request body, the manager defaults to the three years ending at the current UTC time.
Parsed values are merged into AgentContext only when not already set by the request body, so explicit body fields always take priority.
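A minimal sketch of this parse-then-merge step follows. The stop-word set and the word-boundary regexes are assumptions, since the manager's actual patterns are not published here. The three-year window is approximated as 3 × 365 days, and dates are handled as datetime objects rather than ISO strings.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical filter list: indicator names and short all-caps words that are not tickers.
NON_TICKERS = {"RSI", "MACD", "ATR", "A", "I", "AND", "OR", "ON", "THE", "FOR", "WITH", "USING"}
TICKER_RE = re.compile(r"\b[A-Z]{1,5}\b")


def parse_query(query: str) -> dict:
    """Extract up to five ticker-like symbols and a timeframe hint from free text."""
    symbols: list[str] = []
    for token in TICKER_RE.findall(query):
        if token not in NON_TICKERS and token not in symbols:
            symbols.append(token)

    lowered = query.lower()
    if re.search(r"\b(weekly|1w)\b", lowered):
        timeframe = "1w"
    elif re.search(r"\b(hourly|1h)\b", lowered):
        timeframe = "1h"
    else:
        timeframe = "1d"
    return {"symbols": symbols[:5], "timeframe": timeframe}


def merge_request(body: dict, now: Optional[datetime] = None) -> dict:
    """Fill gaps in an explicit request body from the parsed query; body fields always win."""
    parsed = parse_query(body["query"])
    merged = dict(body)
    if not merged.get("symbols"):
        merged["symbols"] = parsed["symbols"]
    merged.setdefault("timeframe", parsed["timeframe"])
    merged.setdefault("end_date", now or datetime.now(timezone.utc))
    # "Last three years", approximated here as 3 x 365 days.
    merged.setdefault("start_date", merged["end_date"] - timedelta(days=3 * 365))
    return merged
```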

Research Pipeline

Run Full Pipeline

POST /api/v1/agents/research
Runs the complete six-agent research pipeline synchronously and returns a final report when all agents have completed (or when the governance agent halts the pipeline). The response includes every agent’s result, the final context snapshot, and any errors surfaced by individual agents.
For long-running pipelines — particularly when Optuna tuning with many trials or CPCV validation is involved — prefer the streaming endpoint /api/v1/agents/research/stream to avoid HTTP gateway timeouts.

Request Body

query (string, required): Natural language research instruction. The manager parses symbols, timeframe, and date range from this string. Example: "Build a momentum strategy using RSI and news sentiment on AAPL"
symbols (list[string]): Explicit list of ticker symbols. When provided, these override any tickers parsed from the query. Defaults to an empty list, in which case the parser fills it from the query.
timeframe (string, default "1d"): Bar timeframe. Supported values: "1d" (daily), "1h" (hourly), "1w" (weekly).
start_date (datetime): ISO 8601 datetime marking the beginning of the data window. Defaults to three years before end_date if omitted.
end_date (datetime): ISO 8601 datetime marking the end of the data window. Defaults to the current UTC time if omitted.
strategy_id (string): UUID of an existing Strategy row. If omitted, the manager creates a new Strategy record named after the first 60 characters of the query.
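For Python callers, a minimal standard-library client for this endpoint might look like the following. The function names are hypothetical, and the 900-second client timeout is an arbitrary choice: a long client timeout does not stop an upstream gateway from closing the connection, which is why the streaming endpoint is recommended for long runs.

```python
import json
import urllib.request
from typing import Optional

SUPPORTED_TIMEFRAMES = {"1d", "1h", "1w"}


def build_research_request(
    base_url: str,
    query: str,
    *,
    symbols: Optional[list[str]] = None,
    timeframe: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    strategy_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST request for /api/v1/agents/research, sending only the fields that were set."""
    if timeframe is not None and timeframe not in SUPPORTED_TIMEFRAMES:
        raise ValueError(f"unsupported timeframe {timeframe!r}")
    body: dict = {"query": query}
    optional = {
        "symbols": symbols,
        "timeframe": timeframe,
        "start_date": start_date,  # ISO 8601 strings, e.g. "2021-01-01T00:00:00Z"
        "end_date": end_date,
        "strategy_id": strategy_id,
    }
    # Omitted fields let the server apply its own defaults and query parsing.
    body.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/agents/research",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_research(base_url: str, query: str, timeout: float = 900, **fields) -> dict:
    """Send the request and decode the JSON report (blocks until the pipeline finishes)."""
    request = build_research_request(base_url, query, **fields)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Leaving a field unset, rather than sending it as null, keeps the documented precedence intact: explicit body fields win, and anything omitted falls back to the parser or the server defaults.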

Response

session_id (string): UUID identifying this research session. Pass it to /chat or /sessions/{session_id} for follow-up queries.
success (boolean): true if the pipeline completed without any CRITICAL governance flags.
summary (string): One-line human-readable summary covering the session prefix, universe, selected model, walk-forward pass status, and governance flag count.
details (object): Per-agent results keyed by role (agent_results) and the final context snapshot (context).
errors (list[string]): Aggregated error strings from any agent that failed. Empty on full success.
curl -X POST https://api.example.com/api/v1/agents/research \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Build a momentum strategy using RSI and news sentiment on AAPL",
    "timeframe": "1d",
    "start_date": "2021-01-01T00:00:00Z",
    "end_date": "2024-01-01T00:00:00Z"
  }'
{
  "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
  "success": true,
  "summary": "Research session 3f9a1c2e: Universe: AAPL | Model: ml.xgboost | WF passed: True | Governance: 0 flag(s)",
  "details": {
    "agent_results": {
      "feature_discovery": {
        "role": "feature_discovery",
        "success": true,
        "summary": "Discovered 3 candidate features (technical.rsi, news.finbert_sentiment, news.sentiment_momentum). SHAP pruned to 3 high-signal features.",
        "details": { "candidates": [...], "shap_importance": {...} },
        "errors": [],
        "ctx_updates": { "feature_ids": ["..."], "candidate_features": [...] }
      },
      "governance": {
        "role": "governance",
        "success": true,
        "summary": "Governance check: no issues detected. ✓",
        "details": { "overfitting": {}, "leakage": {}, "parameter_stability": {}, "sample_size": {} },
        "errors": [],
        "ctx_updates": { "governance_flags": [] }
      }
    },
    "context": {
      "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
      "symbols": ["AAPL"],
      "timeframe": "1d",
      "strategy_id": "a1b2c3d4-...",
      "feature_ids": ["f1...", "f2...", "f3..."],
      "model_id": "m1...",
      "best_model_plugin": "ml.xgboost",
      "best_model_params": { "max_depth": 6, "learning_rate": 0.05, "n_estimators": 200 },
      "backtest_ids": ["b1..."],
      "governance_flags": [],
      "validation_passed": true
    }
  },
  "errors": []
}
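The summary string in the example above follows a fixed layout: session prefix, universe, model, walk-forward status, and flag count. A sketch that reproduces it, assuming the session prefix is the first eight characters of the UUID and that multiple symbols are joined with commas (the example only shows a single symbol, so the separator is a guess):

```python
from typing import Optional


def build_summary(
    session_id: str,
    symbols: list[str],
    model_plugin: Optional[str],
    wf_passed: bool,
    governance_flags: list[str],
) -> str:
    """Reproduce the one-line summary layout shown in the example response."""
    universe = ", ".join(symbols)  # assumption: multiple tickers are comma-separated
    return (
        f"Research session {session_id[:8]}: "
        f"Universe: {universe} | Model: {model_plugin} | "
        f"WF passed: {wf_passed} | Governance: {len(governance_flags)} flag(s)"
    )
```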

Stream Research Pipeline (SSE)

POST /api/v1/agents/research/stream
Identical request body to /research but responds with a text/event-stream (Server-Sent Events) connection. The server emits one JSON event per agent lifecycle transition so you can display real-time progress in a UI or log pipeline state without polling.
Use this endpoint for pipelines that include Optuna hyperparameter tuning or CPCV validation — these can run for several minutes. The synchronous /research endpoint holds the HTTP connection open for the entire duration and may be terminated by an upstream gateway timeout (typically 60–90 seconds).

Request Body

Same fields as Run Full Pipeline.

Response — text/event-stream

Each event is sent as a `data: <json>` line followed by a blank line (`\n\n`). The JSON payload carries an `event` field naming the event type; the remaining fields vary by type:

| Event | When emitted | Key fields |
| --- | --- | --- |
| start | Pipeline begins | session_id, strategy_id, message |
| agent_start | Before each agent runs | role, message |
| agent_done | After each agent completes | role, success, summary, details, errors |
| pipeline_halted | GovernanceAgent raises a CRITICAL flag | reason, flags |
| complete | All agents finished successfully | session_id, strategy_id, summary, context |
// The native EventSource API only issues GET requests and cannot send a JSON
// body, so use a fetch-based SSE client such as @microsoft/fetch-event-source.
import { fetchEventSource } from '@microsoft/fetch-event-source';

fetchEventSource('/api/v1/agents/research/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Build a momentum strategy using RSI and news sentiment on AAPL',
    timeframe: '1d',
    start_date: '2021-01-01T00:00:00Z',
    end_date: '2024-01-01T00:00:00Z',
  }),
  // Keep the stream open in background tabs. By default the library closes and
  // reopens it on visibility changes, which would re-POST and start a new pipeline.
  openWhenHidden: true,
  onmessage(event) {
    const update = JSON.parse(event.data);
    console.log(`[${update.event}]`, update.role ?? '', update.summary ?? update.message ?? '');

    if (update.event === 'pipeline_halted') {
      console.error('Pipeline halted. Governance flags:', update.flags);
    }
    if (update.event === 'complete') {
      console.log('Session:', update.session_id, '| Final context:', update.context);
    }
  },
  onerror(err) {
    // Rethrow to stop the library's automatic retry, which would also re-POST the request.
    throw err;
  },
}).catch((err) => console.error('Research stream failed:', err));
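Clients that cannot use a browser SSE library can parse the stream directly. A sketch in Python, assuming each event arrives as one or more data: lines terminated by a blank line, as described above. The helper names are hypothetical, and other SSE fields such as event: or id: are ignored because the event type travels inside the JSON payload.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode one JSON payload per SSE event; a blank line ends each event."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comments (":") and other fields (event:, id:, retry:) are ignored in this sketch.
    if data_lines:  # tolerate a final event without a trailing blank line
        yield json.loads("\n".join(data_lines))


def final_event(lines: Iterable[str]) -> Optional[dict]:
    """Consume the stream and return the terminal 'complete' or 'pipeline_halted' event."""
    for event in iter_sse_events(lines):
        if event.get("event") in ("complete", "pipeline_halted"):
            return event
    return None
```

The input can be any iterable of text lines, such as the decoded lines of a streamed HTTP response body.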

Chat

Single-Turn AI Researcher Chat

POST /api/v1/agents/chat
Routes a single natural language question to the ResearchManagerAgent without necessarily running the full six-agent pipeline. When a session_id is provided, the agent rehydrates the prior AgentContext — restoring symbols, feature IDs, model ID, governance flags, and strategy ID — so follow-up questions work correctly within an ongoing research session.

Request Body

query (string, required): Natural language question or instruction for the AI Researcher.
session_id (string): UUID of a previous research session. When provided, the existing context is restored so follow-up questions can reference prior pipeline outputs such as the selected model or feature set.
context_override (object): Key-value pairs that override specific AgentContext fields before the query is processed. Useful for adjusting symbols or timeframe without running a new full pipeline.

Response

session_id (string): UUID for this chat session (a new UUID if no session_id was passed in the request).
reply (string): The manager's one-line summary response to the query.
details (object): Full agent result details, including the updated context snapshot.
success (boolean): true if the manager completed without critical errors.
# Follow-up question referencing a previous research session
curl -X POST https://api.example.com/api/v1/agents/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What governance flags were raised in the last run?",
    "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f"
  }'
{
  "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
  "reply": "Research session 3f9a1c2e: Universe: AAPL | Model: ml.xgboost | WF passed: True | Governance: 0 flag(s)",
  "details": {
    "agent_results": { "...": "..." },
    "context": { "governance_flags": [], "symbols": ["AAPL"], "..." : "..." }
  },
  "success": true
}

Sessions

Retrieve Session Context

GET /api/v1/agents/sessions/{session_id}
Returns a point-in-time snapshot of the AgentContext that was saved at the end of a research or chat run. Use this to inspect pipeline artefact IDs, the selected model, or governance flags from a prior session without re-running anything.

Path Parameters

session_id (string, required): UUID of the session to retrieve.

Response

session_id (string): UUID of the session.
symbols (list[string]): Instrument tickers used in this session.
timeframe (string): Bar timeframe (e.g. "1d").
strategy_id (string): UUID of the Strategy row created or used by this session.
feature_ids (list[string]): UUIDs of Feature rows discovered and persisted during this session.
model_id (string): UUID of the winning MLModel row.
best_model_plugin (string): Plugin key of the winning model (e.g. "ml.xgboost").
best_model_params (object): Tuned hyperparameter dictionary for the winning model.
backtest_ids (list[string]): UUIDs of Backtest rows executed during this session.
governance_flags (list[string]): All governance flag strings (severity-prefixed) raised by the GovernanceAgent.
validation_passed (boolean): true if both walk-forward and CPCV validation passed.
A 404 is returned if the session ID does not exist in the in-process store. This can occur after a server restart — see the note on sessions below.
curl https://api.example.com/api/v1/agents/sessions/3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f

List All Active Sessions

GET /api/v1/agents/sessions
Returns all session IDs currently held in the in-process session store.
Sessions are stored in an in-process Python dictionary (_sessions). They are not persisted to the database, are not shared between worker processes, and are lost on server restart. For production deployments, replace the session store with a Redis backend so sessions survive worker restarts and are visible to every worker when scaling horizontally.
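One way to prepare for that swap is to hide the store behind a small interface. The sketch below is an assumption, not part of the backend: it pairs an in-memory store equivalent to today's _sessions dict with a Redis-backed version built on the redis-py client (Redis.from_url, set with an expiry, get, and scan_iter). The key prefix and seven-day TTL are arbitrary, and snapshots are round-tripped through JSON, so datetime values come back as strings.

```python
import json
from typing import Optional, Protocol


class SessionStore(Protocol):
    def save(self, session_id: str, snapshot: dict) -> None: ...
    def load(self, session_id: str) -> Optional[dict]: ...
    def list_ids(self) -> list[str]: ...


class InMemorySessionStore:
    """Per-process store, equivalent to the current _sessions dict."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def save(self, session_id: str, snapshot: dict) -> None:
        self._sessions[session_id] = snapshot

    def load(self, session_id: str) -> Optional[dict]:
        return self._sessions.get(session_id)

    def list_ids(self) -> list[str]:
        return list(self._sessions)


class RedisSessionStore:
    """Shared store that survives restarts and is visible to every worker."""

    def __init__(self, url: str = "redis://localhost:6379/0",
                 ttl_seconds: int = 7 * 24 * 3600, prefix: str = "agent_session:") -> None:
        import redis  # redis-py; imported lazily so the in-memory store works without it

        self._r = redis.Redis.from_url(url, decode_responses=True)
        self._ttl = ttl_seconds
        self._prefix = prefix

    def save(self, session_id: str, snapshot: dict) -> None:
        # default=str serialises datetimes; they are returned as ISO-like strings on load.
        self._r.set(self._prefix + session_id, json.dumps(snapshot, default=str), ex=self._ttl)

    def load(self, session_id: str) -> Optional[dict]:
        raw = self._r.get(self._prefix + session_id)
        return None if raw is None else json.loads(raw)

    def list_ids(self) -> list[str]:
        return [key[len(self._prefix):] for key in self._r.scan_iter(match=self._prefix + "*")]
```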

Response

sessions (list[string]): All active session UUID strings.
total (integer): Total number of active sessions.
curl https://api.example.com/api/v1/agents/sessions
{
  "sessions": [
    "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
    "7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d"
  ],
  "total": 2
}
