AI Research Agents API — Automated Strategy Research

The ResearchManagerAgent is the orchestration backbone of the Hedge Fund Backend’s quantitative research system. When you submit a natural language query, the manager parses it for symbol tickers, timeframe hints, and date ranges, then sequentially dispatches six specialist agents — each building on the outputs of those before it. Every agent writes its results back into a shared AgentContext that flows through the entire pipeline, accumulating feature IDs, model artefacts, backtest runs, and governance flags as research progresses.

Agent Pipeline

The six specialist agents are always executed in the following order. Each agent is stateless; all mutable research state lives in AgentContext and is threaded through by the manager.

| # | Agent | Role value | Responsibility |
|---|---|---|---|
| 1 | FeatureDiscoveryAgent | `feature_discovery` | Parses query for hints (`rsi`, `sentiment`, `momentum`, etc.), selects candidate feature plugins from the registry, runs a lightweight Random Forest + SHAP pass to prune low-signal features, and persists winning `Feature` rows to the database. |
| 2 | ModelDiscoveryAgent | `model_discovery` | Runs an AutoML leaderboard across `ml.xgboost`, `ml.lightgbm`, and `ml.random_forest` (or a model hinted in the query). Scores each candidate with 3-fold CV and selects the best `plugin_key`, then persists an `MLModel` row. |
| 3 | HyperparameterAgent | `hyperparameter` | Runs an Optuna study (default 20 trials) against the winning model using the plugin’s registered search space. Updates the `MLModel` row with tuned parameters and writes `best_model_params` back into context. |
| 4 | BacktestAgent | `backtest` | Trains the tuned model on the full assembled dataset, then creates and executes a `Backtest` record using the vectorbt engine with default capital of $100,000, 0.05% commission, and 0.05% slippage. |
| 5 | ValidationAgent | `validation` | Runs 5-fold rolling walk-forward analysis. If walk-forward passes, proceeds to Combinatorial Purged Cross-Validation (CPCV, 6 splits / 2 test splits) to compute Probability of Backtest Overfitting (PBO) and deflated Sharpe. |
| 6 | GovernanceAgent | `governance` | Checks for overfitting (IS/OOS Sharpe ratio, PBO), data leakage (future-peeking feature keys, look-ahead bias heuristics), parameter instability (learning rate bounds, tree depth, fold Sharpe variance), and minimum sample size. The pipeline halts immediately if any `CRITICAL` flag is raised. |
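The orchestration described above can be pictured with a minimal sketch. It is not the backend's actual code: the trimmed-down `AgentContext` and `AgentResult` classes, the `run_pipeline` and `apply_updates` helpers, and the assumption that critical flags are strings starting with `CRITICAL` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Role values in execution order, as listed in the table above.
PIPELINE_ORDER = [
    "feature_discovery",
    "model_discovery",
    "hyperparameter",
    "backtest",
    "validation",
    "governance",
]


@dataclass
class AgentContext:
    """Hypothetical, trimmed-down stand-in for the real AgentContext."""
    session_id: str
    symbols: list = field(default_factory=list)
    governance_flags: list = field(default_factory=list)
    extras: dict = field(default_factory=dict)  # any other accumulated state


@dataclass
class AgentResult:
    """Hypothetical subset of the serialised AgentResult fields."""
    role: str
    success: bool
    summary: str
    ctx_updates: dict = field(default_factory=dict)


# An agent is modelled as a stateless callable: context in, result out.
Agent = Callable[[AgentContext], AgentResult]


def apply_updates(ctx: AgentContext, updates: dict) -> None:
    """Write an agent's ctx_updates back into the shared context."""
    for key, value in updates.items():
        if hasattr(ctx, key):
            setattr(ctx, key, value)
        else:
            ctx.extras[key] = value


def run_pipeline(ctx: AgentContext, agents: dict):
    """Run agents in fixed order and return (results, halted)."""
    results = {}
    for role in PIPELINE_ORDER:
        result = agents[role](ctx)
        results[role] = result
        apply_updates(ctx, result.ctx_updates)
        # Assumption: flags are severity-prefixed strings such as "CRITICAL: ...".
        if any(flag.startswith("CRITICAL") for flag in ctx.governance_flags):
            return results, True
    return results, False
```

Keeping the agents as plain callables mirrors the statement that all mutable state lives in the context: each agent only reads `ctx` and returns its updates, and the manager is the single place where those updates are merged.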

Query Parsing

The manager applies a lightweight regex parser to every incoming query string before agents run. It extracts:

Symbols — uppercase 1–5 letter words that look like tickers (e.g. AAPL, TSLA). Common stop-words and indicator names (RSI, MACD, ATR) are filtered out. Up to five symbols are captured.
Timeframe — looks for weekly/1w → "1w", hourly/1h → "1h", otherwise defaults to "1d".
Date range — if start_date/end_date are omitted from the request body, the manager defaults to the last three years ending at the current UTC time.

Parsed values are merged into AgentContext only when not already set by the request body, so explicit body fields always take priority.
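The parsing rules above can be approximated as follows. This is a sketch under stated assumptions rather than the manager's real parser: the `NON_TICKERS` filter set, the function names, the use of 3 × 365 days for "three years", and the choice to treat a missing or `None` body field as "not set" are all illustrative guesses.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed filter list; the real stop-word set is not given in this document.
NON_TICKERS = {"RSI", "MACD", "ATR", "A", "I", "AN", "THE", "ON", "AND", "OR"}
MAX_SYMBOLS = 5


def parse_symbols(query: str) -> list:
    """Return up to five uppercase 1-5 letter words that survive the filter."""
    symbols = []
    for word in re.findall(r"\b[A-Z]{1,5}\b", query):
        if word not in NON_TICKERS and word not in symbols:
            symbols.append(word)
    return symbols[:MAX_SYMBOLS]


def parse_timeframe(query: str) -> str:
    """Map weekly/1w to "1w", hourly/1h to "1h", anything else to "1d"."""
    lowered = query.lower()
    if re.search(r"\b(weekly|1w)\b", lowered):
        return "1w"
    if re.search(r"\b(hourly|1h)\b", lowered):
        return "1h"
    return "1d"


def merge_parsed(body: dict, query: str, now=None) -> dict:
    """Fill in only the fields the request body left unset."""
    merged = dict(body)
    if not merged.get("symbols"):
        merged["symbols"] = parse_symbols(query)
    if not merged.get("timeframe"):
        merged["timeframe"] = parse_timeframe(query)
    # end_date defaults to the current UTC time.
    if merged.get("end_date") is None:
        merged["end_date"] = now or datetime.now(timezone.utc)
    # start_date defaults to roughly three years before end_date.
    if merged.get("start_date") is None:
        merged["start_date"] = merged["end_date"] - timedelta(days=3 * 365)
    return merged
```

With this sketch, the example query "Build a momentum strategy using RSI and news sentiment on AAPL" yields the symbol list `["AAPL"]`, because `RSI` is filtered out and mixed-case words such as "Build" do not match the all-uppercase pattern.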

Research Pipeline

Run Full Pipeline

POST /api/v1/agents/research

Runs the complete six-agent research pipeline synchronously and returns a final report when all agents have completed (or when the governance agent halts the pipeline). The response includes every agent’s result, the final context snapshot, and any errors surfaced by individual agents.

For long-running pipelines — particularly when Optuna tuning with many trials or CPCV validation is involved — prefer the streaming endpoint /api/v1/agents/research/stream to avoid HTTP gateway timeouts.

Request Body

query (string, required): Natural language research instruction. The manager parses symbols, timeframe, and date range from this string. Example: "Build a momentum strategy using RSI and news sentiment on AAPL"

symbols (list[string]): Explicit list of ticker symbols. When provided, these override any tickers parsed from the query. Defaults to an empty list (parser fills it from the query).

timeframe (string, default: "1d"): Bar timeframe string. Supported values: "1d" (daily), "1h" (hourly), "1w" (weekly).

start_date (datetime): ISO 8601 datetime marking the beginning of the data window. Defaults to three years before end_date if omitted.

end_date (datetime): ISO 8601 datetime marking the end of the data window. Defaults to current UTC time if omitted.

strategy_id (string): UUID string of an existing Strategy row. If omitted, the manager creates a new Strategy record named after the first 60 characters of the query.

Response

session_id (string): UUID identifying this research session. Pass this to /chat or /sessions/{session_id} for follow-up queries.

success (boolean): true if the pipeline completed without any CRITICAL governance flags.

summary (string): One-line human-readable summary: session prefix, universe, selected model, walk-forward pass status, and governance flag count.

details (object), with fields:

    agent_results (object): Keyed by agent role value (e.g. "feature_discovery", "model_discovery"). Each value is a serialised AgentResult containing role, success, summary, details, errors, and ctx_updates.

    context (object): Final AgentContext snapshot containing session_id, symbols, timeframe, strategy_id, feature_ids, model_id, best_model_plugin, best_model_params, backtest_ids, governance_flags, and validation_passed.

errors (list[string]): Aggregated error strings from any agent that failed. Empty on full success.

curl -X POST https://api.example.com/api/v1/agents/research \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Build a momentum strategy using RSI and news sentiment on AAPL",
    "timeframe": "1d",
    "start_date": "2021-01-01T00:00:00Z",
    "end_date": "2024-01-01T00:00:00Z"
  }'

{
  "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
  "success": true,
  "summary": "Research session 3f9a1c2e: Universe: AAPL | Model: ml.xgboost | WF passed: True | Governance: 0 flag(s)",
  "details": {
    "agent_results": {
      "feature_discovery": {
        "role": "feature_discovery",
        "success": true,
        "summary": "Discovered 3 candidate features (technical.rsi, news.finbert_sentiment, news.sentiment_momentum). SHAP pruned to 3 high-signal features.",
        "details": { "candidates": [...], "shap_importance": {...} },
        "errors": [],
        "ctx_updates": { "feature_ids": ["..."], "candidate_features": [...] }
      },
      "governance": {
        "role": "governance",
        "success": true,
        "summary": "Governance check: no issues detected. ✓",
        "details": { "overfitting": {}, "leakage": {}, "parameter_stability": {}, "sample_size": {} },
        "errors": [],
        "ctx_updates": { "governance_flags": [] }
      }
    },
    "context": {
      "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
      "symbols": ["AAPL"],
      "timeframe": "1d",
      "strategy_id": "a1b2c3d4-...",
      "feature_ids": ["f1...", "f2...", "f3..."],
      "model_id": "m1...",
      "best_model_plugin": "ml.xgboost",
      "best_model_params": { "max_depth": 6, "learning_rate": 0.05, "n_estimators": 200 },
      "backtest_ids": ["b1..."],
      "governance_flags": [],
      "validation_passed": true
    }
  },
  "errors": []
}
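For callers working in Python, the same request can be sketched with the standard library alone. The helper names `build_payload`, `run_research`, and `failed_agents` are hypothetical, the host is the placeholder from the curl example, and the 600-second timeout is an arbitrary illustrative value rather than a documented limit.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host from the curl example


def build_payload(query, symbols=None, timeframe=None,
                  start_date=None, end_date=None, strategy_id=None):
    """Build the request body, omitting optional fields left as None."""
    optional = {
        "symbols": symbols,
        "timeframe": timeframe,
        "start_date": start_date,
        "end_date": end_date,
        "strategy_id": strategy_id,
    }
    payload = {"query": query}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def run_research(payload, timeout=600.0):
    """POST the payload to /api/v1/agents/research and decode the JSON report."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/v1/agents/research",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def failed_agents(report):
    """Return the role values of agents whose result has success == False."""
    results = report.get("details", {}).get("agent_results", {})
    return [role for role, result in results.items() if not result.get("success", False)]
```

Leaving optional fields out of the body, rather than sending nulls, lets the manager's query parser supply them, which matches the documented precedence of explicit body fields over parsed values.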

Stream Research Pipeline (SSE)

POST /api/v1/agents/research/stream

Identical request body to /research but responds with a text/event-stream (Server-Sent Events) connection. The server emits one JSON event per agent lifecycle transition so you can display real-time progress in a UI or log pipeline state without polling.

Use this endpoint for pipelines that include Optuna hyperparameter tuning or CPCV validation — these can run for several minutes. The synchronous /research endpoint holds the HTTP connection open for the entire duration and may be terminated by an upstream gateway timeout (typically 60–90 seconds).

Request Body

Same fields as Run Full Pipeline.

Response — `text/event-stream`

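Each event is sent as a single line of the form data: <json>, terminated by a blank line (\n\n). The JSON payload names its type in an event field; the remaining fields vary by event type: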

| Event | When emitted | Key fields |
|---|---|---|
| `start` | Pipeline begins | `session_id`, `strategy_id`, `message` |
| `agent_start` | Before each agent runs | `role`, `message` |
| `agent_done` | After each agent completes | `role`, `success`, `summary`, `details`, `errors` |
| `pipeline_halted` | GovernanceAgent raises a CRITICAL flag | `reason`, `flags` |
| `complete` | All agents finished successfully | `session_id`, `strategy_id`, `summary`, `context` |

// The native EventSource API only issues GET requests and cannot send a
// request body, so this example uses a fetch-based SSE client instead:
// @microsoft/fetch-event-source.
import { fetchEventSource } from '@microsoft/fetch-event-source';

fetchEventSource('/api/v1/agents/research/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Build a momentum strategy using RSI and news sentiment on AAPL',
    timeframe: '1d',
    start_date: '2021-01-01T00:00:00Z',
    end_date: '2024-01-01T00:00:00Z',
  }),
  onmessage(event) {
    const update = JSON.parse(event.data);
    console.log(`[${update.event}]`, update.role ?? '', update.summary ?? update.message ?? '');

    if (update.event === 'pipeline_halted') {
      console.error('Pipeline halted — governance flags:', update.flags);
    }
    if (update.event === 'complete') {
      console.log('Session:', update.session_id, '| Final context:', update.context);
    }
  },
});
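The wire format described above is simple enough to consume without an SSE library. The sketch below decodes `data:` lines from any iterable of text lines, such as a streamed HTTP response body; the function names are hypothetical, and it assumes each payload carries an `event` field, as the JavaScript client above reads it.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per blank-line-terminated SSE event."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line dispatches the accumulated event.
            yield json.loads("\n".join(data_parts))
            data_parts = []


def final_event(events: Iterable[dict]) -> Optional[dict]:
    """Return the first terminal event (pipeline_halted or complete), if any."""
    for event in events:
        if event.get("event") in ("pipeline_halted", "complete"):
            return event
    return None
```

A caller would typically print each `agent_done` summary as it arrives and stop reading once `final_event` would match, since no further agents run after either terminal event.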

Chat

Single-Turn AI Researcher Chat

POST /api/v1/agents/chat

Routes a single natural language question to the ResearchManagerAgent without necessarily running the full six-agent pipeline. When a session_id is provided, the agent rehydrates the prior AgentContext — restoring symbols, feature IDs, model ID, governance flags, and strategy ID — so follow-up questions work correctly within an ongoing research session.

Request Body

query (string, required): Natural language question or instruction for the AI Researcher.

session_id (string): UUID of a previous research session. When provided, the existing context is restored so follow-up questions can reference prior pipeline outputs such as the selected model or feature set.

context_override (object): Key-value pairs that override specific AgentContext fields before the query is processed. Useful for adjusting symbols or timeframe without running a new full pipeline.

Response

session_id (string): UUID for this chat session (new UUID if no session_id was passed in the request).

reply (string): The manager’s one-line summary response to the query.

details (object): Full agent result details including the updated context snapshot.

success (boolean): true if the manager completed without critical errors.

# Follow-up question referencing a previous research session
curl -X POST https://api.example.com/api/v1/agents/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What governance flags were raised in the last run?",
    "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f"
  }'

{
  "session_id": "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
  "reply": "Research session 3f9a1c2e: Universe: AAPL | Model: ml.xgboost | WF passed: True | Governance: 0 flag(s)",
  "details": {
    "agent_results": { "...": "..." },
    "context": { "governance_flags": [], "symbols": ["AAPL"], "..." : "..." }
  },
  "success": true
}
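One way to picture how session_id and context_override interact is the sketch below. It is an illustration only: `resolve_chat_context` is a hypothetical helper, contexts are modelled as plain dictionaries, and the behaviour for a session_id that is not in the store (starting from an empty context under that ID) is an assumption, since this page does not specify it.

```python
import copy
import uuid


def resolve_chat_context(sessions, session_id=None, context_override=None):
    """Return (session_id, context) for a chat request.

    sessions maps session IDs to previously saved context dictionaries.
    """
    if session_id is None:
        # No session passed: the chat run gets a fresh UUID.
        session_id = str(uuid.uuid4())
    # Deep copy so overrides never mutate the stored snapshot.
    context = copy.deepcopy(sessions.get(session_id, {}))
    # Overrides are applied before the query is processed.
    context.update(context_override or {})
    context["session_id"] = session_id
    return session_id, context
```

Copying the stored snapshot before applying overrides keeps a one-off change, such as switching timeframe for a single question, from silently altering what a later /sessions/{session_id} lookup returns until the run saves its own result.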

Sessions

Retrieve Session Context

GET /api/v1/agents/sessions/{session_id}

Returns a point-in-time snapshot of the AgentContext that was saved at the end of a research or chat run. Use this to inspect pipeline artefact IDs, the selected model, or governance flags from a prior session without re-running anything.

Path Parameters

session_id (string, required): UUID of the session to retrieve.

Response

session_id (string): UUID of the session.

symbols (list[string]): Instrument tickers used in this session.

timeframe (string): Bar timeframe (e.g. "1d").

strategy_id (string): UUID of the Strategy row created or used by this session.

feature_ids (list[string]): UUIDs of Feature rows discovered and persisted during this session.

model_id (string): UUID of the winning MLModel row.

best_model_plugin (string): Plugin key of the winning model (e.g. "ml.xgboost").

best_model_params (object): Tuned hyperparameter dictionary for the winning model.

backtest_ids (list[string]): UUIDs of Backtest rows executed during this session.

governance_flags (list[string]): All governance flag strings (severity-prefixed) raised by the GovernanceAgent.

validation_passed (boolean): true if both walk-forward and CPCV validation passed.

A 404 is returned if the session ID does not exist in the in-process store. This can occur after a server restart — see the note on sessions below.

curl https://api.example.com/api/v1/agents/sessions/3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f

List All Active Sessions

GET /api/v1/agents/sessions

Returns all session IDs currently held in the in-process session store.

Sessions are stored in an in-process Python dictionary (_sessions). They are not persisted to the database and will be lost on server restart. For production deployments, the session store should be replaced with a Redis backend to survive worker restarts and support horizontal scaling.

Response

sessions (list[string]): List of all active session UUID strings.

total (integer): Total number of active sessions.

curl https://api.example.com/api/v1/agents/sessions

{
  "sessions": [
    "3f9a1c2e-4b7d-4e8a-9c0d-1a2b3c4d5e6f",
    "7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d"
  ],
  "total": 2
}
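The session behaviour described in this section (dictionary-backed storage, a 404 for unknown IDs, loss of state on restart) can be sketched as a small store class. The class and method names are hypothetical and not the backend's actual `_sessions` code; the point of the narrow interface is that a Redis-backed class exposing the same three methods could be swapped in, as the note above recommends for production.

```python
import copy


class SessionNotFound(KeyError):
    """Raised for unknown session IDs; an HTTP layer would map this to a 404."""


class InMemorySessionStore:
    """Dictionary-backed store; contents vanish when the process restarts."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id, context):
        # Store a copy so later mutation of the caller's dict has no effect.
        self._sessions[session_id] = copy.deepcopy(context)

    def get(self, session_id):
        try:
            return copy.deepcopy(self._sessions[session_id])
        except KeyError:
            raise SessionNotFound(session_id) from None

    def list_sessions(self):
        """Return the same shape as the list endpoint's response body."""
        ids = list(self._sessions)
        return {"sessions": ids, "total": len(ids)}
```

Creating a new `InMemorySessionStore` models a server restart: the fresh instance reports zero sessions, and lookups for IDs saved in the old instance raise `SessionNotFound`.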
