Recall API — search and retrieve agent memories

The recall endpoint retrieves structured memories from a bank using a multi-strategy pipeline. When you call recall, Hindsight runs four retrieval strategies in parallel — semantic similarity, keyword (BM25), graph traversal, and temporal — fuses their rankings using Reciprocal Rank Fusion (RRF), then re-scores the merged candidates with a cross-encoder reranker. The response contains structured facts in relevance order, not raw documents.

To learn about the four retrieval strategies and RRF fusion in depth, see the Recall Architecture guide.
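As a rough illustration of the fusion step, the sketch below implements standard Reciprocal Rank Fusion over four ranked lists of fact IDs. The function name `rrf_fuse`, the sample IDs, and the constant `k=60` (a commonly used RRF default) are assumptions for illustration; this page does not state the value Hindsight uses, and the cross-encoder reranking that follows fusion is not shown.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of fact IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains.
    k=60 is a common default, assumed here for illustration only.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda fid: (-scores[fid], fid))


# Hypothetical per-strategy rankings (best match first).
semantic = ["f1", "f2", "f3"]
keyword = ["f2", "f1", "f4"]
graph = ["f2", "f5"]
temporal = ["f3", "f2"]

fused = rrf_fuse([semantic, keyword, graph, temporal])
print(fused)  # ['f2', 'f1', 'f3', 'f5', 'f4']
```

A fact that appears in several strategies (here `f2`) accumulates score from each of them, which is why it outranks facts that a single strategy placed first.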

Endpoint

POST /v1/{tenant}/banks/{bank_id}/recall

Request parameters

path.tenant

string

required

Your tenant identifier. Use default for single-tenant deployments.

path.bank_id

string

required

The memory bank to search.

query

string

required

The natural language question or statement to search for. Drives all four retrieval strategies simultaneously: embedded for semantic search, tokenized for BM25 keyword search, used to seed graph traversal, and parsed for temporal expressions. Also passed to the cross-encoder reranker. Queries exceeding 500 tokens are rejected with a 400 invalid_request error.

types

string[]

Filters which categories of memory facts are searched. Accepted values: world (objective facts), experience (events and conversations), observation (deduplicated, evidence-grounded beliefs consolidated from multiple memories). When omitted, all three types are searched.

budget

string

default:"mid"

Controls retrieval depth and breadth. Accepted values: low (fast simple lookups), mid (balanced everyday queries), high (exhaustive coverage for complex questions requiring indirect connections).

max_tokens

integer

default:"4096"

Maximum number of tokens the returned facts can collectively occupy. Only the text field of each fact is counted. After reranking, facts are included in relevance order until this budget is exhausted. Set this to however much of your context window you want to allocate to memories.
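The budget behaviour described above can be sketched as a simple greedy cut-off. This is a minimal illustration, not the server's implementation: the helper name `pack_facts` is hypothetical, the whitespace-based token counter is a stand-in for whatever tokenizer the server actually uses, and stopping at the first fact that does not fit is an assumption.

```python
def pack_facts(ranked_facts, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep facts in relevance order until the token budget is used up.

    count_tokens is a whitespace stand-in for the real tokenizer (assumption).
    """
    selected, used = [], 0
    for fact in ranked_facts:
        cost = count_tokens(fact["text"])  # only the text field is counted
        if used + cost > max_tokens:
            break  # assumption: stop at the first fact that does not fit
        selected.append(fact)
        used += cost
    return selected, used


# Hypothetical reranked facts, most relevant first.
ranked = [
    {"text": "Alice prefers email"},
    {"text": "Alice works remotely on Fridays"},
    {"text": "Bob uses Slack"},
]
selected, used = pack_facts(ranked, max_tokens=8)
print(len(selected), used)  # 2 8
```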

query_timestamp

string

An ISO 8601 datetime representing when the query is being asked. Used as the anchor for resolving relative temporal expressions in the query (e.g. “last month”). Without it, the server’s current time is used. Most useful for replaying historical conversations or building time-anchored recall.
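When replaying a historical conversation, the timestamp can be produced with Python's standard library. The helper `build_recall_body` below is hypothetical, and requiring a timezone-aware datetime is a defensive assumption rather than a documented server requirement.

```python
from datetime import datetime, timezone


def build_recall_body(query, asked_at=None):
    """Build a recall request body, optionally anchored to a past moment."""
    body = {"query": query}
    if asked_at is not None:
        if asked_at.tzinfo is None:
            # Assumption: avoid ambiguity by refusing naive datetimes.
            raise ValueError("asked_at must be timezone-aware")
        body["query_timestamp"] = asked_at.isoformat()
    return body


# "last month" is resolved relative to 15 March 2024, not the server clock.
asked_at = datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc)
body = build_recall_body("What did Alice ship last month?", asked_at)
print(body["query_timestamp"])  # 2024-03-15T09:30:00+00:00
```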

tags

string[]

Filters recall to only memories matching the specified tags. Applied at the database level across all four retrieval strategies, not as a post-processing step.

tags_match

string

default:"any"

Controls how tag filtering is applied. Accepted values:

Mode	Untagged memories	Match condition
`any`	Included	Memory has at least one of the specified tags
`any_strict`	Excluded	Memory has at least one of the specified tags
`all`	Included	Memory has all of the specified tags
`all_strict`	Excluded	Memory has all of the specified tags
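The table above can be read as the following predicate. This is a client-side sketch of the semantics only; the function name `matches_tags` is hypothetical, and the real filter is applied inside the database.

```python
def matches_tags(memory_tags, filter_tags, mode="any"):
    """Return True if a memory passes the tag filter under the given mode."""
    if mode not in {"any", "any_strict", "all", "all_strict"}:
        raise ValueError(f"unknown tags_match mode: {mode}")
    memory_tags, filter_tags = set(memory_tags), set(filter_tags)
    if not memory_tags:
        # Untagged memories pass only in the non-strict modes.
        return not mode.endswith("_strict")
    if mode.startswith("any"):
        return bool(memory_tags & filter_tags)  # at least one tag in common
    return filter_tags <= memory_tags  # every requested tag is present


print(matches_tags([], ["user:alice"], "any"))         # True
print(matches_tags([], ["user:alice"], "any_strict"))  # False
print(matches_tags(["user:alice"], ["user:alice", "team:core"], "all"))  # False
```

The strict modes are the safer choice for per-user isolation, because untagged memories never leak into a scoped recall.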

tag_groups

object[]

Compound boolean tag filters. Groups in the list are AND-ed together at the top level. Each group is a recursive boolean expression: a leaf node {"tags": [...], "match": "..."}, or a compound node {"and": [...]}, {"or": [...]}, or {"not": ...}. tag_groups can be combined with tags and tags_match; when both are present, all of the filters are AND-ed together.
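To make the recursive structure concrete, the sketch below evaluates a tag_groups expression against one memory's tags. It is illustrative only: the function names are hypothetical, and two details are assumptions not stated on this page, namely that a leaf's match value follows the same four modes as tags_match and that a missing match defaults to "any".

```python
def _leaf_matches(memory_tags, tags, match):
    """Evaluate one leaf; assumes match uses the tags_match modes."""
    if not memory_tags:
        return not match.endswith("_strict")
    wanted = set(tags)
    if match.startswith("any"):
        return bool(memory_tags & wanted)
    return wanted <= memory_tags


def eval_group(node, memory_tags):
    """Recursively evaluate a leaf or an and/or/not compound node."""
    if "and" in node:
        return all(eval_group(child, memory_tags) for child in node["and"])
    if "or" in node:
        return any(eval_group(child, memory_tags) for child in node["or"])
    if "not" in node:
        return not eval_group(node["not"], memory_tags)
    # Assumption: a leaf without "match" behaves like "any".
    return _leaf_matches(memory_tags, node["tags"], node.get("match", "any"))


def eval_tag_groups(groups, memory_tags):
    """Top-level groups are AND-ed together."""
    memory_tags = set(memory_tags)
    return all(eval_group(group, memory_tags) for group in groups)


# (user:alice OR team:core) AND NOT archived
tag_groups = [
    {"or": [
        {"tags": ["user:alice"], "match": "any_strict"},
        {"tags": ["team:core"], "match": "any_strict"},
    ]},
    {"not": {"tags": ["archived"], "match": "any_strict"}},
]
print(eval_tag_groups(tag_groups, ["user:alice"]))              # True
print(eval_tag_groups(tag_groups, ["user:alice", "archived"]))  # False
```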

include

object

Controls optional supplementary data returned alongside the main facts.

chunks

object

When enabled, the response includes the raw source text chunks from which each fact was extracted. The max_tokens sub-option (default 8192) controls the total chunk token budget independently of the main fact budget. Useful when agents need surrounding context beyond the extracted fact text.

source_facts

object

When enabled and types includes observation, each observation result is accompanied by the original contributing facts it was synthesized from. The max_tokens sub-option (default 4096) limits the total token budget for source facts.

entities

object

Enabled by default. Each returned fact includes the canonical names of entities associated with it. Set to null to skip the entity JOIN query and reduce response size.
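A request body using these options might look like the sketch below. The exact shape is an assumption inferred from the field descriptions above: it supposes that supplying a sub-object enables that option and that max_tokens sits directly inside it. Check the behaviour against your deployment before relying on it.

```python
import json

# Assumed shape: a present sub-object enables the option, null disables it.
body = {
    "query": "What are Alice's communication preferences?",
    "max_tokens": 2048,
    "include": {
        "chunks": {"max_tokens": 2048},  # raw source chunks, separate budget
        "entities": None,                # skip the entity lookup
    },
}

payload = json.dumps(body)  # Python None is serialized as JSON null
print(payload)
```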

trace

boolean

default:"false"

When true, the response includes a detailed debug trace covering the query embedding, per-strategy retrieval results, RRF fusion candidates, reranked results, temporal constraints detected, and per-phase timings. Has no effect on retrieval logic.

Response fields

results

object[]

required

The main list of recalled facts, ordered by relevance. Results do not include a numeric score — what matters is the relative ordering, already reflected in list order.

string

required

Unique identifier of this fact.

text

string

required

The extracted fact text as stored in the memory bank.

type

string

required

The fact category: world for objective information, experience for events and conversations, or observation for consolidated knowledge synthesized over time.

context

string

The context label provided when the fact was retained (e.g. "team meeting", "slack"). null if none was set.

metadata

object

The key-value string pairs attached when the fact was retained. null if none were set.
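Because results arrive already ordered by relevance, a common next step is to render them into a block of text for an agent's prompt. The sketch below assumes the response has been parsed into plain dictionaries with the fields described above; the helper name `format_memories` and the sample facts are hypothetical.

```python
def format_memories(results):
    """Render recalled facts as a bulleted block, preserving list order."""
    lines = []
    for fact in results:  # already in relevance order; do not re-sort
        label = fact["type"]
        if fact.get("context"):  # context may be null
            label += f", {fact['context']}"
        lines.append(f"- [{label}] {fact['text']}")
    return "\n".join(lines)


# Hypothetical parsed results.
results = [
    {"text": "Alice prefers email", "type": "world", "context": None, "metadata": None},
    {"text": "Alice asked for async updates", "type": "experience",
     "context": "team meeting", "metadata": {"source": "notes"}},
]
print(format_memories(results))
# - [world] Alice prefers email
# - [experience, team meeting] Alice asked for async updates
```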

Examples

Basic recall

curl -X POST "https://your-hindsight-host/v1/default/banks/my-bank/recall" \
  -H "Authorization: Bearer $HINDSIGHT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are Alice'\''s communication preferences?",
    "max_tokens": 2048
  }'

Recall with budget levels

response = client.recall(
    bank_id="my-bank",
    query="What tools does the team use?",
    budget="low",
    max_tokens=1024,
)

Tag-scoped recall

curl -X POST "https://your-hindsight-host/v1/default/banks/my-bank/recall" \
  -H "Authorization: Bearer $HINDSIGHT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are this user'\''s preferences?",
    "tags": ["user:alice"],
    "tags_match": "all_strict",
    "max_tokens": 2048
  }'

Recall observations with source facts

response = client.recall(
    bank_id="my-bank",
    query="What patterns have we observed in Alice's behavior?",
    types=["observation"],
    include_source_facts=True,
    max_tokens=4096,
)
for obs in response.results:
    print(obs.text)
    for fact_id in (obs.source_fact_ids or []):
        source = response.source_facts.get(fact_id)
        if source:
            print(f"  <- {source.text}")

Error codes

Status	Code	Description
`400`	`invalid_request`	Malformed request body or query exceeds 500 tokens.
`401`	`unauthorized`	Missing or invalid API key.
`404`	`bank_not_found`	The specified bank does not exist.
`422`	`validation_error`	One or more parameters failed validation.
`429`	`rate_limited`	Too many requests. Retry with exponential backoff.
`500`	`internal_error`	Server error during retrieval.
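The retry advice for 429 responses can be sketched as follows. This is a generic pattern, not part of any Hindsight client: `RecallError`, `recall_with_retry`, and the `send` callable are hypothetical names, and the attempt count, base delay, and jitter are arbitrary starting values.

```python
import random
import time


class RecallError(Exception):
    """Hypothetical error carrying the HTTP status and error code."""

    def __init__(self, status, code):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def recall_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(); on 429, wait with exponential backoff and try again."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RecallError as err:
            # Only rate limiting is retried; other errors surface immediately.
            if err.status != 429 or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # add jitter
```

Here `send` stands for any zero-argument function that performs the request and raises `RecallError` on a non-2xx response. Errors such as 400, 404, and 422 are not retried because repeating the same request cannot succeed.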
