The recall endpoint retrieves structured memories from a bank using a multi-strategy pipeline. When you call recall, Hindsight runs four retrieval strategies in parallel: semantic similarity, keyword (BM25), graph traversal, and temporal. It fuses their rankings using Reciprocal Rank Fusion (RRF), then re-scores the merged candidates with a cross-encoder reranker. The response contains structured facts in relevance order, not raw documents.
For an in-depth look at the four retrieval strategies and RRF, see the Recall Architecture guide.
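The fusion step can be illustrated with a minimal sketch of standard Reciprocal Rank Fusion. It assumes the conventional constant k = 60 and equal weighting of all strategies. The constant and any per-strategy weights that Hindsight actually uses are not documented on this page, and the fact IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher totals rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; ties are broken by ID for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical per-strategy rankings of fact IDs.
semantic = ["f1", "f2", "f3"]
keyword = ["f2", "f1", "f4"]
graph = ["f2", "f5"]
temporal = ["f3", "f2"]

print(rrf_fuse([semantic, keyword, graph, temporal]))  # ['f2', 'f1', 'f3', 'f5', 'f4']
```

Here f2 wins because it ranks well in all four lists, even though it tops only two of them. RRF works on ranks alone, so it needs no normalization of the strategies' incompatible raw scores. In the pipeline described above, the fused list is then re-scored by the cross-encoder.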

Endpoint

POST /v1/{tenant}/banks/{bank_id}/recall

Request parameters

path.tenant
string
required
Your tenant identifier. Use "default" for single-tenant deployments.
path.bank_id
string
required
The memory bank to search.
query
string
required
The natural language question or statement to search for. Drives all four retrieval strategies simultaneously: embedded for semantic search, tokenized for BM25 keyword search, used to seed graph traversal, and parsed for temporal expressions. Also passed to the cross-encoder reranker. Queries exceeding 500 tokens are rejected.
types
string[]
Filters which categories of memory facts are searched. Accepted values: world (objective facts), experience (events and conversations), observation (deduplicated, evidence-grounded beliefs consolidated from multiple memories). When omitted, all three types are searched.
budget
string
default:"mid"
Controls retrieval depth and breadth. Accepted values: low (fast simple lookups), mid (balanced everyday queries), high (exhaustive coverage for complex questions requiring indirect connections).
max_tokens
integer
default:4096
Maximum number of tokens the returned facts can collectively occupy. Only the text field of each fact is counted. After reranking, facts are included in relevance order until this budget is exhausted. Set this to however much of your context window you want to allocate to memories.
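The max_tokens rule ("included in relevance order until this budget is exhausted") can be read as a greedy pass over the reranked list. The sketch below assumes the pass stops at the first fact that would overflow the budget. The page does not say whether the server instead skips ahead to smaller facts. Whitespace splitting is a stand-in for the real tokenizer, which is not documented here.

```python
from typing import Callable


def pack_by_token_budget(
    facts: list[dict],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[dict]:
    """Keep facts in relevance order until the next one would exceed max_tokens.

    Only each fact's "text" field is counted, mirroring the max_tokens description.
    """
    selected: list[dict] = []
    used = 0
    for fact in facts:  # facts are assumed to be sorted by relevance already
        cost = count_tokens(fact["text"])
        if used + cost > max_tokens:
            break  # budget exhausted: stop rather than skip ahead
        selected.append(fact)
        used += cost
    return selected


def approx_tokens(text: str) -> int:
    """Hypothetical tokenizer stand-in: one token per whitespace-separated word."""
    return len(text.split())


reranked = [
    {"text": "Alice prefers email over phone calls"},
    {"text": "Alice works in the Berlin office"},
    {"text": "Alice joined the team in 2021"},
]
for fact in pack_by_token_budget(reranked, max_tokens=12, count_tokens=approx_tokens):
    print(fact["text"])  # prints the first two facts only
```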
query_timestamp
string
An ISO 8601 datetime representing when the query is being asked. Used as the anchor for resolving relative temporal expressions in the query (e.g. “last month”). Without it, the server’s current time is used. Most useful for replaying historical conversations or building time-anchored recall.
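When replaying a historical conversation, set query_timestamp to the time the question was originally asked. The sketch below builds such a request body with Python's datetime. It requires a timezone-aware datetime so the UTC offset is explicit. That requirement is a choice made for this example, because the page does not say how the server treats timestamps without an offset. The helper name is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_recall_body(query: str, asked_at: Optional[datetime] = None) -> dict:
    """Build a recall request body, optionally anchored to a historical time."""
    body: dict = {"query": query}
    if asked_at is not None:
        if asked_at.tzinfo is None:
            raise ValueError("use a timezone-aware datetime so the offset is explicit")
        # isoformat() on an aware datetime yields ISO 8601 with a UTC offset.
        body["query_timestamp"] = asked_at.isoformat()
    return body


# Replaying a conversation from 15 March 2024: "last month" should then be
# resolved relative to that date, not to the server's current time.
body = build_recall_body(
    "What did we decide last month?",
    asked_at=datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc),
)
print(json.dumps(body))
# {"query": "What did we decide last month?", "query_timestamp": "2024-03-15T10:00:00+00:00"}
```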
tags
string[]
Filters recall to only memories matching the specified tags. Applied at the database level across all four retrieval strategies, not as a post-processing step.
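Applying the filter inside retrieval rather than afterwards matters because each strategy only returns a limited number of candidates. The toy sketch below uses plain lists instead of a database, and k is a hypothetical stand-in for a per-strategy candidate cutoff. It shows how post-filtering can discard most of a small candidate set, while pre-filtering still fills it with matching memories.

```python
def top_k_post_filter(ranked: list[dict], tag: str, k: int) -> list[dict]:
    """Take the top k candidates first, then drop those without the tag."""
    return [m for m in ranked[:k] if tag in m["tags"]]


def top_k_pre_filter(ranked: list[dict], tag: str, k: int) -> list[dict]:
    """Drop candidates without the tag first, then take the top k."""
    return [m for m in ranked if tag in m["tags"]][:k]


# Hypothetical ranking in which another user's memories score highest.
ranked = [
    {"id": "m1", "tags": ["user:bob"]},
    {"id": "m2", "tags": ["user:bob"]},
    {"id": "m3", "tags": ["user:alice"]},
    {"id": "m4", "tags": ["user:alice"]},
    {"id": "m5", "tags": ["user:alice"]},
]
print([m["id"] for m in top_k_post_filter(ranked, "user:alice", k=3)])  # ['m3']
print([m["id"] for m in top_k_pre_filter(ranked, "user:alice", k=3)])   # ['m3', 'm4', 'm5']
```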
tags_match
string
default:"any"
Controls how tag filtering is applied. Accepted values:

| Mode | Untagged memories | Match condition |
|------------|-------------------|-------------------------------------------------|
| any | Included | Memory has at least one of the specified tags |
| any_strict | Excluded | Memory has at least one of the specified tags |
| all | Included | Memory has all of the specified tags |
| all_strict | Excluded | Memory has all of the specified tags |
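The tags_match table can be restated as a single predicate. This is a client-side illustration of the documented rules, not Hindsight's server code. The function name is hypothetical.

```python
STRICT_MODES = {"any_strict", "all_strict"}
VALID_MODES = {"any", "all"} | STRICT_MODES


def matches_tags(memory_tags: set, wanted: set, mode: str = "any") -> bool:
    """Return True if a memory with memory_tags passes a tags/tags_match filter."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tags_match mode: {mode!r}")
    if not memory_tags:
        # Untagged memories: included by any/all, excluded by the strict modes.
        return mode not in STRICT_MODES
    if mode in ("any", "any_strict"):
        return bool(memory_tags & wanted)  # at least one shared tag
    return wanted <= memory_tags  # all/all_strict: every wanted tag is present


print(matches_tags(set(), {"user:alice"}, "any"))                # True
print(matches_tags(set(), {"user:alice"}, "any_strict"))         # False
print(matches_tags({"user:alice"}, {"user:alice", "team:x"}, "all"))  # False
print(matches_tags({"user:alice", "team:x", "vip"}, {"user:alice", "team:x"}, "all_strict"))  # True
```

The strict variants only change how untagged memories are treated. For tagged memories, "any" and "any_strict" behave identically, and so do "all" and "all_strict".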
tag_groups
object[]
Compound boolean tag filters. Groups in the list are AND-ed together at the top level. Each group is a recursive boolean expression: either a leaf node {"tags": [...], "match": "..."} or a compound node {"and": [...]}, {"or": [...]}, or {"not": ...}. Can be combined with tags and tags_match, in which case the two filters are AND-ed together.
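The recursive structure of tag_groups maps naturally onto a small evaluator. This sketch is a client-side reading of the rules above, not the server implementation. It assumes a leaf's "match" accepts the same four values as tags_match and uses the same untagged-memory semantics, which the page implies but does not state. All function names are hypothetical. The groups list is shaped like a tag_groups request value.

```python
from typing import Any

STRICT_MODES = {"any_strict", "all_strict"}
ALL_MODES = {"any", "all"} | STRICT_MODES


def leaf_matches(memory_tags: set, tags: list, match: str) -> bool:
    """Leaf node {"tags": [...], "match": ...}, assumed to follow tags_match rules."""
    if match not in ALL_MODES:
        raise ValueError(f"unknown match mode: {match!r}")
    if not memory_tags:
        return match not in STRICT_MODES  # untagged: kept only by non-strict modes
    wanted = set(tags)
    if match.startswith("any"):
        return bool(memory_tags & wanted)
    return wanted <= memory_tags  # "all" / "all_strict"


def eval_node(node: dict[str, Any], memory_tags: set) -> bool:
    """Recursively evaluate a leaf, "and", "or", or "not" node."""
    if "and" in node:
        return all(eval_node(child, memory_tags) for child in node["and"])
    if "or" in node:
        return any(eval_node(child, memory_tags) for child in node["or"])
    if "not" in node:
        return not eval_node(node["not"], memory_tags)
    return leaf_matches(memory_tags, node["tags"], node["match"])


def passes_tag_groups(groups: list, memory_tags: set) -> bool:
    """Top-level groups are AND-ed together."""
    return all(eval_node(group, memory_tags) for group in groups)


# Alice's memories from project x or y that are not archived.
groups = [
    {"tags": ["user:alice"], "match": "any_strict"},
    {"or": [
        {"tags": ["project:x"], "match": "any_strict"},
        {"tags": ["project:y"], "match": "any_strict"},
    ]},
    {"not": {"tags": ["archived"], "match": "any_strict"}},
]
print(passes_tag_groups(groups, {"user:alice", "project:x"}))              # True
print(passes_tag_groups(groups, {"user:alice", "project:y", "archived"}))  # False
```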
include
object
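Controls optional supplementary data returned alongside the main facts. The response fields source_facts, chunks, and entities are returned only when the corresponding include.source_facts, include.chunks, or include.entities option is enabled.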
trace
boolean
default:false
When true, the response includes a detailed debug trace covering the query embedding, per-strategy retrieval results, RRF fusion candidates, reranked results, temporal constraints detected, and per-phase timings. Has no effect on retrieval logic.
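Several request parameters accept a fixed set of values, so a client can catch typos before the server answers with 422 validation_error. The sketch below assembles a request body and checks only the values listed on this page. The helper name is hypothetical. The positive-max_tokens check is an added assumption, and include is left out because its sub-field format is not spelled out here. The server remains the authority on validation.

```python
from typing import Any, Optional

ALLOWED_TYPES = {"world", "experience", "observation"}
ALLOWED_BUDGETS = {"low", "mid", "high"}
ALLOWED_TAG_MODES = {"any", "any_strict", "all", "all_strict"}


def build_recall_payload(
    query: str,
    types: Optional[list[str]] = None,
    budget: str = "mid",
    max_tokens: int = 4096,
    tags: Optional[list[str]] = None,
    tags_match: str = "any",
    trace: bool = False,
) -> dict[str, Any]:
    """Assemble a recall request body, checking the documented enum values locally."""
    if not query:
        raise ValueError("query is required")
    if types is not None and not set(types) <= ALLOWED_TYPES:
        raise ValueError(f"unsupported types: {sorted(set(types) - ALLOWED_TYPES)}")
    if budget not in ALLOWED_BUDGETS:
        raise ValueError(f"unsupported budget: {budget!r}")
    if tags_match not in ALLOWED_TAG_MODES:
        raise ValueError(f"unsupported tags_match: {tags_match!r}")
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    payload: dict[str, Any] = {"query": query, "budget": budget, "max_tokens": max_tokens}
    if types is not None:
        payload["types"] = types  # omit to search all three types
    if tags:
        payload["tags"] = tags
        payload["tags_match"] = tags_match
    if trace:
        payload["trace"] = True  # debug output only; retrieval is unchanged
    return payload


print(build_recall_payload("What tools does the team use?", types=["observation"], trace=True))
```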

Response fields

results
object[]
required
The main list of recalled facts, ordered by relevance. Results do not include a numeric score; the relative ordering, already reflected in list order, is what matters.
source_facts
object
An object keyed by fact ID containing full result objects for the source facts that contributed to observation results. Only present when include.source_facts is enabled. Facts are deduplicated across observations, so a source fact shared by several observations appears only once.
chunks
object
An object keyed by chunk ID containing raw source text chunks. Only present when include.chunks is enabled. Each chunk has id, text, chunk_index, and truncated (whether the text was cut to fit the token budget).
entities
object
An object keyed by canonical entity name containing entity state objects. Only present when include.entities is enabled. Each entry has entity_id, canonical_name, and observations.
trace
object
A debug object present only when trace: true was set in the request. Contains per-phase timings, retrieval breakdowns, and RRF fusion details.
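Because source_facts, chunks, entities, and trace appear only when requested, client code should treat them as optional. The minimal sketch below works on the raw JSON response after it has been parsed into a Python dict. It relies only on the field names listed above. The helper name and all sample values are invented for illustration.

```python
def summarize_extras(response: dict) -> list[str]:
    """Describe the optional chunks/entities sections; either may be missing."""
    lines: list[str] = []
    # chunks: {chunk_id: {"id", "text", "chunk_index", "truncated"}}
    for chunk_id, chunk in (response.get("chunks") or {}).items():
        note = " (truncated to fit the token budget)" if chunk["truncated"] else ""
        lines.append(f"chunk {chunk_id} index {chunk['chunk_index']}{note}")
    # entities: {canonical_name: {"entity_id", "canonical_name", "observations"}}
    for name, entity in (response.get("entities") or {}).items():
        lines.append(f"entity {name} id {entity['entity_id']}")
    return lines


# Invented sample values; only the field names come from the list above.
sample = {
    "results": [],
    "chunks": {
        "c1": {"id": "c1", "text": "Alice said she prefers email.", "chunk_index": 0, "truncated": True},
    },
    "entities": {
        "Alice": {"entity_id": "e1", "canonical_name": "Alice"},
    },
}
for line in summarize_extras(sample):
    print(line)
# chunk c1 index 0 (truncated to fit the token budget)
# entity Alice id e1
```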

Examples

Basic recall

curl -X POST "https://your-hindsight-host/v1/default/banks/my-bank/recall" \
  -H "Authorization: Bearer $HINDSIGHT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are Alice'\''s communication preferences?",
    "max_tokens": 2048
  }'
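The same basic request can be made from Python without the SDK, using only the standard library. The sketch builds the request but does not send it. The host and bank are the placeholders from the curl example, and the helper name is hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request


def build_recall_request(
    host: str, tenant: str, bank_id: str, body: dict, api_key: str
) -> urllib.request.Request:
    """Build, but do not send, a POST request to the recall endpoint."""
    path = "/v1/{}/banks/{}/recall".format(
        urllib.parse.quote(tenant, safe=""), urllib.parse.quote(bank_id, safe="")
    )
    return urllib.request.Request(
        host.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_recall_request(
    "https://your-hindsight-host",
    "default",
    "my-bank",
    {"query": "What are Alice's communication preferences?", "max_tokens": 2048},
    api_key=os.environ.get("HINDSIGHT_API_KEY", ""),
)
# Sending it requires a reachable server:
# with urllib.request.urlopen(req) as resp:
#     recalled = json.load(resp)
```

Note that urlopen raises urllib.error.HTTPError for error statuses such as 404 or 429, so callers that want to inspect the error code should catch it.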

Recall with budget levels

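# Assumes `client` is an already-initialized Hindsight Python client.
response = client.recall(
    bank_id="my-bank",
    query="What tools does the team use?",
    budget="low",     # fast, shallow retrieval for a simple lookup
    max_tokens=1024,  # cap on the combined text of the returned facts
)
for fact in response.results:  # already in relevance order
    print(fact.text)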

Tag-scoped recall

curl -X POST "https://your-hindsight-host/v1/default/banks/my-bank/recall" \
  -H "Authorization: Bearer $HINDSIGHT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are this user'\''s preferences?",
    "tags": ["user:alice"],
    "tags_match": "all_strict",
    "max_tokens": 2048
  }'

Recall observations with source facts

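# Assumes `client` is an already-initialized Hindsight Python client.
response = client.recall(
    bank_id="my-bank",
    query="What patterns have we observed in Alice's behavior?",
    types=["observation"],      # search only consolidated observations
    include_source_facts=True,  # populate response.source_facts
    max_tokens=4096,
)
source_facts = response.source_facts or {}  # absent unless requested
for obs in response.results:
    print(obs.text)
    # Each observation lists the IDs of the facts that support it.
    for fact_id in (obs.source_fact_ids or []):
        source = source_facts.get(fact_id)
        if source:
            print(f"  <- {source.text}")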

Error codes

| Status | Code | Description |
|--------|------------------|------------------------------------------------------|
| 400 | invalid_request | Malformed request body, or query exceeds 500 tokens. |
| 401 | unauthorized | Missing or invalid API key. |
| 404 | bank_not_found | The specified bank does not exist. |
| 422 | validation_error | One or more parameters failed validation. |
| 429 | rate_limited | Too many requests. Retry with exponential backoff. |
| 500 | internal_error | Server error during retrieval. |
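For 429 rate_limited, the table recommends exponential backoff. The sketch below wraps any request function that returns a (status, body) pair. The transport, the helper name, the 0.5 s base delay, and the 10% jitter are all assumptions for illustration. It retries only 429, the one status the table marks as retryable. The page does not say whether the server sends a Retry-After header.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]


def send_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429, wait and retry with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status != 429 or attempt >= max_attempts:
            return status, body  # success, non-retryable error, or out of attempts
        delay = base_delay * 2 ** (attempt - 1)  # 0.5 s, 1 s, 2 s, ...
        sleep(delay + random.uniform(0, 0.1 * delay))  # small jitter spreads out retries


# Usage with a fake transport that is rate-limited twice, then succeeds.
replies = iter([(429, {}), (429, {}), (200, {"results": []})])
waits: list = []
status, body = send_with_backoff(lambda: next(replies), sleep=waits.append)
print(status, len(waits))  # 200 2
```

Injecting sleep keeps the function testable. In production it defaults to time.sleep.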
