Hindsight uses three categories of machine learning models. LLMs handle fact extraction, reasoning, and generation. Embedding models convert text to vectors for semantic search. Cross-encoders rerank search results to improve precision.

LLM

Handles fact extraction, entity resolution, mental model consolidation, and answer synthesis. Fully configurable, with 20+ providers supported.

Embeddings

Vector representations for semantic similarity search. Default: BAAI/bge-small-en-v1.5 (384 dimensions).

Reranker

Reranks initial search results to improve precision. Default: cross-encoder/ms-marco-MiniLM-L-6-v2.
With the default local providers, embedding and cross-encoder models are downloaded automatically from HuggingFace on first run.

LLM providers

Hindsight supports over 20 LLM providers out of the box, plus any OpenAI-compatible API and 100+ providers via LiteLLM.
| Provider | Value | Notes |
| --- | --- | --- |
| OpenAI | openai | Includes Flex Processing support |
| Anthropic | anthropic | Claude model family |
| Gemini | gemini | Google Gemini API |
| Groq | groq | High-throughput inference; recommended for retain |
| Ollama | ollama | Local inference server |
| LM Studio | lmstudio | Local GUI-based inference |
| llama.cpp | llamacpp | Built-in managed subprocess, no external server |
| Vertex AI | vertexai | Google Cloud, native GenAI SDK |
| AWS Bedrock | bedrock | Uses AWS credentials, no API key required |
| LiteLLM | litellm | 100+ providers via LiteLLM proxy |
| LiteLLM Router | litellmrouter | Fallback chains, load-balancing, per-deployment limits |
| DeepSeek | deepseek | OpenAI-compatible, includes thinking mode |
| MiniMax | minimax | 1M context window |
| OpenRouter | openrouter | Access 100+ models via one API key |
| z.ai | zai | Zhipu GLM series |
| Volcano Engine | volcano | ByteDance Doubao models |
| opencode-go | opencode-go | OpenAI-compatible |
| OpenAI Codex | openai-codex | Uses ChatGPT Plus/Pro OAuth (personal development only) |
| Claude Code | claude-code | Uses Claude Pro/Max OAuth (personal development only) |
| None | none | Disables the LLM (semantic search only) |

Hindsight works with any provider exposing an OpenAI-compatible API. Set HINDSIGHT_API_LLM_PROVIDER=openai and point HINDSIGHT_API_LLM_BASE_URL at your provider's endpoint (Azure OpenAI, Together AI, Fireworks, etc.).
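As a sketch, pointing the openai provider at Together AI's OpenAI-compatible endpoint could look like this. The base URL is Together AI's public one; the API key and model ID are placeholders. Setting HINDSIGHT_API_LLM_MODEL explicitly is an assumption on our part: the openai provider's default model is probably not served by a third-party endpoint.

```shell
# Reuse the openai provider, but send requests to another OpenAI-compatible server.
export HINDSIGHT_API_LLM_PROVIDER=openai
# Together AI's OpenAI-compatible base URL; replace it with your provider's endpoint.
export HINDSIGHT_API_LLM_BASE_URL=https://api.together.xyz/v1
# Placeholder key: use the key issued by that provider, not an OpenAI key.
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
# Placeholder model ID (assumption): pick a model the endpoint actually serves.
export HINDSIGHT_API_LLM_MODEL=your-model-id
```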

Benchmarks

Not sure which model to use? The Model Leaderboard benchmarks models across accuracy, speed, cost, and reliability for retain, reflect, and observation consolidation.

Tested models

The following models have been verified to work correctly with Hindsight:
| Provider | Model |
| --- | --- |
| OpenAI | gpt-5.2 |
| OpenAI | gpt-5 |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-5-sonnet-20241022 |
| Gemini | gemini-3-pro-preview |
| Gemini | gemini-2.5-flash |
| Gemini | gemini-2.5-flash-lite |
| Groq | openai/gpt-oss-120b |
| Groq | openai/gpt-oss-20b |

Provider default models

When HINDSIGHT_API_LLM_MODEL is not set, each provider uses a sensible default. Setting just the provider is enough to get started:
```shell
# Uses claude-haiku-4-5-20251001 automatically
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
```

You can always override the default by setting HINDSIGHT_API_LLM_MODEL explicitly:

```shell
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-5-20250929
```

Configuration examples

```shell
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
```
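A local setup follows the same pattern. This sketch assumes an Ollama server running on the same machine with a model already pulled; the model name is a placeholder, and no API key is set because a local Ollama server does not normally require one. If the server runs on a different host or port, check the Configuration reference for how the ollama provider locates it.

```shell
# Local inference through an Ollama server on this machine.
export HINDSIGHT_API_LLM_PROVIDER=ollama
# Placeholder: the name of a model already downloaded with `ollama pull`.
export HINDSIGHT_API_LLM_MODEL=your-local-model
```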

Models with limited output tokens

Hindsight expects models to support at least 65,000 output tokens for reliable fact extraction. For models with a lower output limit, reduce the retain completion token limit to match:

```shell
# For models supporting 32k output tokens
export HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS=32000

# For models supporting 16k output tokens
export HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS=16000
```
HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS must be greater than HINDSIGHT_API_RETAIN_CHUNK_SIZE (default: 3000).
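This constraint is easy to break when lowering the limit, so a startup script can check it before launching Hindsight. A minimal sketch, not part of Hindsight: the function name check_retain_limits is hypothetical, it falls back to the documented default chunk size of 3000 when HINDSIGHT_API_RETAIN_CHUNK_SIZE is unset, and it skips the check entirely when the completion limit is not overridden.

```shell
# Hypothetical pre-flight check (not part of Hindsight): the retain completion
# limit must be greater than the retain chunk size.
check_retain_limits() {
  max_tokens="${HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS:-}"
  # Fall back to the documented default chunk size when it is not overridden.
  chunk_size="${HINDSIGHT_API_RETAIN_CHUNK_SIZE:-3000}"
  if [ -z "$max_tokens" ]; then
    # Limit not overridden: nothing to compare, Hindsight uses its own default.
    return 0
  fi
  # Non-integer values make the test fail (errors silenced) and fall through.
  if [ "$max_tokens" -gt "$chunk_size" ] 2>/dev/null; then
    return 0
  fi
  echo "error: HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS ($max_tokens) must be greater than HINDSIGHT_API_RETAIN_CHUNK_SIZE ($chunk_size)" >&2
  return 1
}
```

Call it before starting the server, for example `check_retain_limits || exit 1`.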

OpenAI Codex setup (ChatGPT Plus/Pro)

Use your ChatGPT Plus or Pro subscription without separate API costs. For personal development only.
1. Install the Codex CLI:

```shell
npm install -g @openai/codex
```

2. Log in with your ChatGPT credentials:

```shell
codex auth login
```

This opens a browser window to authenticate and saves OAuth tokens to ~/.codex/auth.json.

3. Configure Hindsight:

```shell
export HINDSIGHT_API_LLM_PROVIDER=openai-codex
# No API key needed: reads from ~/.codex/auth.json automatically
```
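Because the openai-codex provider depends on the token file written by the login step, a startup script can confirm that file exists before selecting the provider. A minimal sketch: the function name use_codex_provider is hypothetical, and it only checks that ~/.codex/auth.json exists, not that the tokens inside are still valid.

```shell
# Hypothetical helper: select the openai-codex provider only if Codex OAuth tokens exist.
use_codex_provider() {
  auth_file="$HOME/.codex/auth.json"
  if [ ! -f "$auth_file" ]; then
    echo "error: $auth_file not found; log in with the Codex CLI first" >&2
    return 1
  fi
  # No API key is exported; Hindsight reads the OAuth tokens from this file.
  export HINDSIGHT_API_LLM_PROVIDER=openai-codex
}
```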

Claude Code setup (Claude Pro/Max)

Use your Claude Pro or Max subscription without separate API costs. For personal development only.
This integration uses your personal Claude credentials. Do not use it in production or shared environments. For production use, use HINDSIGHT_API_LLM_PROVIDER=anthropic with an API key.
1. Install the Claude Code CLI:

```shell
npm install -g @anthropic-ai/claude-code
```

2. Log in with your Claude credentials:

```shell
claude auth login
```

3. Configure Hindsight:

```shell
export HINDSIGHT_API_LLM_PROVIDER=claude-code
# No API key needed: uses the credentials from claude auth login
```
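A similar guard can confirm the Claude Code CLI is installed before selecting the provider. This is a sketch under stated assumptions: use_claude_code_provider is a hypothetical name, and it only checks that a claude executable is on PATH; it cannot tell whether you have logged in.

```shell
# Hypothetical helper: select the claude-code provider only if the Claude Code CLI is installed.
use_claude_code_provider() {
  if ! command -v claude >/dev/null 2>&1; then
    echo "error: claude CLI not found on PATH; install Claude Code first" >&2
    return 1
  fi
  # No API key is exported; the provider relies on the CLI's stored login.
  export HINDSIGHT_API_LLM_PROVIDER=claude-code
}
```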

Embedding models

Embedding models convert text into dense vector representations for semantic similarity search. Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130 MB).
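To make "semantic similarity" concrete: once two texts are embedded, their closeness is commonly measured with cosine similarity, the dot product of the two vectors divided by the product of their lengths. The sketch below computes it with awk for toy vectors written as comma-separated strings. Real bge-small-en-v1.5 vectors have 384 components, and cosine is used here as an illustrative assumption about the metric, not a statement of how Hindsight's index scores vectors. The name cosine_similarity is hypothetical.

```shell
# Hypothetical helper: cosine similarity of two comma-separated vectors.
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    dot = 0; na = 0; nb = 0
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # dot product
      na += x[i] * x[i]    # squared length of a
      nb += y[i] * y[i]    # squared length of b
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

Identical directions score 1.0000 and orthogonal vectors score 0.0000, so higher values mean more semantically similar texts.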

Supported providers

| Provider | Description | Best for |
| --- | --- | --- |
| local | SentenceTransformers (default) | Development, low latency |
| openai | OpenAI embeddings API | Production, high quality |
| cohere | Cohere embeddings API | Production, multilingual |
| google | Gemini API or Vertex AI | Production, high quality |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
| litellm-sdk | LiteLLM SDK (direct API, no proxy) | Simpler multi-provider setup |

Model reference

| Model | Dimensions | Use case |
| --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |

Reranker models

The reranker improves precision by scoring the top candidates retrieved by semantic and keyword search. Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85 MB).
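The rrf provider in the table below skips neural scoring; RRF presumably refers to Reciprocal Rank Fusion, which merges several ranked lists by giving each document a score of 1/(k + rank) in every list where it appears and summing those scores. The sketch below fuses a semantic ranking with a keyword ranking. The constant k = 60 is a commonly used default, and Hindsight's actual constant and tie-breaking are assumptions here, as is the helper name rrf_fuse. Output is sorted by fused score (highest first), with ties broken by document ID.

```shell
# Hypothetical helper: fuse ranked lists with Reciprocal Rank Fusion.
# Usage: rrf_fuse K "doc1 doc2 ..." "docA docB ..."   (each list is best-first)
rrf_fuse() {
  k="$1"
  shift
  for list in "$@"; do
    rank=1
    # Unquoted on purpose: split the list on whitespace into document IDs.
    for doc in $list; do
      printf '%s %s\n' "$doc" "$rank"
      rank=$((rank + 1))
    done
  done |
    awk -v k="$k" '{ score[$1] += 1 / (k + $2) }
      END { for (d in score) printf "%s %.6f\n", d, score[d] }' |
    LC_ALL=C sort -k2,2nr -k1,1
}
```

For example, `rrf_fuse 60 "a b c" "b c d"` prints `b 0.032522`, `c 0.032002`, `a 0.016393`, `d 0.015873`: documents that rank well in both lists rise to the top.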

Supported providers

| Provider | Description | Best for |
| --- | --- | --- |
| local | SentenceTransformers CrossEncoder (default) | Development, low latency |
| cohere | Cohere Rerank API | Production, high quality |
| zeroentropy | ZeroEntropy zerank-2 | Production, state-of-the-art accuracy |
| siliconflow | SiliconFlow Cohere-compatible endpoint | China region, SiliconFlow platform |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| flashrank | FlashRank ONNX (lightweight, fast) | Resource-constrained environments |
| litellm | LiteLLM proxy | Multi-provider setups |
| litellm-sdk | LiteLLM SDK (direct API, no proxy) | Simpler multi-provider setup |
| jina-mlx | Jina reranker v3, MLX (Apple Silicon) | macOS with Apple Silicon |
| rrf | RRF-only (no neural reranking) | Testing, minimal resources |

Model reference

| Model | Use case |
| --- | --- |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |

For full configuration options including Azure-hosted endpoints and batch settings, see the Configuration reference.
