Hindsight uses three categories of machine learning models. LLMs handle fact extraction, reasoning, and generation. Embedding models convert text to vectors for semantic search. Cross-encoders rerank search results to improve precision.
LLM
Fact extraction, entity resolution, mental model consolidation, and answer synthesis. Fully configurable, with 20+ providers supported.
Embeddings
Vector representations for semantic similarity search. Default: BAAI/bge-small-en-v1.5 (384 dimensions).
Reranker
Reranks initial search results to improve precision. Default: cross-encoder/ms-marco-MiniLM-L-6-v2.
LLM providers
Hindsight supports over 20 LLM providers out of the box, plus any OpenAI-compatible API and 100+ providers via LiteLLM.
| Provider | Value | Notes |
|---|---|---|
| OpenAI | openai | Includes Flex Processing support |
| Anthropic | anthropic | Claude model family |
| Gemini | gemini | Google Gemini API |
| Groq | groq | High-throughput inference — recommended for retain |
| Ollama | ollama | Local inference server |
| LM Studio | lmstudio | Local GUI-based inference |
| llama.cpp | llamacpp | Built-in managed subprocess, no external server |
| Vertex AI | vertexai | Google Cloud, native GenAI SDK |
| AWS Bedrock | bedrock | Uses AWS credentials, no API key required |
| LiteLLM | litellm | 100+ providers via LiteLLM proxy |
| LiteLLM Router | litellmrouter | Fallback chains, load-balancing, per-deployment limits |
| DeepSeek | deepseek | OpenAI-compatible, includes thinking mode |
| MiniMax | minimax | 1M context window |
| OpenRouter | openrouter | Access 100+ models via one API key |
| z.ai | zai | Zhipu GLM series |
| Volcano Engine | volcano | ByteDance Doubao models |
| opencode-go | opencode-go | OpenAI-compatible |
| OpenAI Codex | openai-codex | Uses ChatGPT Plus/Pro OAuth (personal dev only) |
| Claude Code | claude-code | Uses Claude Pro/Max OAuth (personal dev only) |
| None | none | Disable LLM — semantic search only |
Benchmarks
Not sure which model to use? The Model Leaderboard benchmarks models across accuracy, speed, cost, and reliability for retain, reflect, and observation consolidation.
Tested models
The following models have been verified to work correctly with Hindsight:
| Provider | Model |
|---|---|
| OpenAI | gpt-5.2 |
| OpenAI | gpt-5 |
| OpenAI | gpt-5-mini |
| OpenAI | gpt-5-nano |
| OpenAI | gpt-4.1-mini |
| OpenAI | gpt-4.1-nano |
| OpenAI | gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514 |
| Anthropic | claude-3-5-sonnet-20241022 |
| Gemini | gemini-3-pro-preview |
| Gemini | gemini-2.5-flash |
| Gemini | gemini-2.5-flash-lite |
| Groq | openai/gpt-oss-120b |
| Groq | openai/gpt-oss-20b |
Provider default models
When HINDSIGHT_API_LLM_MODEL is not set, each provider falls back to its own default model, so setting just the provider is enough to get started. To use a different model, set HINDSIGHT_API_LLM_MODEL explicitly.
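The two cases above (provider only, or provider plus an explicit model) can be sketched as a small helper that builds the environment for a Hindsight process. `HINDSIGHT_API_LLM_MODEL` comes from this page; `HINDSIGHT_API_LLM_PROVIDER` is an assumed variable name and `llm_env` is a hypothetical helper, so check the configuration reference for the exact names before relying on them.

```python
from typing import Dict, Optional

# Provider values copied from the "Value" column of the provider table above.
KNOWN_PROVIDERS = {
    "openai", "anthropic", "gemini", "groq", "ollama", "lmstudio", "llamacpp",
    "vertexai", "bedrock", "litellm", "litellmrouter", "deepseek", "minimax",
    "openrouter", "zai", "volcano", "opencode-go", "openai-codex",
    "claude-code", "none",
}


def llm_env(provider: str, model: Optional[str] = None) -> Dict[str, str]:
    """Build LLM environment variables (hypothetical helper, not part of Hindsight)."""
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider value: {provider!r}")
    # ASSUMPTION: the provider is selected through HINDSIGHT_API_LLM_PROVIDER.
    env = {"HINDSIGHT_API_LLM_PROVIDER": provider}
    if model is not None:
        # Only set the model when overriding the provider's default.
        env["HINDSIGHT_API_LLM_MODEL"] = model
    return env


# Provider only: the provider's default model is used.
print(llm_env("groq"))
# Provider plus an explicit model from the tested-models table.
print(llm_env("groq", "openai/gpt-oss-20b"))
```

The returned dictionary can be merged into `os.environ` or passed as the `env` argument of `subprocess.run` when launching the service.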
Configuration examples
Models with limited output tokens
Hindsight expects the model to support at least 65,000 output tokens for reliable fact extraction. For models that support fewer tokens, reduce the retain completion token limit. HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS must be greater than HINDSIGHT_API_RETAIN_CHUNK_SIZE (default: 3000).
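The constraint between the two variables can be checked before starting the service. The sketch below is a hypothetical pre-flight check, not something Hindsight ships: `check_retain_limits` is an illustrative name, and the value `8192` is only an example of a model with a small output limit.

```python
from typing import Mapping

# Default for HINDSIGHT_API_RETAIN_CHUNK_SIZE as stated on this page.
DEFAULT_CHUNK_SIZE = 3000


def check_retain_limits(env: Mapping[str, str]) -> None:
    """Raise ValueError if the completion token limit is not above the chunk size."""
    chunk_size = int(env.get("HINDSIGHT_API_RETAIN_CHUNK_SIZE", DEFAULT_CHUNK_SIZE))
    raw_limit = env.get("HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS")
    if raw_limit is None:
        return  # limit not overridden, nothing to validate here
    if int(raw_limit) <= chunk_size:
        raise ValueError(
            f"HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS ({raw_limit}) must be "
            f"greater than HINDSIGHT_API_RETAIN_CHUNK_SIZE ({chunk_size})"
        )


# Example: a model limited to 8192 output tokens with the default chunk size.
check_retain_limits({"HINDSIGHT_API_RETAIN_MAX_COMPLETION_TOKENS": "8192"})
print("retain limits are consistent")
```

Passing `os.environ` as the argument checks the current process environment.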
OpenAI Codex setup (ChatGPT Plus/Pro)
Use your ChatGPT Plus or Pro subscription without separate API costs. For personal development only.
Log in with ChatGPT credentials (stored in ~/.codex/auth.json).
Claude Code setup (Claude Pro/Max)
Use your Claude Pro or Max subscription without separate API costs. For personal development only.
Embedding models
Embedding models convert text into dense vector representations for semantic similarity search. Default: BAAI/bge-small-en-v1.5 (384 dimensions, ~130 MB).
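To illustrate what "semantic similarity search" over dense vectors means, here is a minimal sketch using cosine similarity. The 3-dimensional vectors and document ids are made up and stand in for real 384-dimensional embeddings. Treating cosine similarity as the metric is an assumption for illustration, not a statement about Hindsight's internals.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors; a real embedding model would produce these from text.
corpus = {
    "alice-likes-hiking": [0.9, 0.1, 0.0],
    "bob-works-at-acme": [0.0, 0.2, 0.9],
    "alice-climbed-rainier": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
for doc_id, score in semantic_search(query, corpus):
    print(f"{doc_id}: {score:.3f}")
```

The dimension check matters in practice: vectors from models with different output sizes cannot be compared with each other.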
Supported providers
| Provider | Description | Best for |
|---|---|---|
| local | SentenceTransformers (default) | Development, low latency |
| openai | OpenAI embeddings API | Production, high quality |
| cohere | Cohere embeddings API | Production, multilingual |
| google | Gemini API or Vertex AI | Production, high quality |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| litellm | LiteLLM proxy (unified gateway) | Multi-provider setups |
| litellm-sdk | LiteLLM SDK (direct API, no proxy) | Simpler multi-provider setup |
Model reference
Local
| Model | Dimensions | Use case |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Default, fast, good quality |
| sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 384 | Multilingual (50+ languages) |
Reranker models
The reranker improves precision by scoring the top candidates retrieved by semantic and keyword search. Default: cross-encoder/ms-marco-MiniLM-L-6-v2 (~85 MB).
Supported providers
| Provider | Description | Best for |
|---|---|---|
| local | SentenceTransformers CrossEncoder (default) | Development, low latency |
| cohere | Cohere Rerank API | Production, high quality |
| zeroentropy | ZeroEntropy zerank-2 | Production, state-of-the-art accuracy |
| siliconflow | SiliconFlow Cohere-compatible endpoint | China region, SiliconFlow platform |
| tei | HuggingFace Text Embeddings Inference | Production, self-hosted |
| flashrank | FlashRank ONNX (lightweight, fast) | Resource-constrained environments |
| litellm | LiteLLM proxy | Multi-provider setups |
| litellm-sdk | LiteLLM SDK (direct API, no proxy) | Simpler multi-provider setup |
| jina-mlx | Jina reranker v3, MLX (Apple Silicon) | macOS with Apple Silicon |
| rrf | RRF-only (no neural reranking) | Testing, minimal resources |
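For readers unfamiliar with the `rrf` option, reciprocal rank fusion (RRF) merges several ranked lists using only rank positions, with no neural model involved. The sketch below shows the general technique. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the memory ids; Hindsight's actual implementation and parameters may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from semantic search and keyword search.
semantic = ["m3", "m1", "m7"]
keyword = ["m1", "m9", "m3"]
print(reciprocal_rank_fusion([semantic, keyword]))
```

Items that appear near the top of both lists (here `m1` and `m3`) outrank items found by only one retriever, which is why RRF works as a cheap baseline when a cross-encoder is too expensive.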
Model reference
Local
| Model | Use case |
|---|---|
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Default, fast |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Higher accuracy |
| cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 | Multilingual |
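The reranking step itself can be sketched independently of any particular model: score each (query, passage) pair, then sort. In the sketch below, `overlap_score` is a toy stand-in for a cross-encoder so the example runs without downloading a model. The function names and sample passages are hypothetical and do not reflect Hindsight's internal API.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a (query, passage) pair to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(
    query: str,
    candidates: Sequence[str],
    scorer: Scorer = overlap_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_k, best first."""
    scored = [(passage, scorer(query, passage)) for passage in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "Alice adopted a dog named Rex",
    "Alice and Bob met at a conference",
    "Bob moved to Berlin in 2021",
]
for passage, score in rerank("bob moved to which city", candidates, top_k=2):
    print(f"{score:.2f}  {passage}")
```

With the sentence-transformers package installed, a scorer that wraps `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict([(query, passage)])` could be passed in place of `overlap_score`; that pairing is shown as an illustration only.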
