NorthStar is an observability, debugging, and evaluation platform built specifically for AI agents. It records traces, child spans, events, metrics, errors, and LLM cost, all without changing your application’s control flow. Whether you’re running a simple question-answering bot or a complex multi-step research agent, NorthStar gives you the visibility you need to understand what your agent is doing, catch regressions early, and evaluate output quality over time.
Data Model
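NorthStar organizes all observability data into a four-level hierarchy: sessions contain runs, runs contain (nestable) spans, and events are recorded within a run or span. Each level is a context manager whose lifecycle is managed automatically, so your instrumentation code stays clean. Scores are not a level of their own; they attach to a run.

| Entity | Description | Key Fields |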
|---|---|---|
| Session | Top-level user tracking session | id, project_id, created_at, metadata |
| Run | Agent run or step inside a session | id, session_id, name, status, error, metadata |
| Span | Child span inside a run (nestable) | id, run_id, parent_span_id, kind, name, attributes |
| Event | Individual trace event | id, run_id, span_id, type, content, attributes |
| Score | Eval score attached to a run | run_id, name, value, data_type, source |
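To make the hierarchy concrete, the sketch below models each entity as a plain Python dataclass using the key fields from the table. It leaves out the context-manager lifecycle. The field types, the UUID-based IDs, the default values, and the example kind/status strings are assumptions for illustration, not the SDK's internal schema.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union


def new_id() -> str:
    # Hypothetical ID scheme: random UUID4 strings.
    return str(uuid.uuid4())


@dataclass
class Session:
    project_id: str
    id: str = field(default_factory=new_id)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Run:
    session_id: str  # parent session
    name: str
    id: str = field(default_factory=new_id)
    status: str = "running"  # assumed status value
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Span:
    run_id: str  # owning run
    kind: str  # e.g. "model_call"
    name: str
    parent_span_id: Optional[str] = None  # None for a top-level span in the run
    id: str = field(default_factory=new_id)
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Event:
    run_id: str
    type: str
    content: Any
    span_id: Optional[str] = None  # set when the event belongs to a span
    id: str = field(default_factory=new_id)
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Score:
    run_id: str  # scores attach to a run, not to a span
    name: str
    value: Union[float, bool, str]
    data_type: str  # assumed, e.g. "numeric"
    source: str  # assumed, e.g. "grader"


# Build one record at each level and link them through their ID fields.
session = Session(project_id="proj_demo")
run = Run(session_id=session.id, name="answer_question")
root = Span(run_id=run.id, kind="agent", name="plan")
child = Span(run_id=run.id, kind="model_call", name="llm", parent_span_id=root.id)
event = Event(run_id=run.id, span_id=child.id, type="message", content="Hello")
score = Score(run_id=run.id, name="correctness", value=1.0, data_type="numeric", source="grader")
```

Nesting is expressed purely through the *_id fields. This mirrors the Key Fields column and keeps each record independently serializable.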
Data Flow
Every piece of data collected by the SDK follows the same path from your application to the dashboard. The SDK sends its payloads to the ingest endpoint over httpx, with bounded retries on transient HTTP errors (408, 429, 500, 502, 503, 504). The Supabase Edge Function validates every payload, authenticates the request against a SHA-256-hashed API key, stamps the project_id on every record, and calls private.ingest_batch() in Postgres. All tables are protected by Row Level Security to isolate tenants from one another.
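The retry behaviour can be sketched as a bounded loop that retries only on the transient status codes listed above. This sketch is transport-agnostic. The send parameter stands in for an httpx POST. The attempt limit and the exponential backoff values are assumptions, not the SDK's actual settings, and connection-level errors such as timeouts are not handled here.

```python
import time
from typing import Callable

# Status codes treated as transient, per the list above.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def send_with_retries(
    send: Callable[[], int],
    max_attempts: int = 3,  # assumed bound on total attempts
    base_delay: float = 0.5,  # assumed initial backoff, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable that performs one HTTP request and
    returns its status code, for example
    `lambda: client.post(url, json=batch).status_code` with an httpx.Client.
    Returns the last status code observed.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUS:
            break
        # Exponential backoff: 0.5s, 1s, 2s, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status


# Usage: two transient failures followed by success.
responses = iter([503, 429, 200])
delays = []
final_status = send_with_retries(lambda: next(responses), sleep=delays.append)
# final_status == 200, delays == [0.5, 1.0]
```

Injecting sleep keeps the sketch testable without real waiting. Non-transient errors such as 400 or 401 are returned immediately, because retrying them cannot succeed.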
Architecture
[Figure: full component map of the SDK and backend]
Key Capabilities
Auto-Instrumentation
A single northstar.auto_instrument() call patches OpenAI and Anthropic clients to capture messages, tool calls, token usage, USD cost, latency, and exceptions, with no per-call code changes.
Distributed Tracing
@northstar.trace and @northstar.observe decorators (plus their context manager forms) let you nest spans arbitrarily deep. ContextVar propagation ensures correct parent-child linking across async and threaded code.
Versioned Prompts
Store prompt templates server-side and retrieve them with client.pull_prompt() on the low-level Northstar client. Compile templates with Jinja or Python-style variables, and bind compiled versions directly to model call spans for full prompt lineage.
Evaluations
northstar.evals provides dataset loaders, deterministic graders (output, tool_sequence, retrieval, regex, python_code, and more), and LLM-judge rubrics for systematic agent evaluation.
LLM Cost Tracking
Install the pricing extra to get per-call token counting and USD cost from LiteLLM pricing tables. Cost is recorded on every model_call span and surfaced in run metadata.
Dashboard
A Next.js web dashboard visualizes sessions, runs, spans, events, and eval scores. Project-scoped provider keys let dashboard rubric evals call OpenAI, Anthropic, or OpenRouter without exposing credentials client-side.
No-Op Fallback
NorthStar is designed to never crash your application. When the SDK is disabled (via NORTHSTAR_ENABLED=false), when credentials are missing, or when the ingest endpoint is unreachable, all tracing calls silently become no-ops and your agent continues to run normally. Enable debug=True (or set NORTHSTAR_DEBUG=true) to print SDK warnings to stderr so you can confirm the SDK is active during development.
Call northstar.current_trace_id() at any point to retrieve the active run ID and correlate your application logs with NorthStar traces.
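One way to do this with Python's standard logging module is a filter that stamps every record with the active ID. The sketch takes the ID getter as a parameter so it runs standalone. In an instrumented application you would presumably pass northstar.current_trace_id. Three details are assumptions: the lambda stub, the trace_id record attribute, and the "-" placeholder for records logged outside a run (the sketch assumes the getter returns None there).

```python
import logging
from typing import Callable, Optional


class TraceIdFilter(logging.Filter):
    """Attach the active trace/run ID to every log record as `trace_id`."""

    def __init__(self, get_trace_id: Callable[[], Optional[str]]):
        super().__init__()
        self._get_trace_id = get_trace_id

    def filter(self, record: logging.LogRecord) -> bool:
        # Assumption: the getter returns None when no run is active.
        record.trace_id = self._get_trace_id() or "-"
        return True  # never drop records, only annotate them


logger = logging.getLogger("agent")
handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(logging.Formatter("%(levelname)s [trace=%(trace_id)s] %(message)s"))
# Hypothetical stub; in an app this would be northstar.current_trace_id.
handler.addFilter(TraceIdFilter(lambda: "run_123"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("calling search tool")  # stderr: INFO [trace=run_123] calling search tool
```

Attaching the filter to the handler rather than the logger means records from child loggers that propagate to this handler are annotated too.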
Get started in 5 minutes: install the SDK, set your credentials, and trace your first agent run.