NorthStar’s trace store is built around five core Pydantic models: Session, Run, Span, Event, and Score. Every trace your agent produces maps to one or more of these models. Session, Run, and Span implement Python’s context manager protocol: open them in a with statement, and the SDK automatically sets timestamps, captures exceptions, and enqueues the record when the block exits. Closing the Session flushes the queue to the backend.
from northstar import Session, Run, Span, Event, Score, SpanKind, RunStatus, CaptureOptions

Enums

RunStatus

RunStatus is a StrEnum that describes the lifecycle state of a Run or Span: RUNNING while the block is open, then one of the two terminal states, OK or ERROR.
| Value | String | Meaning |
| --- | --- | --- |
| RunStatus.RUNNING | "running" | The run or span is still in progress |
| RunStatus.OK | "ok" | Completed successfully |
| RunStatus.ERROR | "error" | Terminated with an exception |
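
Because the members are string-valued, a status read back from JSON can be compared against plain strings. The sketch below uses a local stand-in enum, not the SDK's class, and `is_finished` is a hypothetical helper. It mixes in `str` rather than using `StrEnum` so it also runs on Python versions before 3.11.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Stand-in for the SDK enum; the str mixin makes members equal to their values."""

    RUNNING = "running"
    OK = "ok"
    ERROR = "error"


def is_finished(status: RunStatus) -> bool:
    """Hypothetical helper: True once a run or span has left the RUNNING state."""
    return status is not RunStatus.RUNNING


# A status stored as a plain string round-trips to the enum member.
status = RunStatus("error")
```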

SpanKind

SpanKind describes the semantic type of work a Span represents.
| Value | String | Meaning |
| --- | --- | --- |
| SpanKind.AGENT | "agent" | Top-level agent orchestration span |
| SpanKind.WORKFLOW | "workflow" | Multi-step workflow or pipeline |
| SpanKind.MODEL | "model" | LLM completion call |
| SpanKind.TOOL | "tool" | Tool or function call |
| SpanKind.CUSTOM | "custom" | User-defined span type |

EventType

EventType classifies the semantic role of an Event within a run.
| Value | String | Meaning |
| --- | --- | --- |
| EventType.USER_INPUT | "user_input" | Raw user message or prompt |
| EventType.SYSTEM_MESSAGE | "system_message" | System prompt sent to a model |
| EventType.ASSISTANT_MESSAGE | "assistant_message" | Assistant turn content |
| EventType.REASONING | "reasoning" | Internal reasoning or chain-of-thought |
| EventType.TOOL_ARGUMENTS | "tool_arguments" | Arguments sent to a tool call |
| EventType.TOOL_RESULT | "tool_result" | Output returned by a tool |
| EventType.FINAL_RESPONSE | "final_response" | Agent’s final response to the user |
| EventType.CUSTOM | "custom" | User-defined event |

Session

A Session is the top-level container for one or more Run objects. It typically maps to a user session, a conversation, or a single request lifecycle. Create sessions through Northstar.session(), never by instantiating Session directly.
with client.session(metadata={"user_id": "u_123"}) as session:
    with session.run("my-agent") as run:
        run.record_user_input("Hello!")
        run.record_final_response("Hi there!")
id
UUID
required
Auto-generated unique identifier for this session. Set to a random uuid4 by default.
project_id
UUID | None
Assigned by the backend during authenticated ingestion. None before the first flush — the ingest endpoint stamps the correct value.
created_at
datetime
UTC timestamp when the session was created. Defaults to the current UTC time.
ended_at
datetime | None
UTC timestamp set automatically when the session’s __exit__ method is called. None while the session is still open.
metadata
dict[str, Any]
Arbitrary key-value pairs attached to the session (e.g., user_id, environment, ab_group). Defaults to an empty dict.
Session.__exit__ sets ended_at, enqueues the session record, and calls client.flush() synchronously. If an exception propagated from inside the with block, it is re-raised after flushing.
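
The exit behaviour described above (set ended_at, enqueue, flush, let the exception propagate) can be illustrated with a small stand-in. This is a sketch, not the SDK's implementation: `FakeClient` and `SketchSession` are hypothetical names, and the in-memory queue only mimics the order of operations.

```python
from datetime import datetime, timezone


class FakeClient:
    """Hypothetical stand-in for the client; keeps records in memory."""

    def __init__(self):
        self.queue = []
        self.flushed = []

    def enqueue(self, record):
        self.queue.append(record)

    def flush(self):
        # Move everything that is queued into the "delivered" list.
        self.flushed.extend(self.queue)
        self.queue.clear()


class SketchSession:
    def __init__(self, client):
        self.client = client
        self.created_at = datetime.now(timezone.utc)
        self.ended_at = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.ended_at = datetime.now(timezone.utc)
        self.client.enqueue(self)
        self.client.flush()
        return False  # a falsy return lets any in-flight exception propagate


client = FakeClient()
caught = False
try:
    with SketchSession(client) as session:
        raise ValueError("boom")
except ValueError:
    caught = True  # the exception still reaches the caller, after the flush
```

Returning a falsy value from `__exit__` is what makes the original exception continue to propagate; the flush has already happened by then.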

Run

A Run represents a single agent execution inside a session — a turn, a job, or an end-to-end invocation. Create runs through Session.run(). The status field transitions automatically: it starts as RUNNING, becomes OK on clean exit, and becomes ERROR if an exception escapes the with block.
with session.run("research-agent", metadata={"query": "..."}) as run:
    run.record_user_input("What is the weather in Paris?")
    with run.span("search", kind=SpanKind.TOOL) as span:
        ...
    run.record_final_response("It is 22°C and sunny.")
id
UUID
required
Auto-generated unique identifier for this run.
session_id
UUID
required
The id of the parent Session. Set automatically when created via Session.run().
name
str
required
A human-readable label for this run, e.g. "research-agent" or "support-ticket-handler".
status
RunStatus
Current status of the run. Starts as RunStatus.RUNNING. Set to RunStatus.OK or RunStatus.ERROR on __exit__. Defaults to RunStatus.RUNNING.
error
dict[str, Any] | None
Populated automatically when an exception escapes the with block. Contains type, message, and module keys. None on success.
metadata
dict[str, Any]
Arbitrary key-value pairs. After the run exits, the SDK also writes aggregated cost_usd, total_input_tokens, and total_output_tokens into metadata if any child model spans recorded usage. Defaults to an empty dict.
started_at
datetime
UTC timestamp set when the Run object is created. Defaults to the current UTC time.
ended_at
datetime | None
UTC timestamp set automatically on __exit__. None while the run is still in progress.
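The status transition and the error field can be pictured with a stand-in context manager. This is a minimal sketch under assumptions: `SketchRun` is hypothetical, and how the SDK derives the type, message, and module values is not stated in this reference, so the derivation below (exception class name, `str(exc)`, and the class's `__module__`) is a guess.

```python
from enum import Enum


class RunStatus(str, Enum):
    RUNNING = "running"
    OK = "ok"
    ERROR = "error"


class SketchRun:
    """Hypothetical stand-in that mirrors the documented status transitions."""

    def __init__(self, name):
        self.name = name
        self.status = RunStatus.RUNNING
        self.error = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.status = RunStatus.OK
        else:
            self.status = RunStatus.ERROR
            # Assumed derivation of the three documented keys.
            self.error = {
                "type": exc_type.__name__,
                "message": str(exc),
                "module": exc_type.__module__,
            }
        return False  # never suppress the exception


with SketchRun("clean") as ok_run:
    pass

try:
    with SketchRun("failing") as bad_run:
        raise ValueError("bad input")
except ValueError:
    pass
```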

Span

A Span is a nestable unit of work inside a Run. Spans can represent a model call, a tool invocation, a retrieval step, or any custom segment. They form a parent-child tree: the SDK tracks the currently active span per run, so nested with run.span(...) blocks automatically set parent_span_id.
with run.span("retrieve-docs", kind=SpanKind.TOOL) as span:
    span.record_tool_arguments({"query": "Paris weather"})
    results = vector_db.search("Paris weather")
    span.record_tool_result(results)
id
UUID
required
Auto-generated unique identifier for this span.
run_id
UUID
required
The id of the parent Run. Set automatically when created via Run.span().
parent_span_id
UUID | None
The id of the enclosing span, if any. Set automatically based on the active span stack. None for top-level spans.
kind
SpanKind
required
Semantic type of this span. Must be one of SpanKind.AGENT, SpanKind.WORKFLOW, SpanKind.MODEL, SpanKind.TOOL, or SpanKind.CUSTOM.
name
str
required
Human-readable label for this span, e.g. "retrieve-docs" or "gpt-4o-call".
status
RunStatus
Starts as RunStatus.RUNNING. Set to RunStatus.OK on clean exit or RunStatus.ERROR if an exception escapes. Defaults to RunStatus.RUNNING.
error
dict[str, Any] | None
Exception metadata (type, message, module) captured automatically on error. None on success.
iteration
int | None
Optional loop iteration counter. Useful when a span is created inside an agentic loop to distinguish iteration 0, 1, 2, etc. Defaults to None.
attributes
dict[str, Any]
Freeform key-value attributes. For model spans, the SDK populates model, input_tokens, output_tokens, total_tokens, cost_usd, and pricing_source automatically. Defaults to an empty dict.
started_at
datetime
UTC timestamp set when the Span object is created.
ended_at
datetime | None
UTC timestamp set automatically on __exit__. None while the span is open.
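One way to picture how nested blocks obtain parent_span_id is a per-run stack of open spans. The sketch below is illustrative only: `SketchRun`, `SketchSpan`, and the `active` list are hypothetical, and the SDK's real bookkeeping may differ.

```python
import uuid


class SketchSpan:
    def __init__(self, run, name):
        self.id = uuid.uuid4()
        self.name = name
        self.run = run
        # The parent is whichever span is on top of the stack at creation time.
        self.parent_span_id = run.active[-1].id if run.active else None

    def __enter__(self):
        self.run.active.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.run.active.pop()
        return False


class SketchRun:
    def __init__(self):
        self.active = []  # stack of currently open spans, innermost last

    def span(self, name):
        return SketchSpan(self, name)


run = SketchRun()
with run.span("agent") as outer:
    with run.span("search") as inner:
        pass
    with run.span("summarize") as sibling:
        pass
```

Here `inner` and `sibling` both point at `outer`, while `outer` itself has no parent, which matches the tree shape described above.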

Event

An Event is an immutable log entry scoped to a Run and optionally to a Span. Events are created by calling record_* methods on a Run or Span. Which event types are actually stored depends on the CaptureOptions configured on the client.
id
UUID
required
Auto-generated unique identifier for this event.
run_id
UUID
required
The id of the parent Run.
span_id
UUID | None
The id of the enclosing Span, if the event was created from a span context. None for run-level events.
type
EventType
required
The semantic category of this event. One of the EventType enum values listed above.
content
Any
required
The payload of the event. May be a string, dict, list, or any JSON-serializable value.
attributes
dict[str, Any]
Additional structured metadata. For tool events, the SDK adds tool_call_id and name automatically. Defaults to an empty dict.
created_at
datetime
UTC timestamp set when the event is created.

Score

A Score attaches a numeric, boolean, or categorical quality signal to a run or span. Scores are created via client.score() — see the Scores reference for the full API.
id
UUID
Auto-generated unique identifier for this score.
trace_id
UUID
required
The id of the Run (trace) this score is attached to.
span_id
UUID | None
Optionally scope the score to a specific Span. None scopes it to the whole run.
name
str
required
A non-blank label for the score, e.g. "relevance", "faithfulness", or "latency_ok".
value
float
required
The numeric value. For boolean scores, must be 0.0 (false) or 1.0 (true). For categorical scores, always stored as 0.0; the human-readable label is in string_value.
data_type
"numeric" | "categorical" | "boolean"
required
Inferred automatically from the Python type of the value passed to client.score(). Determines how the dashboard renders the score.
string_value
str | None
The string label for categorical scores (e.g. "thumbs_up"). Required when data_type == "categorical", forbidden otherwise.
source
"api"
Always "api" for scores created through the Python SDK. Scores generated by EvalSuite are also submitted through the API.
comment
str | None
Optional free-text note attached to this score, e.g. an annotator’s rationale.
created_at
datetime
UTC timestamp set when the score is created.
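The rules for value, data_type, and string_value can be summarised in a small function. This is a sketch under assumptions: `normalize_score` is a hypothetical helper, and the exact type-to-data_type mapping (bool to boolean, int or float to numeric, str to categorical) is inferred from the field descriptions rather than confirmed by the SDK.

```python
def normalize_score(value):
    """Hypothetical helper: return (data_type, value, string_value) for a raw score."""
    # bool must be checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return ("boolean", 1.0 if value else 0.0, None)
    if isinstance(value, (int, float)):
        return ("numeric", float(value), None)
    if isinstance(value, str):
        # Categorical scores store 0.0 and carry the label in string_value.
        return ("categorical", 0.0, value)
    raise TypeError(f"unsupported score type: {type(value).__name__}")


boolean_score = normalize_score(True)
numeric_score = normalize_score(0.87)
categorical_score = normalize_score("thumbs_up")
```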

CaptureOptions

CaptureOptions controls which event types the SDK records and ships. Pass it to Northstar(capture=...) at init time. All fields default to False — opting in is explicit, so sensitive content is never captured accidentally.
from northstar import CaptureOptions, Northstar

client = Northstar(
    api_key="ns_...",
    project_id="<project-ref>",
    capture=CaptureOptions(
        user_input=True,
        final_response=True,
        tool_arguments=True,
        tool_results=True,
    ),
)
user_input
bool
When True, USER_INPUT events are recorded. Default: False.
system_messages
bool
When True, SYSTEM_MESSAGE events are recorded. Default: False.
assistant_messages
bool
When True, ASSISTANT_MESSAGE events are recorded. Default: False.
reasoning
bool
When True, REASONING events (chain-of-thought, scratchpad) are recorded. Default: False.
tool_arguments
bool
When True, TOOL_ARGUMENTS events are recorded. Default: False.
tool_results
bool
When True, TOOL_RESULT events are recorded. Default: False.
final_response
bool
When True, FINAL_RESPONSE events are recorded. Default: False.
CaptureOptions defaults all fields to False. You must explicitly enable each event type you want persisted. This is intentional — tool results and user messages may contain PII or secrets.
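The opt-in gating can be sketched as a lookup from event type to option field. The names below (`SketchCaptureOptions`, `FLAG_FOR_EVENT`, `should_record`) are hypothetical stand-ins, not SDK API. The reference does not say how CUSTOM events are gated, so dropping them here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCaptureOptions:
    """Stand-in with the same seven flags, all defaulting to False."""

    user_input: bool = False
    system_messages: bool = False
    assistant_messages: bool = False
    reasoning: bool = False
    tool_arguments: bool = False
    tool_results: bool = False
    final_response: bool = False


# Event type string -> option field, following the field descriptions above.
FLAG_FOR_EVENT = {
    "user_input": "user_input",
    "system_message": "system_messages",
    "assistant_message": "assistant_messages",
    "reasoning": "reasoning",
    "tool_arguments": "tool_arguments",
    "tool_result": "tool_results",
    "final_response": "final_response",
}


def should_record(event_type: str, options: SketchCaptureOptions) -> bool:
    flag = FLAG_FOR_EVENT.get(event_type)
    # Assumption: event types with no matching flag (such as "custom") are dropped.
    return flag is not None and getattr(options, flag)


options = SketchCaptureOptions(user_input=True, tool_results=True)
```

With these options, `should_record("tool_result", options)` is true while `should_record("reasoning", options)` is false, since reasoning was never enabled.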

Context manager lifecycle

Session, Run, and Span all implement __enter__ / __exit__. The pattern is always:
with client.session() as session:          # __enter__: returns self
    with session.run("agent") as run:      # session.run() creates & enqueues the run; __enter__ returns self
        with run.span("tool", kind=SpanKind.TOOL) as span:  # pushes onto active stack
            ...
        # __exit__: span ended_at set, status = OK or ERROR, enqueued
    # __exit__: run ended_at set, model costs aggregated, re-enqueued
# __exit__: session ended_at set, flush() called
If an exception escapes a Run or Span block, the SDK sets status = RunStatus.ERROR and populates the error dict. In every case, including Session, the exception is re-raised once the record has been enqueued or flushed, so errors are never silently swallowed.
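
The "model costs aggregated" step on run exit can be approximated as a sum over child model spans. This is a sketch, not the SDK's code: `aggregate_usage` is a hypothetical helper, spans are represented as plain dicts, and treating the presence of input_tokens as "recorded usage" is an assumption.

```python
def aggregate_usage(spans):
    """Sum usage from model spans; return {} when no model span recorded usage."""
    totals = {"cost_usd": 0.0, "total_input_tokens": 0, "total_output_tokens": 0}
    saw_usage = False
    for span in spans:
        attrs = span.get("attributes", {})
        # Only model spans that recorded token usage contribute.
        if span.get("kind") != "model" or "input_tokens" not in attrs:
            continue
        saw_usage = True
        totals["cost_usd"] += attrs.get("cost_usd", 0.0)
        totals["total_input_tokens"] += attrs.get("input_tokens", 0)
        totals["total_output_tokens"] += attrs.get("output_tokens", 0)
    return totals if saw_usage else {}


spans = [
    {"kind": "model", "attributes": {"input_tokens": 100, "output_tokens": 20, "cost_usd": 0.25}},
    {"kind": "tool", "attributes": {}},
    {"kind": "model", "attributes": {"input_tokens": 50, "output_tokens": 10, "cost_usd": 0.5}},
]

# The aggregated keys are merged into whatever metadata the run already carries.
run_metadata = {"query": "..."}
run_metadata.update(aggregate_usage(spans))
```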
