NorthStar’s trace store is built around five core Pydantic models: Session, Run, Span, Event, and Score. Every trace your agent produces maps to one or more of these models.
Session, Run, and Span implement Python’s context manager protocol: open them with a with statement, and the SDK automatically sets timestamps, captures exceptions, and flushes the record to the backend when the block exits.
Enums
RunStatus
RunStatus is a StrEnum that describes the lifecycle state of a Run or Span.
| Value | String | Meaning |
|---|---|---|
| RunStatus.RUNNING | "running" | The run or span is still in progress |
| RunStatus.OK | "ok" | Completed successfully |
| RunStatus.ERROR | "error" | Terminated with an exception |
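Because RunStatus is a string enum, its members compare equal to their plain string values, which helps when filtering exported records. The sketch below re-declares the enum locally for illustration, using the str plus Enum mixin so it also runs on Python versions before 3.11; it is not the SDK's own definition, and is_finished is a hypothetical helper.

```python
from enum import Enum


class RunStatus(str, Enum):
    """Local stand-in mirroring the values in the table above."""

    RUNNING = "running"
    OK = "ok"
    ERROR = "error"


def is_finished(status: str) -> bool:
    """Hypothetical helper: True once a run or span has left RUNNING."""
    return RunStatus(status) is not RunStatus.RUNNING


# Members compare equal to their plain string values.
assert RunStatus.OK == "ok"
# Looking a member up by value returns the matching member.
assert RunStatus("error") is RunStatus.ERROR
```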
SpanKind
SpanKind describes the semantic type of work a Span represents.
| Value | String | Meaning |
|---|---|---|
| SpanKind.AGENT | "agent" | Top-level agent orchestration span |
| SpanKind.WORKFLOW | "workflow" | Multi-step workflow or pipeline |
| SpanKind.MODEL | "model" | LLM completion call |
| SpanKind.TOOL | "tool" | Tool or function call |
| SpanKind.CUSTOM | "custom" | User-defined span type |
EventType
EventType classifies the semantic role of an Event within a run.
| Value | String | Meaning |
|---|---|---|
| EventType.USER_INPUT | "user_input" | Raw user message or prompt |
| EventType.SYSTEM_MESSAGE | "system_message" | System prompt sent to a model |
| EventType.ASSISTANT_MESSAGE | "assistant_message" | Assistant turn content |
| EventType.REASONING | "reasoning" | Internal reasoning or chain-of-thought |
| EventType.TOOL_ARGUMENTS | "tool_arguments" | Arguments sent to a tool call |
| EventType.TOOL_RESULT | "tool_result" | Output returned by a tool |
| EventType.FINAL_RESPONSE | "final_response" | Agent’s final response to the user |
| EventType.CUSTOM | "custom" | User-defined event |
Session
A Session is the top-level container for one or more Run objects. It typically maps to a user session, a conversation, or a single request lifecycle. Create sessions through Northstar.session(), never by instantiating Session directly.
Auto-generated unique identifier for this session. Set to a random uuid4 by default.
Assigned by the backend during authenticated ingestion. None before the first flush; the ingest endpoint stamps the correct value.
UTC timestamp when the session was created. Defaults to the current UTC time.
UTC timestamp set automatically when the session’s __exit__ method is called. None while the session is still open.
Arbitrary key-value pairs attached to the session (e.g., user_id, environment, ab_group). Defaults to an empty dict.
Session.__exit__ sets ended_at, enqueues the session record, and calls client.flush() synchronously. If an exception propagated from inside the with block, it is re-raised after flushing.
Run
A Run represents a single agent execution inside a session: a turn, a job, or an end-to-end invocation. Create runs through Session.run(). The status field transitions automatically: it starts as RUNNING, becomes OK on clean exit, and becomes ERROR if an exception escapes the with block.
Auto-generated unique identifier for this run.
The id of the parent Session. Set automatically when created via Session.run().
A human-readable label for this run, e.g. "research-agent" or "support-ticket-handler".
Current status of the run. Starts as RunStatus.RUNNING. Set to RunStatus.OK or RunStatus.ERROR on __exit__. Defaults to RunStatus.RUNNING.
Populated automatically when an exception escapes the with block. Contains type, message, and module keys. None on success.
Arbitrary key-value pairs. After the run exits, the SDK also writes aggregated cost_usd, total_input_tokens, and total_output_tokens into metadata if any child model spans recorded usage. Defaults to an empty dict.
UTC timestamp set when the Run object is created. Defaults to the current UTC time.
UTC timestamp set automatically on __exit__. None while the run is still in progress.
Span
A Span is a nestable unit of work inside a Run. Spans can represent a model call, a tool invocation, a retrieval step, or any custom segment. They form a parent-child tree: the SDK tracks the currently active span per run, so nested with run.span(...) blocks automatically set parent_span_id.
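The parent tracking described above can be pictured as a stack of open spans. The following is a minimal sketch under stated assumptions, not the SDK's implementation: SketchRun and SketchSpan are hypothetical stand-ins, and the real Run.span() signature may take other arguments such as a span kind.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a Span record."""

    name: str
    parent_span_id: Optional[str]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class SketchRun:
    """Hypothetical stand-in for a Run that tracks its open spans."""

    # Stack of currently open spans; the last item is the active one.
    _stack: List[SketchSpan] = field(default_factory=list)

    @contextmanager
    def span(self, name: str) -> Iterator[SketchSpan]:
        # The active span (if any) becomes the parent of the new span.
        parent_id = self._stack[-1].id if self._stack else None
        span = SketchSpan(name=name, parent_span_id=parent_id)
        self._stack.append(span)
        try:
            yield span
        finally:
            # Pop on exit so spans opened later get the right parent.
            self._stack.pop()


run = SketchRun()
with run.span("agent-loop") as outer:
    with run.span("retrieve-docs") as inner:
        pass
    with run.span("gpt-4o-call") as sibling:
        pass
```

Here inner and sibling both point at outer, and outer has no parent. A plain list like this is only safe for single-threaded use; how the SDK handles concurrent spans is not covered by this sketch.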
Auto-generated unique identifier for this span.
The id of the parent Run. Set automatically when created via Run.span().
The id of the enclosing span, if any. Set automatically based on the active span stack. None for top-level spans.
Semantic type of this span. Must be one of SpanKind.AGENT, SpanKind.WORKFLOW, SpanKind.MODEL, SpanKind.TOOL, or SpanKind.CUSTOM.
Human-readable label for this span, e.g. "retrieve-docs" or "gpt-4o-call".
Starts as RunStatus.RUNNING. Set to RunStatus.OK on clean exit or RunStatus.ERROR if an exception escapes. Defaults to RunStatus.RUNNING.
Exception metadata (type, message, module) captured automatically on error. None on success.
Optional loop iteration counter. Useful when a span is created inside an agentic loop to distinguish iteration 0, 1, 2, etc. Defaults to None.
Freeform key-value attributes. For model spans, the SDK populates model, input_tokens, output_tokens, total_tokens, cost_usd, and pricing_source automatically. Defaults to an empty dict.
UTC timestamp set when the Span object is created.
UTC timestamp set automatically on __exit__. None while the span is open.
Event
An Event is an immutable log entry scoped to a Run and optionally to a Span. Events are created by calling record_* methods on a Run or Span. Which event types are actually stored depends on the CaptureOptions configured on the client.
Auto-generated unique identifier for this event.
The id of the parent Run.
The id of the enclosing Span, if the event was created from a span context. None for run-level events.
The semantic category of this event. One of the EventType enum values listed above.
The payload of the event. May be a string, dict, list, or any JSON-serializable value.
Additional structured metadata. For tool events, the SDK adds tool_call_id and name automatically. Defaults to an empty dict.
UTC timestamp set when the event is created.
Score
A Score attaches a numeric, boolean, or categorical quality signal to a run or span. Scores are created via client.score(); see the Scores reference for the full API.
Auto-generated unique identifier for this score.
The id of the Run (trace) this score is attached to.
Optionally scope the score to a specific Span. None scopes it to the whole run.
A non-blank label for the score, e.g. "relevance", "faithfulness", or "latency_ok".
The numeric value. For boolean scores, must be 0.0 (false) or 1.0 (true). For categorical scores, always stored as 0.0; the human-readable label is in string_value.
Inferred automatically from the Python type of the value passed to client.score(). Determines how the dashboard renders the score.
The string label for categorical scores (e.g. "thumbs_up"). Required when data_type == "categorical", forbidden otherwise.
Always "api" for scores created through the Python SDK. Scores generated by EvalSuite are also submitted through the API.
Optional free-text note attached to this score, e.g. an annotator’s rationale.
UTC timestamp set when the score is created.
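The value rules above (boolean stored as 0.0 or 1.0, categorical stored as 0.0 with the label in string_value) can be illustrated with a small stand-alone function. This is a sketch of the described mapping, not the SDK's code: infer_score_fields is a hypothetical name, and the "numeric" and "boolean" string literals are assumed by analogy with the "categorical" value quoted above.

```python
from typing import Optional, Tuple, Union

ScoreValue = Union[bool, int, float, str]


def infer_score_fields(value: ScoreValue) -> Tuple[str, float, Optional[str]]:
    """Return (data_type, value, string_value) for a raw score value."""
    # bool must be checked before int/float because bool subclasses int.
    if isinstance(value, bool):
        return "boolean", 1.0 if value else 0.0, None
    if isinstance(value, (int, float)):
        return "numeric", float(value), None
    if isinstance(value, str):
        # Categorical scores keep the label in string_value; value is 0.0.
        return "categorical", 0.0, value
    raise TypeError(f"unsupported score value type: {type(value).__name__}")


print(infer_score_fields(True))
print(infer_score_fields(0.87))
print(infer_score_fields("thumbs_up"))
```

The ordering of the isinstance checks matters: testing for int first would classify True as a numeric score of 1.0 rather than a boolean one.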
CaptureOptions
CaptureOptions controls which event types the SDK records and ships. Pass it to Northstar(capture=...) at init time. All fields default to False — opting in is explicit, so sensitive content is never captured accidentally.
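To make the opt-in behaviour concrete, here is a minimal sketch of how per-type flags could gate event recording. The class and field names (SketchCaptureOptions, user_input, and so on) are hypothetical and chosen only for illustration; the SDK's actual field names may differ, and how it treats CUSTOM events is not stated here, so this sketch simply leaves them unrecorded.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(str, Enum):
    """Local stand-in mirroring the EventType values listed earlier."""

    USER_INPUT = "user_input"
    SYSTEM_MESSAGE = "system_message"
    ASSISTANT_MESSAGE = "assistant_message"
    REASONING = "reasoning"
    TOOL_ARGUMENTS = "tool_arguments"
    TOOL_RESULT = "tool_result"
    FINAL_RESPONSE = "final_response"
    CUSTOM = "custom"


@dataclass(frozen=True)
class SketchCaptureOptions:
    """Hypothetical field names; every flag is off unless enabled."""

    user_input: bool = False
    system_message: bool = False
    assistant_message: bool = False
    reasoning: bool = False
    tool_arguments: bool = False
    tool_result: bool = False
    final_response: bool = False

    def allows(self, event_type: EventType) -> bool:
        # Types without a matching flag (CUSTOM here) fall back to False.
        # That fallback is an assumption of this sketch.
        return getattr(self, event_type.value, False)


opts = SketchCaptureOptions(tool_arguments=True, tool_result=True)
recorded = [t.value for t in EventType if opts.allows(t)]
print(recorded)
```

With only the two tool flags enabled, user input, system prompts, and reasoning stay out of the recorded set, which is the point of defaulting everything to False.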
When True, USER_INPUT events are recorded. Default: False.
When True, SYSTEM_MESSAGE events are recorded. Default: False.
When True, ASSISTANT_MESSAGE events are recorded. Default: False.
When True, REASONING events (chain-of-thought, scratchpad) are recorded. Default: False.
When True, TOOL_ARGUMENTS events are recorded. Default: False.
When True, TOOL_RESULT events are recorded. Default: False.
When True, FINAL_RESPONSE events are recorded. Default: False.
Context manager lifecycle
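Session, Run, and Span all implement __enter__ / __exit__. The pattern is always the same. On a clean exit, the SDK sets ended_at and, for a Run or Span, sets status = RunStatus.OK. If an exception escapes the block, the SDK sets status = RunStatus.ERROR, populates the error dict, and re-raises the exception after flushing. Application control flow is never silently swallowed.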
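The lifecycle can be sketched with a plain Python context manager. SketchRecord is a hypothetical stand-in for a Run or Span, the started_at attribute name is assumed, and the list named sink stands in for the backend queue; none of this is the SDK's actual implementation.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


class SketchRecord:
    """Stand-in for a Run or Span; not the SDK's implementation."""

    def __init__(self, name: str, sink: List[Dict[str, Any]]) -> None:
        self.name = name
        self.status = "running"
        self.error: Optional[Dict[str, str]] = None
        self.started_at = datetime.now(timezone.utc)  # assumed field name
        self.ended_at: Optional[datetime] = None
        self._sink = sink  # hypothetical stand-in for the backend queue

    def __enter__(self) -> "SketchRecord":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.ended_at = datetime.now(timezone.utc)
        if exc_type is None:
            self.status = "ok"
        else:
            self.status = "error"
            self.error = {
                "type": exc_type.__name__,
                "message": str(exc),
                "module": exc_type.__module__,
            }
        # "Flush" the finished record before any exception propagates.
        self._sink.append(
            {"name": self.name, "status": self.status, "error": self.error}
        )
        # Returning False tells Python to re-raise an in-flight exception.
        return False


sink: List[Dict[str, Any]] = []

with SketchRecord("clean", sink):
    pass

try:
    with SketchRecord("failing", sink):
        raise ValueError("tool timed out")
except ValueError:
    pass  # the exception still reaches the caller
```

Returning False from __exit__ is what keeps the exception visible to application code: the record is written first, then Python re-raises.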