The high-level northstar.init() API is designed for the common case: one long-lived session per process, runs created automatically by @northstar.trace, and a shared background worker. The low-level Northstar client gives you direct control over every layer of the hierarchy. Use it when you need to manage session boundaries explicitly, attach evaluation scores to individual runs, customize which event types are captured, or integrate NorthStar into a framework that manages its own execution lifecycle.

Instantiating the Client

from northstar import CaptureOptions, Northstar

client = Northstar(
    api_key="ns_...",
    project_id="<supabase-project-ref>",
    capture=CaptureOptions(
        user_input=True,
        final_response=True,
    ),
)
The endpoint parameter can be used instead of project_id for self-hosted deployments:
client = Northstar(
    api_key="ns_...",
    endpoint="https://my-server.example.com/ingest",
    capture=CaptureOptions(user_input=True, final_response=True),
)

CaptureOptions

CaptureOptions controls which event types are actually written to the queue. All fields default to False when constructing the low-level client directly, giving you explicit opt-in control over what is stored.
user_input (bool, default: false)
    Capture USER_INPUT events: the initial message or prompt provided to the agent.
system_messages (bool, default: false)
    Capture SYSTEM_MESSAGE events from LLM request payloads.
assistant_messages (bool, default: false)
    Capture ASSISTANT_MESSAGE events: intermediate assistant responses within a conversation.
reasoning (bool, default: false)
    Capture REASONING events: model reasoning or chain-of-thought content when available.
tool_arguments (bool, default: false)
    Capture TOOL_ARGUMENTS events: the input arguments passed to each tool call.
tool_results (bool, default: false)
    Capture TOOL_RESULT events: the content returned by tool executions.
final_response (bool, default: false)
    Capture FINAL_RESPONSE events: the terminal response delivered back to the user.
When you use northstar.init(), all seven flags are set to True automatically.
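To make the opt-in behavior concrete, here is a minimal, self-contained sketch of flag-gated recording. It does not use or reproduce the SDK's internals: CaptureFlags, EventQueue, and the EVENT_FLAG mapping are hypothetical stand-ins that only mirror the field names and event types listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaptureFlags:
    """Hypothetical stand-in for CaptureOptions; every flag is opt-in."""
    user_input: bool = False
    system_messages: bool = False
    assistant_messages: bool = False
    reasoning: bool = False
    tool_arguments: bool = False
    tool_results: bool = False
    final_response: bool = False


# Assumed mapping from event type to the flag that enables it.
EVENT_FLAG = {
    "USER_INPUT": "user_input",
    "SYSTEM_MESSAGE": "system_messages",
    "ASSISTANT_MESSAGE": "assistant_messages",
    "REASONING": "reasoning",
    "TOOL_ARGUMENTS": "tool_arguments",
    "TOOL_RESULT": "tool_results",
    "FINAL_RESPONSE": "final_response",
}


@dataclass
class EventQueue:
    """Hypothetical queue that skips events whose flag is off."""
    flags: CaptureFlags
    pending: list = field(default_factory=list)

    def record(self, event_type: str, content: str) -> bool:
        # Only enqueue the event if its capture flag was switched on.
        if not getattr(self.flags, EVENT_FLAG[event_type]):
            return False
        self.pending.append((event_type, content))
        return True


queue = EventQueue(CaptureFlags(user_input=True, final_response=True))
queue.record("USER_INPUT", "Find the docs.")            # kept
queue.record("TOOL_RESULT", "<html>...</html>")         # skipped: flag is off
queue.record("FINAL_RESPONSE", "Documentation found.")  # kept
captured_types = [event_type for event_type, _ in queue.pending]
```

With only user_input and final_response enabled, the tool result never reaches the pending list, which is the behavior you would expect from the two-flag configuration used in the examples on this page.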

Full Low-Level Example

The following example demonstrates a complete run with user input, a tool span, a final response, and an attached evaluation score:
from northstar import CaptureOptions, Northstar, SpanKind

client = Northstar(
    api_key="ns_...",
    project_id="<project-ref>",
    capture=CaptureOptions(user_input=True, final_response=True),
)

with client.session(metadata={"source": "cli"}) as session:
    with session.run("research-agent") as run:
        run.record_user_input("Find the current API documentation.")

        with run.span("search-docs", kind=SpanKind.TOOL):
            # ... your tool logic here ...
            pass

        run.record_final_response("Documentation found.")
        client.score(run.id, "relevance", 0.92)
Session, Run, and Span are context managers. Their __exit__ methods stamp ended_at, set the final status (ok or error), and enqueue the record for flushing. When the outermost session context exits, it calls client.flush() synchronously to drain the queue.
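The exit behavior described above follows the standard Python context-manager protocol. The sketch below illustrates the pattern only: TimedRecord and its sink list are hypothetical, and the choice to let exceptions propagate is an assumption rather than documented SDK behavior.

```python
from datetime import datetime, timezone


class TimedRecord:
    """Hypothetical record; not the SDK's Session, Run, or Span class."""

    def __init__(self, name, sink):
        self.name = name
        self.sink = sink  # plain list standing in for the flush queue
        self.started_at = None
        self.ended_at = None
        self.status = None

    def __enter__(self):
        self.started_at = datetime.now(timezone.utc)
        return self

    def __exit__(self, exc_type, exc, tb):
        # Stamp the end time and derive the status from the exception state.
        self.ended_at = datetime.now(timezone.utc)
        self.status = "error" if exc_type is not None else "ok"
        self.sink.append(self)
        return False  # assumption: never swallow the exception


sink = []
with TimedRecord("search-docs", sink) as ok_record:
    pass

try:
    with TimedRecord("fetch-url", sink) as failed_record:
        raise TimeoutError("upstream timed out")
except TimeoutError:
    pass  # the exception reached the caller after the record was finalized

statuses = [(record.name, record.status) for record in sink]
```

Because the status is derived in __exit__, a block that raises is still recorded, with its status set to error instead of ok.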

Attaching Scores

client.score() attaches an evaluation score to any run. The score type is inferred from the Python type of value:
client.score(
    run.id,        # str | UUID — the run to attach the score to
    "relevance",   # name
    0.92,          # float → numeric score
)

# Categorical score (string value)
client.score(run.id, "quality", "good")

# Boolean score
client.score(run.id, "answered", True)
Scores can also be attached to a specific span by passing span_id=span.id.
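Type-based inference has one well-known pitfall in Python: bool is a subclass of int, so a boolean check has to come before any numeric check. The helper below is a hypothetical sketch of such an inference, not the SDK's code; the label strings and the decision to treat int as numeric are assumptions.

```python
def infer_score_type(value):
    """Hypothetical helper: map a Python value to a score type label."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "categorical"
    raise TypeError(f"unsupported score value: {value!r}")


inferred = {
    "relevance": infer_score_type(0.92),
    "quality": infer_score_type("good"),
    "answered": infer_score_type(True),
}
```

If the order of the first two checks were swapped, True would be classified as numeric, so it is worth passing real bool values (not 0 or 1) when you intend a boolean score.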

Recording Run Events

The Run object exposes typed recording methods that respect the CaptureOptions flags configured on the client:
Method                                   Event type written
run.record_user_input(content)           USER_INPUT
run.record_system_message(content)       SYSTEM_MESSAGE
run.record_assistant_message(content)    ASSISTANT_MESSAGE
run.record_final_response(content)       FINAL_RESPONSE
run.record_tool_result(content)          TOOL_RESULT
run.record_custom_event(content)         CUSTOM
run.record_error(exc)                    sets status=ERROR and error field

Session Lifecycle

Under the high-level northstar.init() API, one session is created lazily the first time a @northstar.trace or with northstar.trace(...) block is opened. That session is reused for all subsequent traces in the same process. It is finalized (and its ended_at timestamp is set) when northstar.shutdown() is called, either explicitly or via the atexit hook registered by the SDK.

Under the low-level client, you control session boundaries entirely via with client.session(...) as session:. Multiple sessions can be opened sequentially; each session gets its own UUID and metadata.

Trace Replay

run.replay(tools=...) lets you re-execute a previously recorded run against a tool registry, replaying the exact tool call sequence deterministically:
tool_registry = {
    "search_docs": search_docs,
    "fetch_url": fetch_url,
}

# Assumes `session` was opened with `with client.session(...) as session:`.
with session.run("research-agent") as run:
    # ... run your agent ...
    replay = run.replay(tools=tool_registry)
This is useful for regression testing: record a golden run once, then replay it against a new version of your tools to verify the outputs have not changed.
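The regression-testing idea can be illustrated without the SDK. The sketch below is a conceptual stand-in, not run.replay() itself: the golden_calls format, the replay_calls helper, and both tool functions are hypothetical and exist only to show how replaying recorded calls against a registry surfaces a changed tool.

```python
def search_docs(query):
    return f"results for {query}"


def fetch_url(url):
    return f"contents of {url}"


tool_registry = {"search_docs": search_docs, "fetch_url": fetch_url}

# Hypothetical golden recording: (tool name, arguments, recorded output).
golden_calls = [
    ("search_docs", {"query": "api docs"}, "results for api docs"),
    ("fetch_url", {"url": "https://example.com"}, "contents of https://example.com"),
]


def replay_calls(calls, tools):
    """Re-run each recorded call in order and collect output mismatches."""
    mismatches = []
    for name, kwargs, expected in calls:
        actual = tools[name](**kwargs)
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches


# Unchanged tools reproduce the golden outputs exactly.
mismatches = replay_calls(golden_calls, tool_registry)

# A changed tool implementation shows up as a mismatch.
changed_registry = {**tool_registry, "fetch_url": lambda url: "<empty>"}
regressions = replay_calls(golden_calls, changed_registry)
```

This comparison is only meaningful when the tools are deterministic for the recorded arguments; tools that depend on the network or the clock would need to be stubbed in the registry.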

Lifecycle and Threading

The low-level Northstar client does not start a background worker thread on its own. Apart from the flush performed when the outermost session context exits, flushing is triggered explicitly:
  • client.flush() — builds the current batch payload and POSTs it synchronously. Returns the payload dict. Clears all pending queues on success.
  • client.aflush() — async variant of flush() for use in asyncio contexts.
The high-level _SDKState layer (used by northstar.init()) adds the background daemon thread. Its configuration parameters map directly to northstar.init() arguments:
Parameter         Default    Description
batch_size        50         Flush when the pending record count reaches this threshold
flush_interval    5.0        Maximum seconds between background flushes
max_queue_size    1000       Records beyond this limit are dropped with a debug warning
The background thread is a daemon thread (daemon=True), so it does not prevent the Python interpreter from exiting. Call northstar.flush() explicitly at the end of short-lived scripts to ensure all queued data is sent before the process exits.
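The interaction between batch_size and max_queue_size can be sketched with a small bounded queue. BoundedBatchQueue is a hypothetical illustration, not the SDK's _SDKState: it borrows the two parameter names from the table above and leaves out the time-based flush_interval trigger and the worker thread entirely.

```python
from collections import deque


class BoundedBatchQueue:
    """Hypothetical queue; parameter names mirror the table above."""

    def __init__(self, batch_size=50, max_queue_size=1000):
        self.batch_size = batch_size
        self.max_queue_size = max_queue_size
        self.pending = deque()
        self.dropped = 0

    def put(self, record):
        """Enqueue a record; return True when a flush is due."""
        if len(self.pending) >= self.max_queue_size:
            self.dropped += 1  # over the limit: drop instead of blocking
            return False
        self.pending.append(record)
        return len(self.pending) >= self.batch_size

    def drain(self):
        """Remove and return everything that is pending."""
        batch = list(self.pending)
        self.pending.clear()
        return batch


# Small limits make the behavior easy to follow.
q = BoundedBatchQueue(batch_size=3, max_queue_size=4)
flush_due = [q.put(i) for i in range(6)]  # nothing drains the queue meanwhile
batch = q.drain()
```

In this run the third and fourth records signal that a flush is due, and the fifth and sixth are dropped because nothing drained the queue in time. That is the situation max_queue_size guards against when the backend is slow or unreachable.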

flush() and shutdown()

# Drain the queue synchronously (returns True on success)
northstar.flush()

# Drain with a timeout (raises ValueError if timeout <= 0)
northstar.flush(timeout=5.0)
northstar.shutdown() is registered as an atexit handler automatically. It stops the background worker, finalizes the current session, and performs a final flush. You do not need to call it manually in most applications.
In test suites, call northstar.flush() after each test case to ensure all spans from that test are sent before the next test starts — especially important when tests share a single northstar.init() call.
