northstar.model_call() is a context manager that opens a SpanKind.MODEL span around an LLM provider call and yields a ModelSpan handle, which you use to record the conversation, token usage, and cost. It is the recommended way to instrument LLM calls manually when auto-instrumentation via auto_instrument() is unavailable or undesirable. The recording methods on ModelSpan are safe to call in any order, although record_output_message() only computes an estimated cost if record_input_messages() has already run. The span is finalized, with its latency and error status, when the with block exits.

Function signature

@contextmanager
def model_call(
    name: str,
    *,
    model: str,
    run: Run | None = None,
) -> Iterator[ModelSpan | _NoopModelSpan]
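
The lifecycle described above follows the standard contextlib pattern: open a span, yield a handle, and finalize latency and error status when the with block exits. The sketch below is a self-contained toy and not NorthStar's implementation. ToySpan and toy_model_call are hypothetical names.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class ToySpan:
    """Hypothetical stand-in for a MODEL span record (not NorthStar's Span)."""
    name: str
    model: str
    attributes: dict = field(default_factory=dict)
    latency_ms: Optional[float] = None
    error: Optional[str] = None


@contextmanager
def toy_model_call(name: str, *, model: str) -> Iterator[ToySpan]:
    span = ToySpan(name=name, model=model, attributes={"model": model})
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        # Record the failure on the span, then let the exception propagate.
        span.error = repr(exc)
        raise
    finally:
        # Runs on normal exit and on error: finalize the latency.
        span.latency_ms = (time.perf_counter() - start) * 1000.0


# Normal exit: latency is set, no error recorded.
with toy_model_call("answer-llm", model="gpt-4o") as ok_span:
    pass

# Failing call: the error is recorded and the exception still reaches the caller.
try:
    with toy_model_call("answer-llm", model="gpt-4o") as failed_span:
        raise RuntimeError("provider timeout")
except RuntimeError:
    pass
```

Using try/except/finally inside the generator ensures the span is finalized on every exit path. The exception is not swallowed, so the caller still sees the provider error.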

Parameters

  • name (str, required): Display name for the model span shown in the trace waterfall. Use a descriptive label for the role this LLM call plays, e.g. "answer-llm", "summarizer", or "classify-intent".
  • model (str, required): The model identifier passed to the provider, e.g. "gpt-4o", "claude-sonnet-4-5-20250929", or "gemini-1.5-pro". This string is stored in span.attributes["model"] and is used by the pricing module to look up per-token USD rates when record_usage() is called.
  • run (Run | None, default None): Attaches this model span to a specific Run object instead of the currently active trace. This is useful in low-level code that holds a Run reference directly. When None (the default), the span is attached to the active trace, which is detected from a ContextVar.
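
The run-resolution rule amounts to this: an explicit run wins; otherwise the active run is read from a ContextVar; otherwise there is no trace. It can be sketched with the standard contextvars module as below. Run, _active_run, and resolve_run are hypothetical stand-ins, not NorthStar's internals.

```python
from contextvars import ContextVar
from typing import Optional


class Run:
    """Hypothetical stand-in for a NorthStar Run."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical ContextVar holding the currently active run (None outside a trace).
_active_run: ContextVar[Optional[Run]] = ContextVar("active_run", default=None)


def resolve_run(run: Optional[Run] = None) -> Optional[Run]:
    """Prefer an explicitly passed run; otherwise fall back to the active one."""
    return run if run is not None else _active_run.get()


outer = Run("answer-agent")
explicit = Run("batch-job")

token = _active_run.set(outer)
try:
    picked_default = resolve_run()           # falls back to the ContextVar
    picked_explicit = resolve_run(explicit)  # explicit argument overrides it
finally:
    _active_run.reset(token)

picked_outside = resolve_run()  # no active trace: None (the no-op case)
```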

ModelSpan methods and attributes

The context manager yields a ModelSpan instance. All methods delegate to the underlying Span record and enqueue updates to the SDK’s background worker.
record_input_messages(messages) → int
Records the list of messages sent to the model. Each message dict is parsed by role ("system", "user", "assistant", "tool") and stored as the corresponding event type. Returns the estimated prompt token count for the model (computed via the litellm pricing module; requires the pricing extra).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]
llm.record_input_messages(messages)
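
Role-based parsing can be pictured as a simple dispatch from each message's role to an event type. The sketch below uses hypothetical event type names, because NorthStar's actual event types are not documented here. It also raises on an unknown role, which is a choice made for this sketch only.

```python
from typing import Any

# Hypothetical role -> event type mapping; NorthStar's real event names may differ.
ROLE_TO_EVENT = {
    "system": "system_message",
    "user": "user_message",
    "assistant": "assistant_message",
    "tool": "tool_message",
}


def messages_to_events(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert chat messages into typed events, one event per message."""
    events = []
    for message in messages:
        role = message.get("role")
        if role not in ROLE_TO_EVENT:
            raise ValueError(f"unsupported role: {role!r}")
        events.append({"type": ROLE_TO_EVENT[role], "content": message.get("content")})
    return events


events = messages_to_events([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is NorthStar?"},
])
```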
record_output_message(message) → int
Records the message returned by the model. Pass the response message dict (e.g. response.choices[0].message.model_dump()). Returns the estimated completion token count. If input tokens were already recorded via record_input_messages(), the method also computes and stores the estimated USD cost in span.attributes["cost_usd"].
llm.record_output_message(response.choices[0].message.model_dump())
record_usage(*, prompt_tokens, completion_tokens, source="litellm") → float | None
Records exact token counts reported by the provider (from the API response’s usage object) and computes the USD cost. Overwrites any estimates written by record_input_messages() / record_output_message(). Returns the computed USD cost as a float, or None if the model is not found in the pricing database.
  • prompt_tokens (int, required): tokens consumed by the prompt.
  • completion_tokens (int, required): tokens generated by the model.
  • source (str, default "litellm"): pricing source label stored in span.attributes["pricing_source"].
cost = llm.record_usage(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
)
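
The cost arithmetic is prompt_tokens × input rate + completion_tokens × output rate, returning None when the model is unpriced. Exact counts overwrite earlier estimates. Both can be sketched as below. The price table, its rates, and compute_cost_usd are hypothetical placeholders: they are not real provider prices or NorthStar's pricing module. The input_tokens/output_tokens attribute names follow the cost aggregation description on this page.

```python
from typing import Optional

# Placeholder per-token USD rates for a hypothetical model (not real prices).
PRICES_PER_TOKEN = {"example-model": (0.000002, 0.000008)}  # (input, output)


def compute_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Optional[float]:
    rates = PRICES_PER_TOKEN.get(model)
    if rates is None:
        return None  # mirrors record_usage() returning None for unpriced models
    input_rate, output_rate = rates
    return prompt_tokens * input_rate + completion_tokens * output_rate


attributes = {"model": "example-model"}

# 1) Estimates, as written by record_input_messages() / record_output_message().
attributes.update(input_tokens=1000, output_tokens=200,
                  cost_usd=compute_cost_usd("example-model", 1000, 200))

# 2) Exact provider counts, as in record_usage(), overwrite the estimates.
attributes.update(input_tokens=1200, output_tokens=250,
                  cost_usd=compute_cost_usd("example-model", 1200, 250),
                  pricing_source="litellm")
```

With these placeholder rates, the final cost is 1200 × 0.000002 + 250 × 0.000008 = 0.0024 + 0.002 = 0.0044 USD.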
id → UUID | None
The UUID of the underlying Span record, or None when the SDK is in no-op mode.
model → str
The model identifier passed to model_call(), or an empty string in no-op mode.

Full usage example

import northstar
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

@northstar.trace("answer-agent")
def run_agent(query: str) -> str:
    messages = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": query},
    ]

    with northstar.model_call("answer-llm", model="gpt-4o") as llm:
        llm.record_input_messages(messages)

        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        )

        llm.record_output_message(response.choices[0].message.model_dump())
        # USD cost as a float, or None if the model has no pricing data
        cost = llm.record_usage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )

    return response.choices[0].message.content

Multi-turn example

Give each provider call in a multi-turn conversation its own model_call() span:

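import northstar
from openai import OpenAI

client = OpenAI()

# Assumes this runs inside an active trace (e.g. a @northstar.trace function);
# otherwise model_call() yields a no-op span.
first_turn_messages = [{"role": "user", "content": "Suggest a name for a tracing library."}]

with northstar.model_call("step-1", model="gpt-4o-mini") as llm:
    llm.record_input_messages(first_turn_messages)
    r1 = client.chat.completions.create(model="gpt-4o-mini", messages=first_turn_messages)
    llm.record_output_message(r1.choices[0].message.model_dump())
    llm.record_usage(
        prompt_tokens=r1.usage.prompt_tokens,
        completion_tokens=r1.usage.completion_tokens,
    )

# Second turn: extend the conversation and record it in a new model_call span.
second_turn_messages = first_turn_messages + [
    {"role": "assistant", "content": r1.choices[0].message.content},
    {"role": "user", "content": "Give two alternatives."},
]
with northstar.model_call("step-2", model="gpt-4o") as llm:
    llm.record_input_messages(second_turn_messages)
    r2 = client.chat.completions.create(model="gpt-4o", messages=second_turn_messages)
    llm.record_output_message(r2.choices[0].message.model_dump())
    llm.record_usage(
        prompt_tokens=r2.usage.prompt_tokens,
        completion_tokens=r2.usage.completion_tokens,
    )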

No-op behavior

When there is no active trace, or when the SDK is disabled, model_call() yields a _NoopModelSpan. All methods on _NoopModelSpan accept the same arguments and silently return 0 (for token counts) or None (for cost), so your code path does not need to branch.
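
This is the null-object pattern: the no-op handle exposes the same surface but returns neutral values, so the calling code stays identical. The sketch below mirrors the documented return values (0, None, a None id, and an empty model string). NoopModelSpanSketch and record_call are hypothetical names, not NorthStar's _NoopModelSpan.

```python
from typing import Any, Optional
from uuid import UUID


class NoopModelSpanSketch:
    """Hypothetical null-object sketch mirroring the documented no-op returns."""

    id: Optional[UUID] = None
    model: str = ""

    def record_input_messages(self, messages: Any) -> int:
        return 0

    def record_output_message(self, message: Any) -> int:
        return 0

    def record_usage(self, *, prompt_tokens: int, completion_tokens: int,
                     source: str = "litellm") -> Optional[float]:
        return None


def record_call(span: Any, messages: list, reply: dict, usage: tuple) -> Optional[float]:
    """Hypothetical helper: the same code path works for real and no-op spans."""
    span.record_input_messages(messages)
    span.record_output_message(reply)
    return span.record_usage(prompt_tokens=usage[0], completion_tokens=usage[1])


result = record_call(
    NoopModelSpanSketch(),
    [{"role": "user", "content": "hi"}],
    {"role": "assistant", "content": "hello"},
    (3, 2),
)
```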

Cost aggregation

After the run finishes, NorthStar automatically aggregates cost_usd, input_tokens, and output_tokens across all MODEL-kind spans and writes totals to the run’s metadata (cost_usd, total_input_tokens, total_output_tokens). These totals appear in the dashboard’s run summary view.
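
The aggregation step is a filtered sum over MODEL-kind spans, which can be sketched as below. The span dicts, the "OTHER" kind, and aggregate_run_totals are hypothetical. Treating a None cost (an unpriced model) as 0 is an assumption of this sketch, not documented NorthStar behavior.

```python
from typing import Any


def aggregate_run_totals(spans: list[dict[str, Any]]) -> dict[str, float]:
    """Sum cost and token attributes over MODEL-kind spans (hypothetical span dicts)."""
    totals = {"cost_usd": 0.0, "total_input_tokens": 0, "total_output_tokens": 0}
    for span in spans:
        if span.get("kind") != "MODEL":
            continue  # non-model spans do not contribute
        attrs = span.get("attributes", {})
        # Missing or None values count as zero (an assumption of this sketch).
        totals["cost_usd"] += attrs.get("cost_usd") or 0.0
        totals["total_input_tokens"] += attrs.get("input_tokens") or 0
        totals["total_output_tokens"] += attrs.get("output_tokens") or 0
    return totals


spans = [
    {"kind": "MODEL", "attributes": {"cost_usd": 0.0044, "input_tokens": 1200, "output_tokens": 250}},
    {"kind": "MODEL", "attributes": {"cost_usd": None, "input_tokens": 50, "output_tokens": 10}},
    {"kind": "OTHER", "attributes": {"cost_usd": 1.0}},  # hypothetical non-model span
]
totals = aggregate_run_totals(spans)
```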

Relationship to auto_instrument()

northstar.auto_instrument() uses model_call() internally. When auto-instrumentation is active for OpenAI or Anthropic, every provider call is automatically wrapped in a model_call() span — you do not need to call it manually. Use model_call() directly when you use a provider not yet covered by auto-instrumentation, or when you need fine-grained control over what is recorded.
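
This page does not document how auto_instrument() patches each provider. The sketch below only illustrates the general idea of wrapping a provider's create function so every call runs inside a model-call span. toy_model_call, auto_wrap, and fake_create are hypothetical, and the "span" is a plain dict.

```python
import functools
from contextlib import contextmanager
from typing import Any, Callable, Iterator

recorded: list[dict[str, Any]] = []  # finalized toy spans


@contextmanager
def toy_model_call(name: str, *, model: str) -> Iterator[dict[str, Any]]:
    span: dict[str, Any] = {"name": name, "model": model}
    try:
        yield span
    finally:
        recorded.append(span)  # "finalize" the span when the block exits


def auto_wrap(create: Callable[..., dict[str, Any]], span_name: str) -> Callable[..., dict[str, Any]]:
    """Wrap a provider's create() so every call is recorded in a toy span."""
    @functools.wraps(create)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        with toy_model_call(span_name, model=kwargs.get("model", "")) as span:
            span["input_messages"] = kwargs.get("messages", [])
            response = create(*args, **kwargs)
            usage = response.get("usage", {})
            span["input_tokens"] = usage.get("prompt_tokens")
            span["output_tokens"] = usage.get("completion_tokens")
            return response
    return wrapper


def fake_create(*, model: str, messages: list) -> dict[str, Any]:
    """Hypothetical provider call returning a canned response."""
    return {"content": "ok", "usage": {"prompt_tokens": 7, "completion_tokens": 1}}


create = auto_wrap(fake_create, "auto-llm")
reply = create(model="example-model", messages=[{"role": "user", "content": "ping"}])
```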

Note: Install the pricing extra (uv add 'northstar-ai[pricing]') to enable token counting and USD cost computation. Without it, record_usage() still stores the raw token counts but returns None for cost, and record_input_messages() and record_output_message() return 0.
