northstar.model_call() — Record LLM Calls and Token Usage

northstar.model_call() is a context manager that opens a SpanKind.MODEL span around an LLM provider call and yields a ModelSpan handle you use to record the conversation, token usage, and cost. It is the recommended way to instrument LLM calls manually when auto-instrumentation via auto_instrument() is unavailable or not desired. The recording methods on ModelSpan are safe to call in any order; the span is finalized, with latency and error status, when the with block exits.

Function signature

@contextmanager
def model_call(
    name: str,
    *,
    model: str,
    run: Run | None = None,
) -> Iterator[ModelSpan | _NoopModelSpan]
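The finalize-on-exit behavior described above can be pictured with a small generator-based context manager. This is only a sketch of the general pattern, not NorthStar's implementation: SketchSpan, sketch_model_call, and the status and latency_ms fields are hypothetical names chosen for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a span record; not a NorthStar class."""
    name: str
    model: str
    status: str = "running"
    latency_ms: Optional[float] = None
    attributes: dict = field(default_factory=dict)


@contextmanager
def sketch_model_call(name: str, *, model: str) -> Iterator[SketchSpan]:
    span = SketchSpan(name=name, model=model)
    start = time.perf_counter()
    try:
        yield span              # the body of the with block runs here
        span.status = "ok"
    except Exception as exc:
        span.status = "error"   # record the failure on the span
        span.attributes["error"] = repr(exc)
        raise                   # re-raise so the caller still sees the exception
    finally:
        # Latency is stamped on both the success and the error path.
        span.latency_ms = (time.perf_counter() - start) * 1000.0
```

Because the timing lives in the finally clause, a provider call that raises still produces a span with a latency and an error status, which matches the behavior the introduction describes.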

Parameters

name

str

required

Display name for the model span shown in the trace waterfall. Use a descriptive label for the role this LLM call plays, e.g. "answer-llm", "summarizer", or "classify-intent".

model

str

required

The model identifier passed to the provider, e.g. "gpt-4o", "claude-sonnet-4-5-20250929", "gemini-1.5-pro". This string is stored in span.attributes["model"] and used by the pricing module to look up per-token USD rates when record_usage() is called.

run

Run | None

default: None

Attach this model span to a specific Run object instead of the currently active trace. Useful in low-level code that holds a Run reference directly. When None (the default), the span is attached to the active trace detected from the ContextVar.

`ModelSpan` methods

The context manager yields a ModelSpan instance. All methods delegate to the underlying Span record and enqueue updates to the SDK’s background worker.

record_input_messages(messages)

method → int

Records the list of messages sent to the model. Each message dict is parsed by role ("system", "user", "assistant", "tool") and stored as the corresponding event type. Returns the estimated prompt token count for the model (computed via the litellm pricing module; requires the pricing extra).

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]
llm.record_input_messages(messages)

record_output_message(message)

method → int

Records the message returned by the model. Pass the response message dict (e.g. response.choices[0].message.model_dump()). Returns the estimated completion token count. If input tokens were already recorded via record_input_messages(), the method also computes and stores the estimated USD cost in span.attributes["cost_usd"].

llm.record_output_message(response.choices[0].message.model_dump())

record_usage(*, prompt_tokens, completion_tokens, source)

method → float | None

Records exact token counts reported by the provider (from the API response’s usage object) and computes the USD cost. Overwrites any estimates written by record_input_messages() / record_output_message(). Returns the computed USD cost as a float, or None if the model is not found in the pricing database.

prompt_tokens (int, required) — tokens consumed by the prompt.
completion_tokens (int, required) — tokens generated by the model.
source (str, default "litellm") — pricing source label stored in span.attributes["pricing_source"].

cost = llm.record_usage(
    prompt_tokens=response.usage.prompt_tokens,
    completion_tokens=response.usage.completion_tokens,
)
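As a rough illustration of the arithmetic behind a per-token cost lookup, the sketch below multiplies each token count by a per-million-token rate and returns None for an unknown model. The rate table and the sketch_cost_usd helper are invented for this example; real rates come from the pricing database, and the actual computation may differ.

```python
from typing import Optional

# Hypothetical USD rates per one million tokens, invented for this example.
SKETCH_RATES = {
    "example-model": {"input_per_million": 2.50, "output_per_million": 10.00},
}


def sketch_cost_usd(
    model: str, prompt_tokens: int, completion_tokens: int
) -> Optional[float]:
    rates = SKETCH_RATES.get(model)
    if rates is None:
        return None  # unknown model: no cost can be computed
    total_per_million = (
        prompt_tokens * rates["input_per_million"]
        + completion_tokens * rates["output_per_million"]
    )
    return total_per_million / 1_000_000


known = sketch_cost_usd("example-model", 1000, 500)  # (2500 + 5000) / 1e6
unknown = sketch_cost_usd("not-in-table", 1000, 500)
```

With these made-up rates, 1,000 prompt tokens and 500 completion tokens come to 0.0075 USD, and the unknown model yields None, mirroring the float | None return type.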

UUID | None

The UUID of the underlying Span record. None when the SDK is in no-op mode.

model

str

The model identifier passed to model_call(). Empty string in no-op mode.

Full usage example

import northstar
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

@northstar.trace("answer-agent")
def run_agent(query: str) -> str:
    messages = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": query},
    ]

    with northstar.model_call("answer-llm", model="gpt-4o") as llm:
        llm.record_input_messages(messages)

        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
        )

        llm.record_output_message(response.choices[0].message.model_dump())
        cost = llm.record_usage(
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )

    return response.choices[0].message.content

Multi-turn example

# Assumes `client` is an OpenAI client and both message lists are already built.
with northstar.model_call("step-1", model="gpt-4o-mini") as llm:
    llm.record_input_messages(first_turn_messages)
    r1 = client.chat.completions.create(model="gpt-4o-mini", messages=first_turn_messages)
    llm.record_output_message(r1.choices[0].message.model_dump())
    llm.record_usage(
        prompt_tokens=r1.usage.prompt_tokens,
        completion_tokens=r1.usage.completion_tokens,
    )

# second turn — a new model_call span
with northstar.model_call("step-2", model="gpt-4o") as llm:
    llm.record_input_messages(second_turn_messages)
    r2 = client.chat.completions.create(model="gpt-4o", messages=second_turn_messages)
    llm.record_output_message(r2.choices[0].message.model_dump())
    llm.record_usage(
        prompt_tokens=r2.usage.prompt_tokens,
        completion_tokens=r2.usage.completion_tokens,
    )

No-op behavior

When there is no active trace, or when the SDK is disabled, model_call() yields a _NoopModelSpan. All methods on _NoopModelSpan accept the same arguments and silently return 0 (for token counts) or None (for cost), so your code path does not need to branch.
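The no-op handle follows the null-object pattern. The class below is a sketch of that idea built only from the return values documented on this page; SketchNoopModelSpan is a hypothetical name, and the real _NoopModelSpan may be implemented differently.

```python
from typing import Any, Optional


class SketchNoopModelSpan:
    """Illustrative null object; mirrors the documented return values only."""

    model = ""  # documented as an empty string in no-op mode

    def record_input_messages(self, messages: list[dict[str, Any]]) -> int:
        return 0  # no token estimate is produced

    def record_output_message(self, message: dict[str, Any]) -> int:
        return 0

    def record_usage(
        self, *, prompt_tokens: int, completion_tokens: int, source: str = "litellm"
    ) -> Optional[float]:
        return None  # no cost is computed


# Calling code looks the same as it would with a real span handle.
llm = SketchNoopModelSpan()
estimated = llm.record_input_messages([{"role": "user", "content": "hi"}])
cost = llm.record_usage(prompt_tokens=3, completion_tokens=5)
```

Since every method accepts the same arguments and returns a harmless value, instrumented code can run unchanged in tests or in environments where tracing is switched off.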

Cost aggregation

After the run finishes, NorthStar automatically aggregates cost_usd, input_tokens, and output_tokens across all MODEL-kind spans and writes totals to the run’s metadata (cost_usd, total_input_tokens, total_output_tokens). These totals appear in the dashboard’s run summary view.
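The roll-up described above amounts to summing three attributes over the MODEL-kind spans of a run. The sketch below assumes each span is a plain dict with "kind" and "attributes" keys, which is an illustrative layout rather than NorthStar's actual span schema; sketch_aggregate is a hypothetical helper.

```python
def sketch_aggregate(spans: list[dict]) -> dict:
    """Sum cost and token attributes over MODEL-kind spans only."""
    totals = {"cost_usd": 0.0, "total_input_tokens": 0, "total_output_tokens": 0}
    for span in spans:
        if span.get("kind") != "MODEL":
            continue  # tool and other span kinds do not contribute
        attrs = span.get("attributes", {})
        # A missing or None value (e.g. cost for an unpriced model) counts as zero.
        totals["cost_usd"] += attrs.get("cost_usd") or 0.0
        totals["total_input_tokens"] += attrs.get("input_tokens") or 0
        totals["total_output_tokens"] += attrs.get("output_tokens") or 0
    return totals


example_spans = [
    {"kind": "MODEL", "attributes": {"cost_usd": 0.25, "input_tokens": 100, "output_tokens": 50}},
    {"kind": "MODEL", "attributes": {"cost_usd": None, "input_tokens": 10, "output_tokens": 5}},
    {"kind": "TOOL", "attributes": {"cost_usd": 1.0}},
]
run_totals = sketch_aggregate(example_spans)
```

In this example the TOOL span is ignored and the span without a cost still contributes its token counts, so the totals are 0.25 USD, 110 input tokens, and 55 output tokens.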

Relationship to `auto_instrument()`

northstar.auto_instrument() uses model_call() internally. When auto-instrumentation is active for OpenAI or Anthropic, every provider call is automatically wrapped in a model_call() span — you do not need to call it manually. Use model_call() directly when you use a provider not yet covered by auto-instrumentation, or when you need fine-grained control over what is recorded.

Install the pricing extra (uv add 'northstar-ai[pricing]') to enable token counting and USD cost computation. Without it, record_usage() still stores the raw token counts but returns None for cost, and record_input_messages() / record_output_message() return 0.
