Documentation Index
Fetch the complete documentation index at: https://mintlify.com/sidmanale643/northstar/llms.txt
Use this file to discover all available pages before exploring further.
LLMService is a thin, opinionated wrapper around LiteLLM that integrates NorthStar tracing into every completion call without any boilerplate. Every method creates a MODEL span, records input messages, captures the output message, and reports prompt and completion token counts along with the USD cost — all automatically. You only need to call llm.generate() the same way you would call litellm.completion().
Installation
LLMService depends on LiteLLM for token counting, cost lookups, and provider routing. Install the pricing extra:
LLMService requires northstar.init() (or a Northstar client) to be initialized before any method is called. The tracing span is created against the active NorthStar context; calling LLMService without an initialized client raises a RuntimeError.Constructor
The LiteLLM model string used when
model is not passed to a generation method. Accepts any model identifier that LiteLLM supports, including provider-prefixed strings like "openrouter/deepseek/deepseek-v4-flash" or "anthropic/claude-3-5-sonnet-20241022". Defaults to "gpt-4o-mini".Methods
generate() — Synchronous completion
Calls litellm.completion() synchronously and returns the full response object. A MODEL span is opened, input messages and the output message are recorded, and token usage is reported before the span closes.
The conversation history in OpenAI message format. Each entry must have a
"role" key ("system", "user", "assistant", or "tool") and a "content" key.Override the model for this call. Falls back to
default_model when None. Accepts any LiteLLM model string.Tool schemas in OpenAI function-calling format. When provided,
tool_choice is also forwarded to the provider. When None, tool calling is disabled.Forwarded directly to LiteLLM when
tools is provided. Ignored when tools is None. Defaults to "auto".Sampling temperature. Lower values produce more deterministic outputs. Defaults to
0.3.Additional keyword arguments passed directly to
litellm.completion(). Use this to pass max_tokens, top_p, stop, response_format, and any other provider-specific parameters.ModelResponse object (compatible with openai.ChatCompletion).
agenerate() — Async completion
Identical to generate() but calls litellm.acompletion() with await. Use inside async def functions.
The conversation history in OpenAI message format.
Override the model for this call. Falls back to
default_model when None.Tool schemas. When
None, tool calling is disabled.Forwarded to LiteLLM when
tools is provided. Defaults to "auto".Sampling temperature. Defaults to
0.3.Additional keyword arguments forwarded to
litellm.acompletion().ModelResponse object.
stream() — Synchronous streaming generator
Calls litellm.completion() with stream=True and yields each chunk as it arrives. Input messages are recorded before streaming begins. Token usage is captured from the final usage chunk (LiteLLM’s stream_options={"include_usage": True} is set automatically). The full aggregated content is recorded as the output message after the generator is exhausted.
The conversation history in OpenAI message format.
Override the model for this call. Falls back to
default_model when None.Tool schemas. When
None, tool calling is disabled.Forwarded to LiteLLM when
tools is provided. Defaults to "auto".Sampling temperature. Defaults to
0.3.Additional keyword arguments forwarded to
litellm.completion(). Note: stream_options is set automatically if not provided.choices[0].delta attribute.
astream() — Async streaming generator
Identical to stream() but uses litellm.acompletion() with stream=True and async for. Input messages are recorded before streaming, usage is captured from the final chunk, and the full content is recorded after the async generator is exhausted.
The conversation history in OpenAI message format.
Override the model for this call. Falls back to
default_model when None.Tool schemas. When
None, tool calling is disabled.Forwarded to LiteLLM when
tools is provided. Defaults to "auto".Sampling temperature. Defaults to
0.3.Additional keyword arguments forwarded to
litellm.acompletion(). stream_options is set automatically if not provided.Full usage example
Streaming example
Async streaming example
What gets recorded automatically
EveryLLMService method creates a MODEL span and records the following without any extra code:
| Recorded data | Source |
|---|---|
MODEL span named "llm.generate" / "llm.agenerate" / "llm.stream" / "llm.astream" | api.model_call() |
Input messages (per CaptureOptions) | span.record_input_messages() |
Output message (per CaptureOptions) | span.record_output_message() |
model, input_tokens, output_tokens, total_tokens | span.record_usage() |
cost_usd in USD | NorthStar pricing module via LiteLLM pricing tables |
Span status = ERROR + error dict | Automatic on any exception |