
@earendil-works/pi-ai provides a unified streaming API across 20+ LLM providers. It handles model discovery, provider configuration, token and cost tracking, and context persistence — including seamless handoffs between models mid-conversation.
Only models that support tool calling (function calling) are included in the registry, as tool use is essential for agentic workflows.

Installation

npm install @earendil-works/pi-ai
The TypeBox primitives (Type, Static, TSchema) are re-exported from @earendil-works/pi-ai for convenience.
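For example, Static derives a plain TypeScript type from a tool's parameter schema. A minimal sketch using the re-exported primitives (GetTimeParams is an illustrative name, not part of the library):

import { Type, Static } from '@earendil-works/pi-ai';

const GetTimeParams = Type.Object({
  timezone: Type.Optional(Type.String())
});

// A value and a type may share a name in TypeScript
type GetTimeParams = Static<typeof GetTimeParams>;
// => { timezone?: string }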

Quick start

import { Type, getModel, stream, complete, Context, Tool, StringEnum } from '@earendil-works/pi-ai';

// Fully typed with auto-complete support for both providers and models
const model = getModel('openai', 'gpt-4o-mini');

// Define tools with TypeBox schemas for type safety and validation
const tools: Tool[] = [{
  name: 'get_time',
  description: 'Get the current time',
  parameters: Type.Object({
    timezone: Type.Optional(Type.String({ description: 'Optional timezone (e.g., America/New_York)' }))
  })
}];

// Build a conversation context (easily serializable and transferable between models)
const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What time is it?' }],
  tools
};

// Option 1: Streaming with all event types
const s = stream(model, context);

for await (const event of s) {
  switch (event.type) {
    case 'start':
      console.log(`Starting with ${event.partial.model}`);
      break;
    case 'text_start':
      console.log('\n[Text started]');
      break;
    case 'text_delta':
      process.stdout.write(event.delta);
      break;
    case 'text_end':
      console.log('\n[Text ended]');
      break;
    case 'thinking_start':
      console.log('[Model is thinking...]');
      break;
    case 'thinking_delta':
      process.stdout.write(event.delta);
      break;
    case 'thinking_end':
      console.log('[Thinking complete]');
      break;
    case 'toolcall_start':
      console.log(`\n[Tool call started: index ${event.contentIndex}]`);
      break;
    case 'toolcall_delta': {
      // Partial tool arguments are being streamed
      const partialCall = event.partial.content[event.contentIndex];
      if (partialCall.type === 'toolCall') {
        console.log(`[Streaming args for ${partialCall.name}]`);
      }
      break;
    }
    case 'toolcall_end':
      console.log(`\nTool called: ${event.toolCall.name}`);
      console.log(`Arguments: ${JSON.stringify(event.toolCall.arguments)}`);
      break;
    case 'done':
      console.log(`\nFinished: ${event.reason}`);
      break;
    case 'error':
      console.error(`Error: ${event.error}`);
      break;
  }
}

// Get the final message after streaming, add it to the context
const finalMessage = await s.result();
context.messages.push(finalMessage);

// Handle tool calls if any
const toolCalls = finalMessage.content.filter(b => b.type === 'toolCall');
for (const call of toolCalls) {
  const result = call.name === 'get_time'
    ? new Date().toLocaleString('en-US', {
        timeZone: call.arguments.timezone || 'UTC',
        dateStyle: 'full',
        timeStyle: 'long'
      })
    : 'Unknown tool';

  context.messages.push({
    role: 'toolResult',
    toolCallId: call.id,
    toolName: call.name,
    content: [{ type: 'text', text: result }],
    isError: false,
    timestamp: Date.now()
  });
}

// Continue if there were tool calls
if (toolCalls.length > 0) {
  const continuation = await complete(model, context);
  context.messages.push(continuation);
  console.log('After tool execution:', continuation.content);
}

console.log(`Total tokens: ${finalMessage.usage.input} in, ${finalMessage.usage.output} out`);
console.log(`Cost: $${finalMessage.usage.cost.total.toFixed(4)}`);

// Option 2: Get complete response without streaming
const response = await complete(model, context);

for (const block of response.content) {
  if (block.type === 'text') {
    console.log(block.text);
  } else if (block.type === 'toolCall') {
    console.log(`Tool: ${block.name}(${JSON.stringify(block.arguments)})`);
  }
}

Supported providers

OpenAI

openai — GPT-4o, GPT-5, and more via the Responses API

Anthropic

anthropic — Claude 3.x/4.x via the Messages API

Google

google — Gemini via the Generative AI API

Vertex AI

google-vertex — Gemini via Google Cloud Vertex AI

Azure OpenAI

azure-openai-responses — OpenAI models via Azure Responses API

OpenAI Codex

openai-codex — GPT-5.x Codex models (ChatGPT Plus/Pro, OAuth required)

Mistral

mistral — Mistral models via the Conversations API

Groq

groq — Fast inference via OpenAI-compatible API

Cerebras

cerebras — Wafer-scale inference via OpenAI-compatible API

xAI

xai — Grok models via OpenAI-compatible API

DeepSeek

deepseek — DeepSeek models via OpenAI-compatible API

OpenRouter

openrouter — Unified gateway to many providers

Vercel AI Gateway

vercel-ai-gateway — Vercel-hosted gateway

Cloudflare AI Gateway

cloudflare-ai-gateway — Cloudflare-hosted AI Gateway

Cloudflare Workers AI

cloudflare-workers-ai — Cloudflare’s own inference

Amazon Bedrock

amazon-bedrock — AWS Bedrock Converse API

GitHub Copilot

github-copilot — Copilot subscription models (OAuth required)

MiniMax

minimax — MiniMax models

Fireworks

fireworks — Fireworks inference (Anthropic-compatible API)

Kimi For Coding

kimi-coding — Moonshot AI (Anthropic-compatible API)

Xiaomi MiMo

xiaomi — MiMo models (Anthropic-compatible; separate Token Plan providers for cn/ams/sgp)

OpenAI-compatible

Custom openai-completions — Ollama, vLLM, LM Studio, LiteLLM, etc.

Querying providers and models

import { getProviders, getModels, getModel } from '@earendil-works/pi-ai';

// Get all available providers
const providers = getProviders();
console.log(providers); // ['openai', 'anthropic', 'google', 'xai', 'groq', ...]

// Get all models from a provider (fully typed)
const anthropicModels = getModels('anthropic');
for (const model of anthropicModels) {
  console.log(`${model.id}: ${model.name}`);
  console.log(`  API: ${model.api}`);           // 'anthropic-messages'
  console.log(`  Context: ${model.contextWindow} tokens`);
  console.log(`  Vision: ${model.input.includes('image')}`);
  console.log(`  Reasoning: ${model.reasoning}`);
}

// Get a specific model (provider and model ID are auto-completed in IDEs)
const model = getModel('openai', 'gpt-4o-mini');
console.log(`Using ${model.name} via ${model.api} API`);
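As a usage sketch built only on the model fields shown above, you can scan every provider in the registry for models with a given capability, such as image input:

import { getProviders, getModels } from '@earendil-works/pi-ai';

// Collect every registered model that accepts image input
const visionModels = getProviders().flatMap(provider =>
  getModels(provider).filter(model => model.input.includes('image'))
);
console.log(`${visionModels.length} vision-capable models`);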

Custom models

Use Model<'openai-completions'> to define a custom model for any OpenAI-compatible endpoint. The api field selects which API implementation the library uses.
import { Model, complete } from '@earendil-works/pi-ai';

// Ollama using OpenAI-compatible API
const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000
};

// LiteLLM proxy with explicit compat settings
const litellmModel: Model<'openai-completions'> = {
  id: 'gpt-4o',
  name: 'GPT-4o (via LiteLLM)',
  api: 'openai-completions',
  provider: 'litellm',
  baseUrl: 'http://localhost:4000/v1',
  reasoning: false,
  input: ['text', 'image'],
  cost: { input: 2.5, output: 10, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 16384,
  compat: {
    supportsStore: false  // LiteLLM doesn't support the store field
  }
};

// Custom proxied Anthropic endpoint
const proxyModel: Model<'anthropic-messages'> = {
  id: 'claude-sonnet-4',
  name: 'Claude Sonnet 4 (Proxied)',
  api: 'anthropic-messages',
  provider: 'custom-proxy',
  baseUrl: 'https://proxy.example.com/v1',
  reasoning: true,
  input: ['text', 'image'],
  cost: { input: 3, output: 15, cacheRead: 0.3, cacheWrite: 3.75 },
  contextWindow: 200000,
  maxTokens: 8192,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
    'X-Custom-Auth': 'bearer-token-here'
  }
};

// Use the custom model (Ollama doesn't require a real key)
const response = await complete(ollamaModel, context, { apiKey: 'dummy' });

Context serialization

The Context object is plain data: serialize it with JSON.stringify and restore it with JSON.parse. This makes it easy to persist conversations, implement chat history, or hand a session off to a different model.
import { Context, getModel, complete } from '@earendil-works/pi-ai';

const context: Context = {
  systemPrompt: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'What is TypeScript?' }]
};

const model = getModel('openai', 'gpt-4o-mini');
const response = await complete(model, context);
context.messages.push(response);

// Serialize and save
const serialized = JSON.stringify(context);
localStorage.setItem('conversation', serialized);

// Later: restore and continue with any model
const restored: Context = JSON.parse(localStorage.getItem('conversation')!);
restored.messages.push({ role: 'user', content: 'Tell me more about its type system' });

const newModel = getModel('anthropic', 'claude-3-5-haiku-20241022');
const continuation = await complete(newModel, restored);
If the context contains images (base64-encoded), they are included in the serialized output. Be mindful of storage size.
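If that becomes a problem, one option is to strip image blocks before persisting. A minimal sketch, assuming image content blocks are tagged { type: 'image' } (check the block shape in your version):

// Deep-copy via a JSON round-trip, then drop image blocks from block-array content
const slim = JSON.parse(JSON.stringify(context)) as Context;
for (const message of slim.messages) {
  if (Array.isArray(message.content)) {
    message.content = message.content.filter(block => block.type !== 'image');
  }
}
localStorage.setItem('conversation', JSON.stringify(slim));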

Cross-provider handoffs

pi-ai supports switching providers mid-conversation. When a context built against one provider is sent to another:
  • User and tool result messages pass through unchanged
  • Assistant messages from the same provider/API are preserved as-is
  • Assistant messages from a different provider have thinking blocks converted to <thinking>-tagged text
  • Tool calls and regular text are always preserved
See Streaming events for details on the events emitted during streaming, including thinking events.
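In practice, a handoff is just reusing the same Context with a different model:

import { Context, getModel, complete } from '@earendil-works/pi-ai';

const context: Context = {
  messages: [{ role: 'user', content: 'Summarize the plot of Hamlet.' }]
};

// Start the conversation on one provider...
const gpt = getModel('openai', 'gpt-4o-mini');
context.messages.push(await complete(gpt, context));

// ...then continue it on another. Any thinking blocks from the first
// assistant message arrive as <thinking>-tagged text; tool calls and
// regular text pass through unchanged.
context.messages.push({ role: 'user', content: 'Now in one sentence.' });
const claude = getModel('anthropic', 'claude-3-5-haiku-20241022');
const reply = await complete(claude, context);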

Browser usage

The library works in browsers, but you must pass the API key explicitly — environment variables are not available. Exposing API keys in frontend code is dangerous. Only do this for internal tools or demos; use a backend proxy for production.
import { getModel, complete } from '@earendil-works/pi-ai';

const model = getModel('anthropic', 'claude-3-5-haiku-20241022');
const response = await complete(model, {
  messages: [{ role: 'user', content: 'Hello!' }]
}, {
  apiKey: 'your-api-key'
});
Additional browser limitations:
  • Amazon Bedrock (bedrock-converse-stream) is not supported in browser environments
  • OAuth login flows require Node.js — use @earendil-works/pi-ai/oauth server-side only
  • Use a server-side proxy for Bedrock or OAuth-based authentication from a web app
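For example, a minimal Node.js proxy that keeps the key server-side (the /chat route and request shape are illustrative, not part of pi-ai):

import { createServer } from 'node:http';
import { Context, getModel, complete } from '@earendil-works/pi-ai';

const model = getModel('anthropic', 'claude-3-5-haiku-20241022');

createServer(async (req, res) => {
  if (req.method !== 'POST' || req.url !== '/chat') {
    res.writeHead(404).end();
    return;
  }
  // Read the serialized Context sent by the browser
  let body = '';
  for await (const chunk of req) body += chunk;
  const context: Context = JSON.parse(body);

  // The API key never leaves the server
  const message = await complete(model, context, { apiKey: process.env.ANTHROPIC_API_KEY! });
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(message));
}).listen(3000);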
