pi-ai uses a registry of API implementations. A provider offers models through a specific API — for example, Anthropic uses anthropic-messages, while OpenAI uses openai-responses. Many third-party providers reuse the openai-completions API (OpenAI Chat Completions format).
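
Every Model<TApi> carries both a provider and an api field, and the api value determines which of the registered implementations handles the request (the full Model shape is shown under Custom models below). A minimal sketch with illustrative model ids:

```typescript
import type { Model } from '@earendil-works/pi-ai';

// Illustrative ids; what matters is the provider/api pairing.
const claude = {
  id: 'claude-sonnet-4-5',
  provider: 'anthropic',
  api: 'anthropic-messages'
} satisfies Partial<Model<'anthropic-messages'>>;

const gpt = {
  id: 'gpt-5',
  provider: 'openai',
  api: 'openai-responses'
} satisfies Partial<Model<'openai-responses'>>;
```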

Built-in APIs

| API identifier | Description | Exports |
| --- | --- | --- |
| anthropic-messages | Anthropic Messages API | streamAnthropic, AnthropicOptions |
| google-generative-ai | Google Generative AI API | streamGoogle, GoogleOptions |
| google-vertex | Google Vertex AI API | streamGoogleVertex, GoogleVertexOptions |
| mistral-conversations | Mistral Conversations API | streamMistral, MistralOptions |
| openai-completions | OpenAI Chat Completions API (widely compatible) | streamOpenAICompletions, OpenAICompletionsOptions |
| openai-responses | OpenAI Responses API | streamOpenAIResponses, OpenAIResponsesOptions |
| openai-codex-responses | OpenAI Codex Responses API | streamOpenAICodexResponses, OpenAICodexResponsesOptions |
| azure-openai-responses | Azure OpenAI Responses API | streamAzureOpenAIResponses, AzureOpenAIResponsesOptions |
| bedrock-converse-stream | Amazon Bedrock Converse API | streamBedrock, BedrockOptions |

Provider to API mapping

| Provider | API |
| --- | --- |
| anthropic | anthropic-messages |
| google | google-generative-ai |
| google-vertex | google-vertex |
| openai | openai-responses |
| openai-codex | openai-codex-responses |
| azure-openai-responses | azure-openai-responses |
| mistral | mistral-conversations |
| amazon-bedrock | bedrock-converse-stream |
| fireworks | anthropic-messages (Anthropic-compatible) |
| kimi-coding | anthropic-messages (Anthropic-compatible) |
| xiaomi | anthropic-messages (Anthropic-compatible) |
| xai, groq, cerebras, deepseek, openrouter, vercel-ai-gateway, minimax, cloudflare-workers-ai, cloudflare-ai-gateway, github-copilot, opencode, opencode-go | openai-completions |

Custom models

Define a Model<TApi> directly for local servers, proxied endpoints, or any OpenAI-compatible API:

```typescript
import { Model, stream } from '@earendil-works/pi-ai';

const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000
};

// A minimal conversation context (same shape as in the other examples on this page)
const context = {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
};

// Ollama doesn't require a real key
const response = await stream(ollamaModel, context, { apiKey: 'dummy' });
```
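
The same pattern reaches hosted OpenAI-compatible providers. A sketch for Groq (the model id, context window, and token limit are illustrative, and pricing is left as placeholder zeros); the key is passed explicitly from GROQ_API_KEY, covered under Environment variables below:

```typescript
import { Model, stream } from '@earendil-works/pi-ai';

// Groq speaks the OpenAI Chat Completions format; the metadata here is illustrative.
const groqModel: Model<'openai-completions'> = {
  id: 'llama-3.3-70b-versatile',
  name: 'Llama 3.3 70B (Groq)',
  api: 'openai-completions',
  provider: 'groq',
  baseUrl: 'https://api.groq.com/openai/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, // placeholder pricing
  contextWindow: 128000,
  maxTokens: 32768
};

const response = await stream(groqModel, {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
}, { apiKey: process.env.GROQ_API_KEY! });
```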

OpenAI compatibility settings

Many OpenAI-compatible servers differ in which features they support. The compat field on a Model<'openai-completions'> lets you override the library’s URL-based auto-detection:

```typescript
interface OpenAICompletionsCompat {
  supportsStore?: boolean;           // Whether provider supports the `store` field (default: true)
  supportsDeveloperRole?: boolean;   // Whether provider supports `developer` role vs `system` (default: true)
  supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
  supportsUsageInStreaming?: boolean; // Whether provider supports `stream_options: { include_usage: true }` (default: true)
  supportsStrictMode?: boolean;      // Whether provider supports `strict` in tool definitions (default: true)
  sendSessionAffinityHeaders?: boolean; // Whether to send session affinity headers from sessionId (default: false)
  maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
  requiresToolResultName?: boolean;  // Whether tool results require the `name` field (default: false)
  requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
  requiresThinkingAsText?: boolean;  // Whether thinking blocks must be converted to text (default: false)
  requiresReasoningContentOnAssistantMessages?: boolean; // Auto-detected for DeepSeek
  thinkingFormat?: 'openai' | 'deepseek' | 'zai' | 'qwen' | 'qwen-chat-template';
  cacheControlFormat?: 'anthropic';  // Anthropic-style cache_control on key messages
  openRouterRouting?: OpenRouterRouting;
  vercelGatewayRouting?: VercelGatewayRouting;
}
```

If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. Common cases:
  • LiteLLM proxies: Set supportsStore: false
  • Ollama / vLLM / SGLang: Set supportsDeveloperRole: false, supportsReasoningEffort: false (see the sketch below)
  • Custom inference servers: May use maxTokensField: 'max_tokens' or non-standard features
For openai-responses models, the compat field only supports Responses-specific flags (sendSessionIdHeader, supportsLongCacheRetention).
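
For example, a compat override that pins the Ollama flags from the list above instead of relying on URL detection; the surrounding model metadata is illustrative and mirrors the Custom models example:

```typescript
import { Model } from '@earendil-works/pi-ai';

// Illustrative model metadata; the compat block is the relevant part.
const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000,
  compat: {
    supportsDeveloperRole: false,   // send the system role instead of developer
    supportsReasoningEffort: false  // omit reasoning_effort from requests
  }
};
```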

Environment variables

Set these in Node.js to avoid passing apiKey explicitly on every call:

| Provider | Environment variable(s) |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL or AZURE_OPENAI_RESOURCE_NAME; optional: AZURE_OPENAI_API_VERSION, AZURE_OPENAI_DEPLOYMENT_NAME_MAP |
| Anthropic | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN |
| DeepSeek | DEEPSEEK_API_KEY |
| Google | GEMINI_API_KEY |
| Vertex AI | GOOGLE_CLOUD_API_KEY or GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION + ADC |
| Mistral | MISTRAL_API_KEY |
| Groq | GROQ_API_KEY |
| Cerebras | CEREBRAS_API_KEY |
| Cloudflare AI Gateway | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID |
| Cloudflare Workers AI | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID |
| xAI | XAI_API_KEY |
| Fireworks | FIREWORKS_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Vercel AI Gateway | AI_GATEWAY_API_KEY |
| zAI | ZAI_API_KEY |
| MiniMax | MINIMAX_API_KEY |
| OpenCode Zen / OpenCode Go | OPENCODE_API_KEY |
| Kimi For Coding | KIMI_API_KEY |
| Xiaomi MiMo (API billing) | XIAOMI_API_KEY |
| Xiaomi MiMo Token Plan (China) | XIAOMI_TOKEN_PLAN_CN_API_KEY |
| Xiaomi MiMo Token Plan (Amsterdam) | XIAOMI_TOKEN_PLAN_AMS_API_KEY |
| Xiaomi MiMo Token Plan (Singapore) | XIAOMI_TOKEN_PLAN_SGP_API_KEY |
| GitHub Copilot | COPILOT_GITHUB_TOKEN or GH_TOKEN or GITHUB_TOKEN |

The getEnvApiKey helper looks up the right variable for a given provider name:

```typescript
import { getEnvApiKey } from '@earendil-works/pi-ai';

// Check if an API key is configured for a provider
const key = getEnvApiKey('openai');  // checks OPENAI_API_KEY
```
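
One possible use of this helper is a startup check; the sketch below assumes getEnvApiKey returns undefined (or an empty value) when no matching variable is set:

```typescript
import { getEnvApiKey } from '@earendil-works/pi-ai';

// Fail fast if a key this application depends on is missing.
for (const provider of ['openai', 'anthropic', 'groq'] as const) {
  if (!getEnvApiKey(provider)) {
    throw new Error(`No API key configured for ${provider}; see the table above for the variable name`);
  }
}
```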

Faux provider for testing

registerFauxProvider() registers an in-memory provider for deterministic tests and demos. It is opt-in and not part of the built-in provider set.

```typescript
import {
  complete,
  fauxAssistantMessage,
  fauxText,
  fauxThinking,
  fauxToolCall,
  registerFauxProvider,
  stream,
} from '@earendil-works/pi-ai';

const registration = registerFauxProvider({ tokensPerSecond: 50 });
const model = registration.getModel();

const context = {
  messages: [{ role: 'user', content: 'Summarize package.json and then call echo', timestamp: Date.now() }]
};

// Queue scripted responses
registration.setResponses([
  fauxAssistantMessage([
    fauxThinking('Need to inspect package metadata first.'),
    fauxToolCall('echo', { text: 'package.json' })
  ], { stopReason: 'toolUse' })
]);

const first = await complete(model, context, {
  sessionId: 'session-1',
  cacheRetention: 'short'
});
context.messages.push(first);

context.messages.push({
  role: 'toolResult',
  toolCallId: first.content.find((block) => block.type === 'toolCall')!.id,
  toolName: 'echo',
  content: [{ type: 'text', text: 'package.json contents here' }],
  isError: false,
  timestamp: Date.now()
});

registration.setResponses([
  fauxAssistantMessage([
    fauxThinking('Now I can summarize the tool output.'),
    fauxText('Here is the summary.')
  ])
]);

const s = stream(model, context);
for await (const event of s) {
  console.log(event.type);
}

// Multiple faux models for model-switching tests
const multiModel = registerFauxProvider({
  models: [
    { id: 'faux-fast', reasoning: false },
    { id: 'faux-thinker', reasoning: true }
  ]
});
const thinker = multiModel.getModel('faux-thinker');

// Inspect state and clean up
console.log(registration.getPendingResponseCount());
console.log(registration.state.callCount);
registration.unregister();
multiModel.unregister();
```

  • Responses are consumed from a queue in request start order
  • If the queue is empty, the provider returns an error message: "No more faux responses queued"
  • Use setResponses([...]) to replace the queue; appendResponses([...]) to add more
  • registration.models exposes all registered faux models; getModel() returns the first one, getModel(id) a specific one
  • Usage is estimated at roughly 1 token per 4 characters
  • When sessionId is present and cacheRetention is not "none", prompt cache reads and writes are simulated
  • Tool call arguments stream incrementally via toolcall_delta chunks
  • By default, each streamed chunk is emitted on its own microtask; set tokensPerSecond to pace delivery in real time
  • Use one registration per deterministic scripted flow; register separate faux providers for independent concurrent flows
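
A small sketch of queue management built on these helpers; it assumes appendResponses accepts the same faux messages as setResponses and that the options argument to complete is optional:

```typescript
import {
  complete,
  fauxAssistantMessage,
  fauxText,
  registerFauxProvider,
} from '@earendil-works/pi-ai';

const registration = registerFauxProvider({});
const model = registration.getModel();
const context = {
  messages: [{ role: 'user', content: 'Hi', timestamp: Date.now() }]
};

// Seed the queue, then extend it without replacing what is already scripted.
registration.setResponses([fauxAssistantMessage([fauxText('First scripted reply.')])]);
registration.appendResponses([fauxAssistantMessage([fauxText('Second scripted reply.')])]);

console.log(registration.getPendingResponseCount()); // 2

// Each request consumes one queued response, in request start order.
await complete(model, context);
console.log(registration.getPendingResponseCount()); // 1

registration.unregister();
```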
