pi-ai uses a registry of API implementations. A provider offers models through a specific API — for example, Anthropic uses anthropic-messages, while OpenAI uses openai-responses. Many third-party providers reuse the openai-completions API (OpenAI Chat Completions format).
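
Every Model<TApi> carries both a provider and an api field, and the api value determines which of the registered implementations handles the request (the full Model shape is shown under Custom models below). A minimal sketch with illustrative model ids:

```typescript
import type { Model } from '@earendil-works/pi-ai';

// Illustrative ids; what matters is the provider/api pairing.
const claude = {
  id: 'claude-sonnet-4-5',
  provider: 'anthropic',
  api: 'anthropic-messages'
} satisfies Partial<Model<'anthropic-messages'>>;

const gpt = {
  id: 'gpt-5',
  provider: 'openai',
  api: 'openai-responses'
} satisfies Partial<Model<'openai-responses'>>;
```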

Built-in APIs

| API identifier | Description | Exports |
| --- | --- | --- |
| anthropic-messages | Anthropic Messages API | streamAnthropic, AnthropicOptions |
| google-generative-ai | Google Generative AI API | streamGoogle, GoogleOptions |
| google-vertex | Google Vertex AI API | streamGoogleVertex, GoogleVertexOptions |
| mistral-conversations | Mistral Conversations API | streamMistral, MistralOptions |
| openai-completions | OpenAI Chat Completions API (widely compatible) | streamOpenAICompletions, OpenAICompletionsOptions |
| openai-responses | OpenAI Responses API | streamOpenAIResponses, OpenAIResponsesOptions |
| openai-codex-responses | OpenAI Codex Responses API | streamOpenAICodexResponses, OpenAICodexResponsesOptions |
| azure-openai-responses | Azure OpenAI Responses API | streamAzureOpenAIResponses, AzureOpenAIResponsesOptions |
| bedrock-converse-stream | Amazon Bedrock Converse API | streamBedrock, BedrockOptions |

Provider to API mapping

| Provider | API |
| --- | --- |
| anthropic | anthropic-messages |
| google | google-generative-ai |
| google-vertex | google-vertex |
| openai | openai-responses |
| openai-codex | openai-codex-responses |
| azure-openai-responses | azure-openai-responses |
| mistral | mistral-conversations |
| amazon-bedrock | bedrock-converse-stream |
| fireworks | anthropic-messages (Anthropic-compatible) |
| kimi-coding | anthropic-messages (Anthropic-compatible) |
| xiaomi | anthropic-messages (Anthropic-compatible) |
| xai, groq, cerebras, deepseek, openrouter, vercel-ai-gateway, minimax, cloudflare-workers-ai, cloudflare-ai-gateway, github-copilot, opencode, opencode-go | openai-completions |

Custom models

Define a Model<TApi> directly for local servers, proxied endpoints, or any OpenAI-compatible API:

```typescript
import { Model, stream } from '@earendil-works/pi-ai';

const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000
};

// A minimal conversation context (same shape as in the other examples on this page)
const context = {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
};

// Ollama doesn't require a real key
const response = await stream(ollamaModel, context, { apiKey: 'dummy' });
```
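
The same pattern reaches hosted OpenAI-compatible providers. A sketch for Groq (the model id, context window, and token limit are illustrative, and pricing is left as placeholder zeros); the key is passed explicitly from GROQ_API_KEY, covered under Environment variables below:

```typescript
import { Model, stream } from '@earendil-works/pi-ai';

// Groq speaks the OpenAI Chat Completions format; the metadata here is illustrative.
const groqModel: Model<'openai-completions'> = {
  id: 'llama-3.3-70b-versatile',
  name: 'Llama 3.3 70B (Groq)',
  api: 'openai-completions',
  provider: 'groq',
  baseUrl: 'https://api.groq.com/openai/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }, // placeholder pricing
  contextWindow: 128000,
  maxTokens: 32768
};

const response = await stream(groqModel, {
  messages: [{ role: 'user', content: 'Hello!', timestamp: Date.now() }]
}, { apiKey: process.env.GROQ_API_KEY! });
```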

OpenAI compatibility settings

Many OpenAI-compatible servers differ in which features they support. The compat field on a Model<'openai-completions'> lets you override the library’s URL-based auto-detection:

```typescript
interface OpenAICompletionsCompat {
  supportsStore?: boolean;           // Whether provider supports the `store` field (default: true)
  supportsDeveloperRole?: boolean;   // Whether provider supports `developer` role vs `system` (default: true)
  supportsReasoningEffort?: boolean; // Whether provider supports `reasoning_effort` (default: true)
  supportsUsageInStreaming?: boolean; // Whether provider supports `stream_options: { include_usage: true }` (default: true)
  supportsStrictMode?: boolean;      // Whether provider supports `strict` in tool definitions (default: true)
  sendSessionAffinityHeaders?: boolean; // Whether to send session affinity headers from sessionId (default: false)
  maxTokensField?: 'max_completion_tokens' | 'max_tokens'; // Which field name to use (default: max_completion_tokens)
  requiresToolResultName?: boolean;  // Whether tool results require the `name` field (default: false)
  requiresAssistantAfterToolResult?: boolean; // Whether tool results must be followed by an assistant message (default: false)
  requiresThinkingAsText?: boolean;  // Whether thinking blocks must be converted to text (default: false)
  requiresReasoningContentOnAssistantMessages?: boolean; // Auto-detected for DeepSeek
  thinkingFormat?: 'openai' | 'deepseek' | 'zai' | 'qwen' | 'qwen-chat-template';
  cacheControlFormat?: 'anthropic';  // Anthropic-style cache_control on key messages
  openRouterRouting?: OpenRouterRouting;
  vercelGatewayRouting?: VercelGatewayRouting;
}
```

If compat is not set, the library falls back to URL-based detection. If compat is partially set, unspecified fields use the detected defaults. Common cases:
  • LiteLLM proxies: Set supportsStore: false
  • Ollama / vLLM / SGLang: Set supportsDeveloperRole: false, supportsReasoningEffort: false (see the sketch below)
  • Custom inference servers: May use maxTokensField: 'max_tokens' or non-standard features
For openai-responses models, the compat field only supports Responses-specific flags (sendSessionIdHeader, supportsLongCacheRetention).
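
For example, a compat override that pins the Ollama flags from the list above instead of relying on URL detection; the surrounding model metadata is illustrative and mirrors the Custom models example:

```typescript
import { Model } from '@earendil-works/pi-ai';

// Illustrative model metadata; the compat block is the relevant part.
const ollamaModel: Model<'openai-completions'> = {
  id: 'llama-3.1-8b',
  name: 'Llama 3.1 8B (Ollama)',
  api: 'openai-completions',
  provider: 'ollama',
  baseUrl: 'http://localhost:11434/v1',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 128000,
  maxTokens: 32000,
  compat: {
    supportsDeveloperRole: false,   // send the system role instead of developer
    supportsReasoningEffort: false  // omit reasoning_effort from requests
  }
};
```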

Environment variables

Set these in Node.js to avoid passing apiKey explicitly on every call:

| Provider | Environment variable(s) |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Azure OpenAI | AZURE_OPENAI_API_KEY + AZURE_OPENAI_BASE_URL or AZURE_OPENAI_RESOURCE_NAME; optional: AZURE_OPENAI_API_VERSION, AZURE_OPENAI_DEPLOYMENT_NAME_MAP |
| Anthropic | ANTHROPIC_API_KEY or ANTHROPIC_OAUTH_TOKEN |
| DeepSeek | DEEPSEEK_API_KEY |
| Google | GEMINI_API_KEY |
| Vertex AI | GOOGLE_CLOUD_API_KEY or GOOGLE_CLOUD_PROJECT + GOOGLE_CLOUD_LOCATION + ADC |
| Mistral | MISTRAL_API_KEY |
| Groq | GROQ_API_KEY |
| Cerebras | CEREBRAS_API_KEY |
| Cloudflare AI Gateway | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID + CLOUDFLARE_GATEWAY_ID |
| Cloudflare Workers AI | CLOUDFLARE_API_KEY + CLOUDFLARE_ACCOUNT_ID |
| xAI | XAI_API_KEY |
| Fireworks | FIREWORKS_API_KEY |
| OpenRouter | OPENROUTER_API_KEY |
| Vercel AI Gateway | AI_GATEWAY_API_KEY |
| zAI | ZAI_API_KEY |
| MiniMax | MINIMAX_API_KEY |
| OpenCode Zen / OpenCode Go | OPENCODE_API_KEY |
| Kimi For Coding | KIMI_API_KEY |
| Xiaomi MiMo (API billing) | XIAOMI_API_KEY |
| Xiaomi MiMo Token Plan (China) | XIAOMI_TOKEN_PLAN_CN_API_KEY |
| Xiaomi MiMo Token Plan (Amsterdam) | XIAOMI_TOKEN_PLAN_AMS_API_KEY |
| Xiaomi MiMo Token Plan (Singapore) | XIAOMI_TOKEN_PLAN_SGP_API_KEY |
| GitHub Copilot | COPILOT_GITHUB_TOKEN or GH_TOKEN or GITHUB_TOKEN |

The getEnvApiKey helper looks up the right variable for a given provider name:

```typescript
import { getEnvApiKey } from '@earendil-works/pi-ai';

// Check if an API key is configured for a provider
const key = getEnvApiKey('openai');  // checks OPENAI_API_KEY
```
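
One possible use of this helper is a startup check; the sketch below assumes getEnvApiKey returns undefined (or an empty value) when no matching variable is set:

```typescript
import { getEnvApiKey } from '@earendil-works/pi-ai';

// Fail fast if a key this application depends on is missing.
for (const provider of ['openai', 'anthropic', 'groq'] as const) {
  if (!getEnvApiKey(provider)) {
    throw new Error(`No API key configured for ${provider}; see the table above for the variable name`);
  }
}
```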

Faux provider for testing

registerFauxProvider() registers an in-memory provider for deterministic tests and demos. It is opt-in and not part of the built-in provider set.

```typescript
import {
  complete,
  fauxAssistantMessage,
  fauxText,
  fauxThinking,
  fauxToolCall,
  registerFauxProvider,
  stream,
} from '@earendil-works/pi-ai';

const registration = registerFauxProvider({ tokensPerSecond: 50 });
const model = registration.getModel();

const context = {
  messages: [{ role: 'user', content: 'Summarize package.json and then call echo', timestamp: Date.now() }]
};

// Queue scripted responses
registration.setResponses([
  fauxAssistantMessage([
    fauxThinking('Need to inspect package metadata first.'),
    fauxToolCall('echo', { text: 'package.json' })
  ], { stopReason: 'toolUse' })
]);

const first = await complete(model, context, {
  sessionId: 'session-1',
  cacheRetention: 'short'
});
context.messages.push(first);

context.messages.push({
  role: 'toolResult',
  toolCallId: first.content.find((block) => block.type === 'toolCall')!.id,
  toolName: 'echo',
  content: [{ type: 'text', text: 'package.json contents here' }],
  isError: false,
  timestamp: Date.now()
});

registration.setResponses([
  fauxAssistantMessage([
    fauxThinking('Now I can summarize the tool output.'),
    fauxText('Here is the summary.')
  ])
]);

const s = stream(model, context);
for await (const event of s) {
  console.log(event.type);
}

// Multiple faux models for model-switching tests
const multiModel = registerFauxProvider({
  models: [
    { id: 'faux-fast', reasoning: false },
    { id: 'faux-thinker', reasoning: true }
  ]
});
const thinker = multiModel.getModel('faux-thinker');

// Inspect state and clean up
console.log(registration.getPendingResponseCount());
console.log(registration.state.callCount);
registration.unregister();
multiModel.unregister();
```

  • Responses are consumed from a queue in request start order
  • If the queue is empty, the provider returns an error message: "No more faux responses queued"
  • Use setResponses([...]) to replace the queue; appendResponses([...]) to add more
  • registration.models exposes all registered faux models; getModel() returns the first one, getModel(id) a specific one
  • Usage is estimated at roughly 1 token per 4 characters
  • When sessionId is present and cacheRetention is not "none", prompt cache reads and writes are simulated
  • Tool call arguments stream incrementally via toolcall_delta chunks
  • By default, each streamed chunk is emitted on its own microtask; set tokensPerSecond to pace delivery in real time
  • Use one registration per deterministic scripted flow; register separate faux providers for independent concurrent flows
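
A small sketch of queue management built on these helpers; it assumes appendResponses accepts the same faux messages as setResponses and that the options argument to complete is optional:

```typescript
import {
  complete,
  fauxAssistantMessage,
  fauxText,
  registerFauxProvider,
} from '@earendil-works/pi-ai';

const registration = registerFauxProvider({});
const model = registration.getModel();
const context = {
  messages: [{ role: 'user', content: 'Hi', timestamp: Date.now() }]
};

// Seed the queue, then extend it without replacing what is already scripted.
registration.setResponses([fauxAssistantMessage([fauxText('First scripted reply.')])]);
registration.appendResponses([fauxAssistantMessage([fauxText('Second scripted reply.')])]);

console.log(registration.getPendingResponseCount()); // 2

// Each request consumes one queued response, in request start order.
await complete(model, context);
console.log(registration.getPendingResponseCount()); // 1

registration.unregister();
```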
