Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Conway-Research/automaton/llms.txt

Use this file to discover all available pages before exploring further.

Conway Inference provides a unified API for accessing frontier language models from multiple providers. All inference costs are billed from your Conway credits, eliminating the need for separate API keys and billing accounts.

Quick start

const response = await inference.chat([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Explain quantum entanglement in one sentence." }
]);

console.log(response.message.content);
console.log(`Tokens used: ${response.usage.totalTokens}`);

Available models

List all models with current pricing:
const models = await conway.listModels();

for (const model of models) {
  console.log(`${model.id} (${model.provider})`);
  console.log(`  Input: $${model.pricing.inputPerMillion}/M tokens`);
  console.log(`  Output: $${model.pricing.outputPerMillion}/M tokens`);
}

Example model catalog

ModelProviderInput $/MOutput $/MUse case
gpt-5.2openai2.5010.00Most capable, best reasoning
gpt-5-miniopenai0.301.20Fast, cost-effective
claude-opus-4.6anthropic15.0075.00Longest context, best writing
claude-sonnet-4.5anthropic3.0015.00Balanced performance
gemini-3-flashgoogle0.100.40Fastest, cheapest
kimi-k2.5moonshot0.502.00200K context, Chinese support
The model registry is automatically refreshed every 6 hours by the heartbeat daemon. Pricing and availability are subject to change.

Model selection

Setting default model

// Via config file
{
  "inferenceModel": "gpt-5.2",
  "lowComputeModel": "gpt-5-mini"
}

Switching models

Change the active model at runtime:
// Using the switch_model tool (persists to config)
await tools.switch_model({ model: "claude-sonnet-4.5" });

// Or specify per-request
const response = await inference.chat(messages, {
  model: "gpt-5-mini",
  maxTokens: 2048
});

Automatic model selection

The inference router automatically switches models based on survival tier:
TierModel selection
high/normalConfigured default (e.g., gpt-5.2)
low_computeConfigured fallback (e.g., gpt-5-mini)
criticalCheapest available model
// Router selects appropriate model based on credits
const tier = getSurvivalTier(creditsCents);
const model = router.selectModel(tier, context);

Inference backends

Conway Inference supports multiple backends:

1. Conway proxy (default)

Routes through Conway’s inference endpoint, billed from credits:
const client = createInferenceClient({
  apiUrl: "https://api.conway.tech",
  apiKey: conwayApiKey,
  defaultModel: "gpt-5.2",
  maxTokens: 4096
});

2. OpenAI direct

Use your own OpenAI API key:
const client = createInferenceClient({
  apiUrl: "https://api.conway.tech",
  apiKey: conwayApiKey,
  defaultModel: "gpt-5.2",
  maxTokens: 4096,
  openaiApiKey: process.env.OPENAI_API_KEY
});

// OpenAI models automatically route to api.openai.com
await client.chat(messages, { model: "gpt-5.2" });

3. Anthropic direct

Use your own Anthropic API key:
const client = createInferenceClient({
  apiUrl: "https://api.conway.tech",
  apiKey: conwayApiKey,
  defaultModel: "claude-opus-4.6",
  maxTokens: 4096,
  anthropicApiKey: process.env.ANTHROPIC_API_KEY
});

// Claude models automatically route to api.anthropic.com
await client.chat(messages, { model: "claude-opus-4.6" });

4. Ollama local

Run models locally with Ollama:
const client = createInferenceClient({
  apiUrl: "https://api.conway.tech",
  apiKey: conwayApiKey,
  defaultModel: "llama3.1",
  maxTokens: 4096,
  ollamaBaseUrl: "http://localhost:11434"
});

// Ollama models route to local endpoint
await client.chat(messages, { model: "llama3.1" });

Backend routing logic

The client automatically routes requests based on model name and available API keys:
function resolveInferenceBackend(model: string): InferenceBackend {
  // 1. Check model registry for explicit provider
  const provider = getModelProvider(model);
  if (provider === "ollama" && ollamaBaseUrl) return "ollama";
  if (provider === "anthropic" && anthropicApiKey) return "anthropic";
  if (provider === "openai" && openaiApiKey) return "openai";
  if (provider === "conway") return "conway";

  // 2. Fall back to heuristics if model not in registry
  if (anthropicApiKey && /^claude/i.test(model)) return "anthropic";
  if (openaiApiKey && /^(gpt-|o[1-9])/i.test(model)) return "openai";

  // 3. Default to Conway proxy
  return "conway";
}

Chat completion

Basic request

const response = await inference.chat([
  { role: "system", content: "You are a trading bot." },
  { role: "user", content: "Should I buy or sell today?" }
]);

console.log(response.message.content);

With options

const response = await inference.chat(messages, {
  model: "gpt-5-mini",
  maxTokens: 2048,
  temperature: 0.7,
  tools: [
    {
      type: "function",
      function: {
        name: "get_stock_price",
        description: "Get current stock price",
        parameters: {
          type: "object",
          properties: {
            symbol: { type: "string", description: "Ticker symbol" }
          },
          required: ["symbol"]
        }
      }
    }
  ]
});

Response format

interface InferenceResponse {
  id: string;                    // Request ID
  model: string;                 // Model that handled the request
  message: {
    role: "assistant";           // Always "assistant"
    content: string;             // Text response
    tool_calls?: InferenceToolCall[];
  };
  toolCalls?: InferenceToolCall[];
  usage: {
    promptTokens: number;        // Input tokens
    completionTokens: number;    // Output tokens
    totalTokens: number;         // Total tokens
  };
  finishReason: string;          // "stop", "tool_calls", "length"
}

Tool calling

Models can call tools to gather information:
const response = await inference.chat(messages, {
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
            units: { type: "string", enum: ["celsius", "fahrenheit"] }
          },
          required: ["location"]
        }
      }
    }
  ]
});

if (response.toolCalls) {
  for (const call of response.toolCalls) {
    console.log(`Tool: ${call.function.name}`);
    console.log(`Args: ${call.function.arguments}`);
    
    // Execute tool and append result
    const result = await executeWeatherTool(
      JSON.parse(call.function.arguments)
    );
    
    messages.push({
      role: "assistant",
      content: "",
      tool_calls: [call]
    });
    messages.push({
      role: "tool",
      tool_call_id: call.id,
      content: JSON.stringify(result)
    });
  }
  
  // Continue conversation with tool results
  const followUp = await inference.chat(messages);
}

Token limits

Model-specific limits

Different models use different token limit parameters:
// GPT-4 and older: max_tokens
{ model: "gpt-4", max_tokens: 4096 }

// GPT-4.1+, GPT-5+, o-series: max_completion_tokens
{ model: "gpt-5.2", max_completion_tokens: 4096 }

// Ollama: always max_tokens
{ model: "llama3.1", max_tokens: 4096 }
The client automatically selects the correct parameter:
const usesCompletionTokens = /^(o[1-9]|gpt-5|gpt-4\.1)/.test(model);
const tokenLimit = opts?.maxTokens || maxTokens;

if (usesCompletionTokens) {
  body.max_completion_tokens = tokenLimit;
} else {
  body.max_tokens = tokenLimit;
}

Cost tracking

Every inference call is logged to the database:
// Check spending
const stats = await tools.check_inference_spending({
  days: 7
});

console.log(`Total: $${stats.totalCents / 100}`);
console.log(`Calls: ${stats.callCount}`);
console.log(`Avg per call: $${stats.avgCentsPerCall / 100}`);

Daily spending limits

Automatons enforce a maximum daily inference budget:
{
  "maxInferenceDailyCents": 50000  // $500/day
}
When the limit is exceeded:
  1. Inference calls are blocked
  2. The automaton enters survival mode
  3. Heartbeat publishes a spending alert

Anthropic-specific handling

Claude models require special message formatting:

System messages

Extracted to separate system parameter:
// Input
[
  { role: "system", content: "You are helpful." },
  { role: "user", content: "Hello" }
]

// Transformed for Anthropic
{
  system: "You are helpful.",
  messages: [
    { role: "user", content: "Hello" }
  ]
}

Tool results

Converted to tool_result content blocks:
// Input
{ role: "tool", tool_call_id: "call_123", content: "42" }

// Transformed for Anthropic
{
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: "call_123",
      content: "42"
    }
  ]
}

Message merging

Consecutive messages with the same role are merged:
// Input
[
  { role: "user", content: "A" },
  { role: "user", content: "B" }
]

// Merged
[
  { role: "user", content: "A\nB" }
]
This ensures alternating user/assistant structure required by Anthropic.

Low compute mode

setLowComputeMode() is deprecated. Use InferenceRouter for tier-based model selection.
Legacy method for switching to cheaper models:
// Switch to low-compute model
inference.setLowComputeMode(true);

// Reverts to default model
inference.setLowComputeMode(false);
Modern approach:
const tier = getSurvivalTier(creditsCents);
const model = router.selectModel(tier, context);
const response = await inference.chat(messages, { model });

Error handling

Timeout errors

Inference requests timeout after 60 seconds:
try {
  const response = await inference.chat(messages);
} catch (err) {
  if (err.message.includes("timeout")) {
    // Model took too long to respond
    // Retry with shorter max_tokens or simpler prompt
  }
}

Rate limits

try {
  const response = await inference.chat(messages);
} catch (err) {
  if (err.message.includes("429")) {
    // Rate limit exceeded
    // Client automatically retries with exponential backoff
  }
}

Insufficient credits

try {
  const response = await inference.chat(messages);
} catch (err) {
  if (err.message.includes("Insufficient credits")) {
    // Buy more credits or wait for funding
    await enterSurvivalMode();
  }
}

Best practices

Choose the right model

  • High stakes, complex reasoning: gpt-5.2, claude-opus-4.6
  • Routine tasks: gpt-5-mini, claude-sonnet-4.5
  • Rapid prototyping: gemini-3-flash, gpt-5-mini
  • Long context: claude-opus-4.6 (200K), kimi-k2.5 (200K)

Optimize token usage

// ❌ Wasteful: includes entire codebase
const response = await inference.chat([
  { role: "user", content: entireCodebase + "\n\nFind the bug." }
]);

// ✅ Efficient: semantic search for relevant context
const relevant = await semanticSearch(query, limit: 10);
const response = await inference.chat([
  { role: "user", content: relevant + "\n\nFind the bug." }
]);

Monitor spending

// Check daily spending before expensive operations
const stats = await tools.check_inference_spending({ days: 1 });
if (stats.totalCents > maxInferenceDailyCents * 0.9) {
  // Approaching limit, switch to cheaper model
  await tools.switch_model({ model: "gpt-5-mini" });
}

Troubleshooting

Empty response

if (!response.message.content && !response.toolCalls) {
  throw new Error("No completion content returned");
}
Causes:
  • Model hit token limit (increase maxTokens)
  • Content filtered by safety system (rephrase prompt)
  • Model chose to only call tools (check toolCalls)

Invalid tool calls

try {
  const args = JSON.parse(call.function.arguments);
} catch (err) {
  // Model returned malformed JSON
  // Add validation in tool schema or retry
}

Model not found

// Model ID typo or not in registry
const models = await conway.listModels();
const available = models.map(m => m.id);
console.log("Available:", available);

Next steps

Survival system

Learn how automatons adapt to low credits

Tools system

Understand how models call tools

Build docs developers (and LLMs) love