Request Transformation Pipeline
Codex Multi-Auth passes every OpenAI SDK request through a seven-step pipeline before sending it to the Codex API.
Pipeline Overview
OpenAI SDK Call (generateText/streamText)
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 1: URL Rewriting │
│ /v1/responses → /v1/realtime/responses │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 2: Account Selection │
│ - Filter cooldowns & rate limits │
│ - Apply session affinity │
│ - Score by health + quota │
│ - Select best account │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 3: Model Normalization │
│ gpt-5.3-codex → gpt-5-codex │
│ openai/gpt-5-codex → gpt-5-codex │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 4: Body Transformation │
│ - Inject model-family instructions │
│ - Set store: false, stream: true │
│ - Configure reasoning & text verbosity │
│ - Add reasoning.encrypted_content to include │
│ - Filter orphaned tool outputs │
│ - Apply fast-session optimizations │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 5: Header Injection │
│ - Authorization: Bearer <access_token> │
│ - openai-account-id: <account_id> │
│ - openai-beta: realtime-responses-2024-11-19 │
│ - openai-originator: codex_cli_rs │
│ - openai-conversation-id: <prompt_cache_key> │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 6: Execute Request │
│ - Set fetch timeout (default 2 minutes) │
│ - Enable stream stall detection (default 45s) │
│ - Apply circuit breaker │
└────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────┐
│ Step 7: Response Handling │
│ - SSE → JSON for generateText (non-streaming) │
│ - Pass-through SSE for streamText │
│ - Extract rate limit info from headers │
│ - Update account state (cooldowns, rate limits) │
└────────────────────────────────────────────────────────────┘
|
v
Response to SDK
Step 1: URL Rewriting
From lib/request/fetch-helpers.ts:381:
export function rewriteUrlForCodex(url: string): string {
const parsedUrl = new URL(url);
// Rewrite /v1/responses to /v1/realtime/responses
const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
: parsedUrl.pathname;
// Ensure base path prefix
const normalizedPath = rewrittenPath.startsWith("/v1/realtime/")
? rewrittenPath
: `/v1/realtime${rewrittenPath}`;
// Update to Codex base URL
parsedUrl.protocol = "https:";
parsedUrl.host = "api.openai.com";
parsedUrl.pathname = normalizedPath;
return parsedUrl.toString();
}
Example:
Input: https://api.openai.com/v1/responses
Output: https://api.openai.com/v1/realtime/responses
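The rewrite can be exercised on a few inputs. The sketch below copies the function from this section so that it runs standalone; every URL other than the documented one is an illustrative assumption, not a value taken from the project.

```typescript
// Standalone copy of the rewrite logic shown above, for experimentation only.
function rewriteUrlForCodex(url: string): string {
  const parsedUrl = new URL(url);
  const rewrittenPath = parsedUrl.pathname.includes("/v1/responses")
    ? parsedUrl.pathname.replace("/v1/responses", "/v1/realtime/responses")
    : parsedUrl.pathname;
  const normalizedPath = rewrittenPath.startsWith("/v1/realtime/")
    ? rewrittenPath
    : `/v1/realtime${rewrittenPath}`;
  parsedUrl.protocol = "https:";
  parsedUrl.host = "api.openai.com";
  parsedUrl.pathname = normalizedPath;
  return parsedUrl.toString();
}

// The documented case.
const canonical = rewriteUrlForCodex("https://api.openai.com/v1/responses");
// Hypothetical input without the /v1 prefix: the prefix branch adds it.
const bare = rewriteUrlForCodex("http://localhost/responses");
// Hypothetical input with a query string: only protocol, host, and path change.
const withQuery = rewriteUrlForCodex("https://example.com/v1/responses?trace=1");
// Rewriting an already rewritten URL should leave it unchanged.
const twice = rewriteUrlForCodex(canonical);

console.log({ canonical, bare, withQuery, twice });
```

Because the second pass is a no-op, the function is safe to apply to a URL that has already been rewritten, for example on a retry.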
Step 2: Account Selection
From lib/accounts.ts and index.ts:1372:
// 1. Filter accounts
const now = Date.now();
const available = accounts.filter((account, index) => {
// Skip if in cooldown
if (account.cooldownUntil && account.cooldownUntil > now) {
return false;
}
// Skip if rate limited for this model family
const resetTime = getRateLimitResetTimeForFamily(account, now, modelFamily);
if (resetTime && resetTime > now) {
return false;
}
// Skip if circuit breaker is open
const breaker = getCircuitBreaker(`account:${index}`);
if (breaker.getState() === "open") {
return false;
}
return true;
});
// 2. Apply session affinity
const preferredIndex = sessionAffinityStore.getPreferredAccountIndex(threadId);
if (preferredIndex !== undefined && available.includes(accounts[preferredIndex])) {
// Prefer same account for this conversation
selectedIndex = preferredIndex;
} else {
// 3. Score by health + quota + capability + preemptive quota
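const scores = available.map((account) => {
// Use the account's position in the full list; its position in `available` shifts after filtering
const index = accounts.indexOf(account);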
let score = account.healthScore ?? 100;
// Boost for capability policy
score += capabilityPolicyStore.getAccountScore(index, model) * 0.1;
// Reduce for preemptive quota deferral
const deferralMs = preemptiveQuotaScheduler.shouldDeferRequest(index, modelFamily);
if (deferralMs > 0) {
score -= 50;
}
// PID offset for fair rotation
if (pidOffsetEnabled) {
score += (index * 0.001);
}
return { index, score };
});
// 4. Select highest score
scores.sort((a, b) => b.score - a.score);
selectedIndex = scores[0]?.index ?? 0;
}
Selection Factors:
- Cooldown status: Skip accounts with active cooldown
- Rate limit status: Skip accounts with rate limits for this model family
- Circuit breaker: Skip accounts with open circuit breaker
- Session affinity: Prefer same account for same conversation
- Health score: 0-100, decrements on failure, resets on success
- Capability score: Boost for accounts that support this model
- Quota deferral: Reduce score if quota is low
- PID offset: Tiny offset for deterministic fair rotation
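The selection factors can be sketched as one pure function. The account shape, field names, and `selectAccountIndex` are hypothetical simplifications for illustration; the real logic in lib/accounts.ts also consults the capability and quota stores.

```typescript
// Hypothetical, simplified account record; field names are assumptions for this sketch.
interface AccountSketch {
  cooldownUntil?: number;
  rateLimitResetAt?: number;
  breakerOpen?: boolean;
  healthScore?: number;
  quotaLow?: boolean;
}

function selectAccountIndex(
  accounts: AccountSketch[],
  now: number,
  preferredIndex?: number,
): number | undefined {
  // Pair each account with its original index so filtering cannot shift positions.
  const available = accounts
    .map((account, index) => ({ account, index }))
    .filter(({ account }) => {
      if (account.cooldownUntil !== undefined && account.cooldownUntil > now) return false;
      if (account.rateLimitResetAt !== undefined && account.rateLimitResetAt > now) return false;
      return !account.breakerOpen;
    });
  if (available.length === 0) return undefined;

  // Session affinity wins when the preferred account is still usable.
  if (preferredIndex !== undefined && available.some((entry) => entry.index === preferredIndex)) {
    return preferredIndex;
  }

  const scored = available.map(({ account, index }) => {
    let score = account.healthScore ?? 100;
    if (account.quotaLow) score -= 50; // same penalty as the deferral branch above
    score += index * 0.001; // tiny deterministic offset
    return { index, score };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored[0]?.index;
}

const pool: AccountSketch[] = [
  { healthScore: 90 },
  { cooldownUntil: 2_000 },
  { healthScore: 100, quotaLow: true },
  { healthScore: 80 },
];
const best = selectAccountIndex(pool, 1_000);
const sticky = selectAccountIndex(pool, 1_000, 3);
const coolingPreferred = selectAccountIndex(pool, 1_000, 1);
console.log({ best, sticky, coolingPreferred });
```

Returning `undefined` when nothing is available, rather than defaulting to index 0, is a design choice of this sketch; it lets the caller decide whether to wait or fail.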
Step 3: Model Normalization
From lib/request/request-transformer.ts:40:
export function normalizeModel(model: string | undefined): string {
if (!model) return "gpt-5.1";
// Strip provider prefix (openai/gpt-5-codex → gpt-5-codex)
const modelId = model.includes("/") ? model.split("/").pop() ?? model : model;
// Explicit model map (handles known variants)
const mappedModel = getNormalizedModel(modelId);
if (mappedModel) return mappedModel;
// Pattern-based fallback
const normalized = modelId.toLowerCase();
// Legacy aliases
if (normalized.includes("gpt-5.3-codex-spark")) return "gpt-5-codex";
if (normalized.includes("gpt-5.3-codex")) return "gpt-5-codex";
if (normalized.includes("gpt-5.2-codex")) return "gpt-5-codex";
if (normalized.includes("gpt-5.1-codex")) return "gpt-5-codex";
// Canonical Codex models
if (normalized.includes("gpt-5-codex")) return "gpt-5-codex";
if (normalized.includes("gpt-5.1-codex-max")) return "gpt-5.1-codex-max";
if (normalized.includes("gpt-5.1-codex-mini")) return "gpt-5.1-codex-mini";
// GPT-5 variants
if (normalized.includes("gpt-5.2")) return "gpt-5.2";
if (normalized.includes("gpt-5.1")) return "gpt-5.1";
if (normalized.includes("gpt-5")) return "gpt-5.1";
return "gpt-5.1"; // Default fallback
}
Normalization Examples:
openai/gpt-5-codex → gpt-5-codex
gpt-5.3-codex-spark → gpt-5-codex
gpt-5.2-codex → gpt-5-codex
gpt-5.1-codex → gpt-5-codex
gpt-5-codex-low → gpt-5-codex (variant stripped for API)
gpt-5.1-codex-max → gpt-5.1-codex-max
gpt-5.1-codex-mini → gpt-5.1-codex-mini
gpt-5.2 → gpt-5.2
gpt-5.1 → gpt-5.1
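With substring matching, the order of checks matters: "gpt-5.1-codex-max" also contains "gpt-5.1-codex", so the longer name has to be tested first. A minimal sketch of an ordered pattern table that reproduces the examples above follows; `MODEL_PATTERNS` and `normalizeModelSketch` are hypothetical names, and the explicit model map from the real function is not modeled here.

```typescript
// Hypothetical pattern table, ordered from most to least specific.
const MODEL_PATTERNS: ReadonlyArray<readonly [string, string]> = [
  ["gpt-5.1-codex-max", "gpt-5.1-codex-max"],
  ["gpt-5.1-codex-mini", "gpt-5.1-codex-mini"],
  ["gpt-5.3-codex", "gpt-5-codex"],
  ["gpt-5.2-codex", "gpt-5-codex"],
  ["gpt-5.1-codex", "gpt-5-codex"],
  ["gpt-5-codex", "gpt-5-codex"],
  ["gpt-5.2", "gpt-5.2"],
];
const DEFAULT_MODEL = "gpt-5.1";

function normalizeModelSketch(model: string | undefined): string {
  if (!model) return DEFAULT_MODEL;
  // Strip a provider prefix such as "openai/" and compare case-insensitively.
  const id = (model.split("/").pop() ?? model).toLowerCase();
  for (const [pattern, canonical] of MODEL_PATTERNS) {
    if (id.includes(pattern)) return canonical;
  }
  return DEFAULT_MODEL;
}

for (const name of ["openai/gpt-5-codex", "gpt-5.3-codex-spark", "gpt-5-codex-low", "gpt-5.1-codex-max"]) {
  console.log(`${name} -> ${normalizeModelSketch(name)}`);
}
```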
Step 4: Body Transformation
From lib/request/request-transformer.ts:821:
export async function transformRequestBody(
body: RequestBody,
codexInstructions: string,
userConfig: UserConfig = { global: {}, models: {} },
codexMode = true,
fastSession = false,
fastSessionStrategy: FastSessionStrategy = "hybrid",
fastSessionMaxInputItems = 30,
): Promise<RequestBody> {
const originalModel = body.model;
const normalizedModel = normalizeModel(body.model);
const modelConfig = getModelConfig(originalModel || normalizedModel, userConfig);
// Set normalized model
body.model = normalizedModel;
// Codex required fields
body.store = false; // Stateless (required by ChatGPT backend)
body.stream = true; // Always stream (SSE)
// Inject Codex instructions
body.instructions = shouldApplyFastSessionTuning
? compactInstructionsForFastSession(codexInstructions, isTrivialTurn)
: codexInstructions;
// Filter input array
if (body.input) {
// Apply fast-session input trimming
if (fastSession) {
body.input = trimInputForFastSession(
body.input,
fastSessionMaxInputItems,
{ preferLatestUserOnly: isTrivialTurn }
);
}
// Remove item_reference (AI SDK construct, not supported by Codex)
// Strip IDs from all items (stateless mode)
body.input = filterInput(body.input);
// Add bridge/tool-remap message
if (codexMode) {
body.input = await filterHostSystemPrompts(body.input);
body.input = addCodexBridgeMessage(body.input, !!body.tools);
} else {
body.input = addToolRemapMessage(body.input, !!body.tools);
}
// Handle orphaned tool outputs
body.input = normalizeOrphanedToolOutputs(body.input);
body.input = injectMissingToolOutputs(body.input);
}
// Configure reasoning
const reasoningConfig = resolveReasoningConfig(normalizedModel, modelConfig, body);
body.reasoning = {
...body.reasoning,
...reasoningConfig,
};
// Fast-session overrides
if (fastSession && shouldApplyFastSessionTuning) {
body.reasoning.effort = "none"; // or "low" for Codex models
body.reasoning.summary = "auto";
body.text.verbosity = "low";
}
// Configure text verbosity
body.text = {
...body.text,
verbosity: resolveTextVerbosity(modelConfig, body),
};
// Add include for encrypted reasoning content
body.include = resolveInclude(modelConfig, body);
// Always includes "reasoning.encrypted_content" for stateless continuity
// Remove unsupported parameters
body.max_output_tokens = undefined;
body.max_completion_tokens = undefined;
return body;
}
1. Stateless Mode (store: false)
From ARCHITECTURE.md:73:
ChatGPT backend requires store: false, include reasoning.encrypted_content.
Why stateless?
- Codex API doesn’t persist conversation state
- Requires full context in each request
- reasoning.encrypted_content maintains reasoning continuity
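The stateless requirements can be sketched as a small pure function. `BodySketch` and `applyStatelessDefaults` are hypothetical names, and the caller-supplied `include` value is only an illustrative placeholder; the real transformation delegates to `resolveInclude`.

```typescript
// Hypothetical subset of the request body, limited to the fields discussed here.
interface BodySketch {
  model?: string;
  store?: boolean;
  stream?: boolean;
  include?: string[];
  max_output_tokens?: number;
}

function applyStatelessDefaults(body: BodySketch): BodySketch {
  // Merge caller-supplied include values with the one required for continuity.
  const include = new Set(body.include ?? []);
  include.add("reasoning.encrypted_content");
  // Drop the parameter that the transformation above removes.
  const { max_output_tokens: _dropped, ...rest } = body;
  return { ...rest, store: false, stream: true, include: Array.from(include) };
}

const statelessBody = applyStatelessDefaults({
  model: "gpt-5-codex",
  store: true,
  include: ["file_search_call.results"], // illustrative caller value
  max_output_tokens: 256,
});
console.log(statelessBody);
```

Unlike the excerpt, this sketch returns a new object instead of mutating its argument, which keeps the example easy to test.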
2. Input Filtering
From lib/request/request-transformer.ts:542:
export function filterInput(
input: InputItem[] | undefined,
): InputItem[] | undefined {
if (!Array.isArray(input)) return input;
return input
.filter((item) => {
// Remove AI SDK constructs not supported by Codex API
if (item.type === "item_reference") {
return false; // AI SDK only - references server state
}
return true; // Keep all other items
})
.map((item) => {
// Strip IDs from all items (Codex API stateless mode)
if (item.id) {
const { id: _omit, ...itemWithoutId } = item;
return itemWithoutId as InputItem;
}
return item;
});
}
Why remove item_reference?
- AI SDK uses this for server-side state lookup
- Not supported by Codex API (stateless)
- Would cause API errors
Why strip IDs?
- Stateless mode doesn’t track item IDs
- Reduces payload size
- Prevents ID conflicts
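A short usage sketch shows the effect on a sample input. The `InputItem` shape here is a hypothetical minimal version, and the function body is a condensed copy of the one above so the block runs on its own.

```typescript
// Hypothetical minimal item shape; the real InputItem type has more fields.
type InputItem = { type: string; id?: string; role?: string; content?: unknown };

function filterInput(input: InputItem[] | undefined): InputItem[] | undefined {
  if (!Array.isArray(input)) return input;
  return input
    .filter((item) => item.type !== "item_reference") // references server-side state
    .map((item) => {
      if (!item.id) return item;
      const { id: _omit, ...withoutId } = item; // strip the ID for stateless mode
      return withoutId;
    });
}

const filtered = filterInput([
  { type: "item_reference", id: "rs_123" },
  { type: "message", id: "msg_1", role: "user", content: "hello" },
  { type: "message", role: "assistant", content: "hi" },
]);
console.log(filtered);
```

The reference item is dropped and the remaining two messages are kept without an `id` field.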
3. Orphaned Tool Output Handling
From lib/request/helpers/input-utils.ts:180:
export function normalizeOrphanedToolOutputs(
input: InputItem[],
): InputItem[] {
// Problem: function_call_output references a function_call that was
// an item_reference (now filtered out). API rejects orphaned outputs.
// Solution: Convert orphaned outputs to assistant messages to preserve
// context without API errors.
const functionCallIds = new Set<string>();
for (const item of input) {
if (item.type === "function_call" && item.call_id) {
functionCallIds.add(item.call_id);
}
}
return input.map((item) => {
if (item.type === "function_call_output") {
const callId = item.call_id;
if (callId && !functionCallIds.has(callId)) {
// Orphaned output - convert to message
return {
type: "message",
role: "assistant",
content: [
{
type: "input_text",
text: `[Previous tool result: ${JSON.stringify(item.output)}]`,
},
],
} as InputItem;
}
}
return item;
});
}
Why this matters:
- Prevents infinite loops (a model that cannot see an earlier tool result may repeat the call)
- Preserves conversation context
- Avoids API validation errors
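The conversion is easiest to see on a small input with one matched and one orphaned output. The `Item` union below is a hypothetical simplification of the real input types, and `normalizeOrphans` is a condensed copy of the function above.

```typescript
// Hypothetical simplified item union for this sketch.
type Item =
  | { type: "function_call"; call_id: string; name: string }
  | { type: "function_call_output"; call_id: string; output: unknown }
  | { type: "message"; role: string; content: { type: string; text: string }[] };

function normalizeOrphans(input: Item[]): Item[] {
  // Collect the call IDs that still have a matching function_call.
  const knownCalls = new Set<string>();
  for (const item of input) {
    if (item.type === "function_call") knownCalls.add(item.call_id);
  }
  return input.map((item): Item => {
    if (item.type === "function_call_output" && !knownCalls.has(item.call_id)) {
      // Orphaned output: keep the information, but as a plain message.
      const text = `[Previous tool result: ${JSON.stringify(item.output)}]`;
      return { type: "message", role: "assistant", content: [{ type: "input_text", text }] };
    }
    return item;
  });
}

const normalized = normalizeOrphans([
  { type: "function_call", call_id: "call_1", name: "read_file" },
  { type: "function_call_output", call_id: "call_1", output: "ok" },
  { type: "function_call_output", call_id: "call_2", output: { exitCode: 0 } },
]);
console.log(JSON.stringify(normalized, null, 2));
```

The output for `call_1` is left alone because its call is present; the output for `call_2` becomes a message.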
4. Reasoning Configuration
From lib/request/request-transformer.ts:388:
export function getReasoningConfig(
modelName: string | undefined,
userConfig: ConfigOptions = {},
): ReasoningConfig {
const normalizedName = modelName?.toLowerCase() ?? "";
// Canonical GPT-5 Codex defaults to high, does not support "none"
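const isGpt5Codex = normalizedName.includes("gpt-5-codex");
// Assumed definition (omitted from the excerpt): true for any Codex variant
const isCodex = normalizedName.includes("codex");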
const isGpt52General = normalizedName.includes("gpt-5.2") && !isCodex;
const isGpt51General = normalizedName.includes("gpt-5.1") && !isCodex;
// GPT-5.2 general and GPT-5.1 general support "none" reasoning
const supportsNone = isGpt52General || isGpt51General;
// GPT-5.2 general supports xhigh
const supportsXhigh = isGpt52General;
// Default effort
const defaultEffort = isGpt5Codex ? "high" : "medium";
// Get user-requested effort
let effort = userConfig.reasoningEffort || defaultEffort;
// Downgrade unsupported values
if (!supportsXhigh && effort === "xhigh") {
effort = "high";
}
if (!supportsNone && effort === "none") {
effort = "low";
}
const summary = userConfig.reasoningSummary ?? "auto";
return { effort, summary };
}
Model-Specific Defaults:
gpt-5-codex: effort: high, supports: low, medium, high
gpt-5.1-codex-max: effort: high, supports: low, medium, high, xhigh
gpt-5.1-codex-mini: effort: medium, supports: medium, high
gpt-5.2: effort: high, supports: none, low, medium, high, xhigh
gpt-5.1: effort: medium, supports: none, low, medium, high
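The table can be read as a per-model range of supported efforts. The sketch below transcribes it into a lookup and clamps a requested effort into that range. `MODEL_EFFORTS` and `resolveEffort` are hypothetical names, and clamping upward (for example "low" to "medium" on gpt-5.1-codex-mini) is an assumption: the excerpt above only shows the xhigh-to-high and none-to-low downgrades.

```typescript
type Effort = "none" | "low" | "medium" | "high" | "xhigh";
const ORDER: Effort[] = ["none", "low", "medium", "high", "xhigh"];

interface EffortSpec { defaultEffort: Effort; min: Effort; max: Effort }

// Values transcribed from the table above.
const MODEL_EFFORTS: Record<string, EffortSpec> = {
  "gpt-5-codex": { defaultEffort: "high", min: "low", max: "high" },
  "gpt-5.1-codex-max": { defaultEffort: "high", min: "low", max: "xhigh" },
  "gpt-5.1-codex-mini": { defaultEffort: "medium", min: "medium", max: "high" },
  "gpt-5.2": { defaultEffort: "high", min: "none", max: "xhigh" },
  "gpt-5.1": { defaultEffort: "medium", min: "none", max: "high" },
};
// Assumed fallback for unknown models: the gpt-5.1 row.
const FALLBACK: EffortSpec = { defaultEffort: "medium", min: "none", max: "high" };

function resolveEffort(model: string, requested?: Effort): Effort {
  const spec = MODEL_EFFORTS[model] ?? FALLBACK;
  const wanted = ORDER.indexOf(requested ?? spec.defaultEffort);
  // Clamp the requested level into the model's supported range.
  const clamped = Math.min(Math.max(wanted, ORDER.indexOf(spec.min)), ORDER.indexOf(spec.max));
  return ORDER[clamped] ?? spec.defaultEffort;
}

console.log(resolveEffort("gpt-5-codex", "xhigh"));
console.log(resolveEffort("gpt-5.1-codex-mini", "low"));
```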
5. Fast-Session Optimizations
From lib/request/request-transformer.ts:569:
export function trimInputForFastSession(
input: InputItem[] | undefined,
maxItems: number,
options?: { preferLatestUserOnly?: boolean },
): InputItem[] | undefined {
if (!Array.isArray(input)) return input;
const safeMax = Math.max(8, Math.floor(maxItems)); // Default 30
// Strategy 1: Trivial turns (short, simple questions)
if (options?.preferLatestUserOnly && isTrivialLatestPrompt(latestUserText)) {
// Keep only: minimal system prompt + latest user message
return [firstSystemPrompt, latestUserMessage];
}
// Strategy 2: Complex requests (code blocks, lists, tables)
// Keep: up to 2 leading system/developer messages + last N items
const keepIndexes = new Set<number>();
// Keep small leading system prompts (<= 1200 chars)
for (let i = 0; i < 2; i++) {
const item = input[i];
if (item?.role === "developer" || item?.role === "system") {
const text = extractMessageText(item.content);
if (text.length <= 1200) {
keepIndexes.add(i);
}
}
}
// Keep last N items
for (let i = Math.max(0, input.length - safeMax); i < input.length; i++) {
keepIndexes.add(i);
}
return input.filter((_, index) => keepIndexes.has(index));
}
Fast-Session Benefits:
- Lower latency: Smaller context = faster processing
- Lower cost: Fewer tokens to process
- Better UX: Instant responses for simple questions
When applied:
const shouldApplyFastSessionTuning =
fastSession &&
(fastSessionStrategy === "always" ||
!isComplexFastSessionRequest(body, maxInputItems));
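The second trimming strategy (short leading prompts plus the last N items) can be sketched in a runnable form. `Msg` is a hypothetical simplification in which message content is plain text, and `trimForFastSession` omits the trivial-turn branch.

```typescript
// Hypothetical simplified item: content is plain text here.
interface Msg { role: string; text: string }

function trimForFastSession(input: Msg[], maxItems: number): Msg[] {
  const safeMax = Math.max(8, Math.floor(maxItems)); // never keep fewer than 8 trailing items
  const keep = new Set<number>();
  // Keep up to two short leading system/developer prompts.
  for (let i = 0; i < Math.min(2, input.length); i++) {
    const item = input[i];
    const isPrompt = item?.role === "system" || item?.role === "developer";
    if (item && isPrompt && item.text.length <= 1200) keep.add(i);
  }
  // Keep the last safeMax items.
  for (let i = Math.max(0, input.length - safeMax); i < input.length; i++) keep.add(i);
  return input.filter((_, index) => keep.has(index));
}

// One system prompt followed by 39 alternating turns (40 items in total).
const history: Msg[] = [
  { role: "system", text: "You are a coding assistant." },
  ...Array.from({ length: 39 }, (_, i) => ({
    role: i % 2 === 0 ? "user" : "assistant",
    text: `turn ${i}`,
  })),
];
const trimmed = trimForFastSession(history, 10);
const floored = trimForFastSession(history, 3);
console.log(trimmed.length, floored.length);
```

With a limit of 10 the result holds the system prompt plus the last ten turns; a limit of 3 is raised to the floor of 8.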
Step 5: Header Injection
From lib/request/fetch-helpers.ts:505:
export function createCodexHeaders(
init: RequestInit | undefined,
accountId: string,
accessToken: string,
opts?: { model?: string; promptCacheKey?: string },
): Headers {
const headers = new Headers(init?.headers ?? {});
// Remove any existing API key
headers.delete("x-api-key");
// OAuth authentication
headers.set("Authorization", `Bearer ${accessToken}`);
// Account ID (org-* or user-*)
headers.set("openai-account-id", accountId);
// Realtime responses beta flag
headers.set("openai-beta", "realtime-responses-2024-11-19");
// Originator tag (identifies Codex CLI)
headers.set("openai-originator", "codex_cli_rs");
// Prompt caching (session affinity)
const cacheKey = opts?.promptCacheKey;
if (cacheKey) {
headers.set("openai-conversation-id", cacheKey);
headers.set("openai-session-id", cacheKey);
} else {
headers.delete("openai-conversation-id");
headers.delete("openai-session-id");
}
// Accept SSE
headers.set("accept", "text/event-stream");
return headers;
}
Key Headers:
Authorization: OAuth bearer token
openai-account-id: Account/org ID (affects quotas)
openai-beta: Enable realtime responses API
openai-originator: Identifies plugin as Codex CLI
openai-conversation-id: Prompt cache key (optional)
openai-session-id: Session identifier (optional)
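Two details of the header builder are worth seeing in isolation: header names are case-insensitive, and the two session headers are either both set or both removed. `applySessionHeaders` is a hypothetical helper extracted for this sketch, and the header values are placeholders.

```typescript
// Hypothetical helper isolating the cache-key branch of the header builder above.
function applySessionHeaders(headers: Headers, promptCacheKey?: string): Headers {
  if (promptCacheKey) {
    headers.set("openai-conversation-id", promptCacheKey);
    headers.set("openai-session-id", promptCacheKey);
  } else {
    // Remove stale values so an old session is never reused by accident.
    headers.delete("openai-conversation-id");
    headers.delete("openai-session-id");
  }
  return headers;
}

// Placeholder values; the mixed casing is deliberate.
const incoming = new Headers({ "X-Api-Key": "sk-placeholder", "OpenAI-Session-Id": "stale" });
incoming.delete("x-api-key"); // matches "X-Api-Key" because names are case-insensitive

const withKey = applySessionHeaders(new Headers(incoming), "thread-42");
const withoutKey = applySessionHeaders(new Headers(incoming));
console.log(withKey.get("openai-session-id"), withoutKey.has("openai-session-id"));
```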
Step 6: Execute Request
From index.ts:1550:
const controller = new AbortController();
const fetchTimeoutMs = getFetchTimeoutMs(pluginConfig); // Default 120s
const timeoutId = setTimeout(() => {
controller.abort();
}, fetchTimeoutMs);
// Resolve the breaker before the try block so the catch handler can reference it
const breaker = getCircuitBreaker(`account:${selectedIndex}`);
try {
// Fail fast if the circuit is open
if (!breaker.canExecute()) {
throw new CircuitOpenError();
}
const response = await fetch(url, {
...requestInit,
headers: codexHeaders,
signal: controller.signal,
});
clearTimeout(timeoutId);
// Record circuit breaker success
breaker.recordSuccess();
return response;
} catch (error) {
clearTimeout(timeoutId);
// Record circuit breaker failure
breaker.recordFailure();
throw error;
}
Timeout Behavior:
- Default: 120 seconds (2 minutes)
- Configurable via CODEX_AUTH_FETCH_TIMEOUT_MS
- Aborts request on timeout
- Triggers failover to next account
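The breaker calls used above (`canExecute`, `recordSuccess`, `recordFailure`, `getState`) can be backed by a very small state machine. The class below is a minimal sketch under assumed rules: the failure threshold, the reset window, and the "half-open" state are illustrative choices, not the project's actual configuration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Hypothetical breaker: opens after `threshold` consecutive failures and
// allows a trial request once `resetMs` has elapsed.
class SimpleBreaker {
  private failures = 0;
  private openedAt: number | undefined = undefined;
  private readonly threshold: number;
  private readonly resetMs: number;
  private readonly now: () => number;

  constructor(threshold = 3, resetMs = 30_000, now: () => number = () => Date.now()) {
    this.threshold = threshold;
    this.resetMs = resetMs;
    this.now = now;
  }

  getState(): BreakerState {
    if (this.openedAt === undefined) return "closed";
    return this.now() - this.openedAt >= this.resetMs ? "half-open" : "open";
  }

  canExecute(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = undefined;
  }

  recordFailure(): void {
    this.failures += 1;
    // Opening again during half-open restarts the reset window.
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

const demoBreaker = new SimpleBreaker(2, 1_000);
demoBreaker.recordFailure();
demoBreaker.recordFailure();
console.log(demoBreaker.getState(), demoBreaker.canExecute());
```

Injecting the clock through the constructor keeps the state transitions testable without real delays.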
Step 7: Response Handling
From lib/request/fetch-helpers.ts:589:
export async function handleSuccessResponse(
response: Response,
isStreaming: boolean,
options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
// Check for deprecation headers (RFC 8594)
const deprecation = response.headers.get("Deprecation");
const sunset = response.headers.get("Sunset");
if (deprecation || sunset) {
logWarn(`API deprecation notice`, { deprecation, sunset });
}
const responseHeaders = ensureContentType(response.headers);
// For non-streaming requests (generateText), convert SSE to JSON
if (!isStreaming) {
return await convertSseToJson(response, responseHeaders, options);
}
// For streaming requests (streamText), return stream as-is
return new Response(response.body, {
status: response.status,
statusText: response.statusText,
headers: responseHeaders,
});
}
SSE to JSON Conversion
From lib/request/response-handler.ts:47:
export async function convertSseToJson(
response: Response,
headers: Headers,
options?: { streamStallTimeoutMs?: number },
): Promise<Response> {
if (!response.body) {
return new Response("{}", { status: 200, headers });
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let finalData: unknown = null;
try {
while (true) {
const { done, value } = await readWithTimeout(
reader,
options?.streamStallTimeoutMs ?? 45_000,
);
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop() ?? "";
for (const line of lines) {
if (line.startsWith("data: ")) {
const data = line.slice(6);
if (data === "[DONE]") continue;
try {
const parsed = JSON.parse(data);
if (parsed.type === "response.done") {
finalData = parsed.response;
break;
}
} catch {
continue;
}
}
}
if (finalData) break;
}
} finally {
reader.releaseLock();
}
const json = finalData ?? {};
return new Response(JSON.stringify(json), {
status: 200,
headers,
});
}
SSE Event Format:
data: {"type":"response.started","response":{"id":"resp_abc123"}}
data: {"type":"response.output_text.delta","delta":"Hello"}
data: {"type":"response.output_text.delta","delta":" world"}
data: {"type":"response.done","response":{"id":"resp_abc123","output":[{"type":"text","text":"Hello world"}]}}
data: [DONE]
Conversion Result:
{
"id": "resp_abc123",
"output": [
{
"type": "text",
"text": "Hello world"
}
]
}
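The line buffering in the converter matters because a network chunk can end in the middle of an event. The sketch below applies the same buffering to an array of string chunks so it can run without a network stream; `extractFinalResponse` is a hypothetical helper, and the chunk boundaries are made up for illustration.

```typescript
// Hypothetical synchronous variant of the SSE scan, operating on string chunks.
function extractFinalResponse(chunks: string[]): unknown {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6).trim();
      if (data === "[DONE]") continue;
      try {
        const parsed = JSON.parse(data) as { type?: string; response?: unknown };
        if (parsed.type === "response.done") return parsed.response;
      } catch {
        // Ignore lines that are not valid JSON.
      }
    }
  }
  return null;
}

// The final event is split across two chunks on purpose.
const finalResponse = extractFinalResponse([
  'data: {"type":"response.output_text.delta","delta":"Hello"}\ndata: {"type":"respon',
  'se.done","response":{"id":"resp_abc123"}}\ndata: [DONE]\n',
]);
console.log(finalResponse);
```

Without the carried-over buffer, the first half of the split event would fail to parse and the final response would be lost.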
1. Model-Family Instruction Caching
From lib/prompts/codex.ts:180:
const instructionsCache = new Map<ModelFamily, string>();
const etagCache = new Map<ModelFamily, string>();
export async function getCodexInstructions(
model: string,
): Promise<string> {
const family = getModelFamily(model);
// Return cached instructions if available
if (instructionsCache.has(family)) {
return instructionsCache.get(family)!;
}
// Fetch from GitHub with ETag caching
const url = CODEX_INSTRUCTIONS_URLS[family];
const etag = etagCache.get(family);
const response = await fetch(url, {
headers: etag ? { "If-None-Match": etag } : {},
});
if (response.status === 304) {
// Not modified, use cached version
return instructionsCache.get(family)!;
}
const instructions = await response.text();
// Update cache
instructionsCache.set(family, instructions);
if (response.headers.has("ETag")) {
etagCache.set(family, response.headers.get("ETag")!);
}
return instructions;
}
Benefits:
- Reduces GitHub API calls (ETag caching)
- Faster request transformation (in-memory cache)
- Persists in memory across requests for the lifetime of the process
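The conditional-request half of this scheme can be shown in isolation. In the sketch below the transport is injected and synchronous purely to keep the example testable; a real implementation would wrap `fetch`. `EtagCache`, `Transport`, and `FetchResult` are hypothetical names.

```typescript
// Hypothetical transport result; a real implementation would wrap fetch().
interface FetchResult { status: number; body: string; etag?: string }
type Transport = (url: string, ifNoneMatch?: string) => FetchResult;

class EtagCache {
  private readonly entries = new Map<string, { body: string; etag: string | undefined }>();
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // Revalidate with the stored ETag; reuse the cached body on 304 Not Modified.
  refresh(url: string): string {
    const cached = this.entries.get(url);
    const result = this.transport(url, cached?.etag);
    if (result.status === 304 && cached) return cached.body;
    this.entries.set(url, { body: result.body, etag: result.etag });
    return result.body;
  }
}

// Stub server: answers 304 when the client already holds version "v1".
const sentEtags: Array<string | undefined> = [];
const etagCache = new EtagCache((_url, ifNoneMatch) => {
  sentEtags.push(ifNoneMatch);
  return ifNoneMatch === '"v1"'
    ? { status: 304, body: "" }
    : { status: 200, body: "instructions text", etag: '"v1"' };
});
const firstFetch = etagCache.refresh("https://example.invalid/instructions");
const secondFetch = etagCache.refresh("https://example.invalid/instructions");
console.log(firstFetch === secondFetch, sentEtags);
```

The second call still contacts the server, but the 304 response carries no body, so the cached text is returned.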
2. Prewarming
From index.ts:1145:
if (!startupPrewarmTriggered && prewarmEnabled) {
startupPrewarmTriggered = true;
const configuredModels = Object.keys(userConfig.models ?? {});
prewarmCodexInstructions(configuredModels);
if (codexMode) {
prewarmHostCodexPrompt();
}
}
Prewarming behavior:
- Runs once on plugin load, as a background fetch
- Fetches instructions for all configured models
- Avoids instruction-fetch latency on the first request