Retries

The AI Gateway retries failed requests automatically, using exponential backoff to space out attempts and avoid overwhelming rate-limited providers. Retries apply to the current target before any fallback is triggered.

Configuration

{
  "retry": {
    "attempts": 5,
    "on_status_codes": [429, 500, 502, 503, 504]
  }
}

retry.attempts

number

required

Maximum number of retries per target, counted in addition to the initial request. Must be a positive integer no greater than 5.

retry.on_status_codes

array

HTTP status codes that trigger a retry. Defaults to [429, 500, 502, 503, 504]. Errors with codes not in this list cause the request to fail immediately without retrying.
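The way these two fields combine can be sketched as a small predicate. This is only an illustration of the documented rules, not the gateway's code; `should_retry` and its parameters are hypothetical names.

```python
DEFAULT_RETRY_STATUS_CODES = [429, 500, 502, 503, 504]  # defaults listed above


def should_retry(status_code, retries_made, retry_config):
    """Return True if another retry is allowed for this response.

    Hypothetical helper that mirrors the documented rules.
    """
    max_attempts = retry_config["attempts"]
    codes = retry_config.get("on_status_codes", DEFAULT_RETRY_STATUS_CODES)
    if retries_made >= max_attempts:
        return False  # retry budget for this target is exhausted
    return status_code in codes  # codes outside the list fail immediately
```

For example, `should_retry(400, 0, {"attempts": 2})` is false because 400 is not in the default list, so the request fails without retrying.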

Default retry behavior

The default retryable status codes are:

Status code	Meaning
`429`	Too Many Requests (rate limited)
`500`	Internal Server Error
`502`	Bad Gateway
`503`	Service Unavailable
`504`	Gateway Timeout

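The following Python SDK example sets the default retry behavior explicitly:

from portkey_ai import Portkey

# Attach the retry config to the client so it applies to every request.
client = Portkey(
    provider="openai",
    Authorization="sk-...",
    config={
      "retry": {
        "attempts": 5,
        "on_status_codes": [429, 500, 502, 503, 504]
      }
    }
)

# A failure with one of the listed status codes is retried up to 5 times.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)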

Exponential backoff

The gateway uses an exponential backoff strategy between retry attempts to prevent network overload. Retry intervals increase with each failed attempt. The total retry window is capped at 60 seconds (MAX_RETRY_LIMIT_MS). If a provider’s retry-after header specifies a wait time longer than the remaining window, the retry is skipped and the error is returned immediately.
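To make the shape of the schedule concrete, here is a minimal sketch. The base delay and growth factor are assumptions chosen for illustration, since this page does not state them; only the 60-second cap comes from the text above. The sketch also assumes retries stop once the next wait would overrun the window.

```python
MAX_RETRY_LIMIT_MS = 60_000  # total retry window stated above
BASE_DELAY_MS = 1_000        # assumption: not specified in this document
FACTOR = 2                   # assumption: not specified in this document


def backoff_schedule(attempts):
    """Return the waits (in ms) before each retry that fits in the window."""
    waits = []
    elapsed = 0
    for n in range(attempts):
        delay = BASE_DELAY_MS * FACTOR ** n  # grows with each failed attempt
        if elapsed + delay > MAX_RETRY_LIMIT_MS:
            break  # the next wait would overrun the window, so stop retrying
        waits.append(delay)
        elapsed += delay
    return waits
```

Under these assumed values, five retries wait 1, 2, 4, 8, and 16 seconds, for 31 seconds in total, which stays inside the window.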

Provider-supplied retry delays

When a 429 response includes a retry-after, retry-after-ms, or x-ms-retry-after-ms header, the gateway honors that delay before retrying:

// Recognized retry-after headers (from globals.ts)
const POSSIBLE_RETRY_STATUS_HEADERS = [
  'retry-after-ms',
  'x-ms-retry-after-ms',
  'retry-after',
];

If the provider-specified delay exceeds the 60-second window, the retry is skipped.
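The header handling can be sketched in Python as follows. The function names are hypothetical, and two points are assumptions rather than documented behavior: that headers are checked in the order listed above, and that the `-ms` headers carry milliseconds while plain `retry-after` carries seconds. HTTP-date values of `retry-after` are not handled here.

```python
MAX_RETRY_LIMIT_MS = 60_000
POSSIBLE_RETRY_STATUS_HEADERS = ["retry-after-ms", "x-ms-retry-after-ms", "retry-after"]


def provider_delay_ms(headers):
    """Return the provider-requested delay in ms, or None if absent or unparseable."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for name in POSSIBLE_RETRY_STATUS_HEADERS:
        value = lowered.get(name)
        if value is None:
            continue
        try:
            number = float(value)
        except ValueError:
            continue  # e.g. an HTTP-date; skipped in this sketch
        # Assumption: "-ms" headers are milliseconds, retry-after is seconds.
        return int(number if name.endswith("-ms") else number * 1000)
    return None


def fits_retry_window(delay_ms, elapsed_ms=0):
    """Return True if the delay fits in what remains of the retry window."""
    return delay_ms <= MAX_RETRY_LIMIT_MS - elapsed_ms
```

A response with `retry-after: 2` yields a 2000 ms delay, which fits the window, while a 61000 ms delay does not and the retry would be skipped.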

Retries and timeouts

If a request_timeout is set on the target, each individual attempt is subject to that timeout. A timed-out attempt returns a 408 status and is retried only if 408 is in on_status_codes. Because 408 is not among the default retryable codes, it must be added explicitly:

{
  "retry": {"attempts": 3, "on_status_codes": [408, 429, 503]},
  "targets": [{
    "provider": "openai",
    "request_timeout": 10000
  }]
}

Retries with fallbacks

Retries and fallbacks are applied in sequence:

1. The gateway sends the request to the first target.
2. If it fails with a retryable status code, it retries (up to attempts times).
3. Only after all retry attempts are exhausted does the gateway move to the next fallback target.

{
  "strategy": {"mode": "fallback"},
  "retry": {"attempts": 2},
  "targets": [
    {"provider": "openai", "override_params": {"model": "gpt-4o"}},
    {"provider": "anthropic", "override_params": {"model": "claude-3-5-sonnet-20241022"}}
  ]
}

With this config, the gateway makes up to 3 total requests to OpenAI (1 + 2 retries), then falls back to Anthropic if all fail.
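The ordering can be simulated with a short sketch. `run_with_fallback` and `send` are hypothetical names, and `send(target)` stands in for one request that returns an HTTP status code. How the gateway treats a non-retryable error on a fallback chain is not covered on this page; the sketch assumes it moves on to the next target.

```python
RETRY_CODES = (429, 500, 502, 503, 504)  # default retryable status codes


def run_with_fallback(targets, retries, send):
    """Try targets in order, retrying each up to `retries` times first.

    Returns (last_target, last_status, calls), where calls lists every
    target that was sent a request, in order.
    """
    calls = []
    last_target, status = None, None
    for target in targets:
        last_target = target
        for _ in range(1 + retries):  # one initial request plus the retries
            status = send(target)
            calls.append(target)
            if status < 400:
                return target, status, calls  # success: stop here
            if status not in RETRY_CODES:
                break  # assumption: non-retryable error moves to the next target
        # retries exhausted for this target: continue with the next fallback
    return last_target, status, calls
```

With `retries=2` and an OpenAI target that always returns 503, the call log is three OpenAI requests followed by one Anthropic request, matching the count described above.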

Streaming responses are not retried. If a streaming request fails mid-stream, the error is returned to the caller immediately.

Response headers

The gateway includes retry metadata in the response headers:

Header	Description
`x-portkey-retry-attempt-count`	Number of retry attempts made for the final response
