The AI Gateway can cache LLM responses and serve them without making a new provider request. This reduces both cost and latency for repeated or similar queries.

Cache modes

Simple caching performs exact-match lookups. The cache key is a SHA-256 hash of the serialized request body combined with the target URL. If an identical request has been made before, the cached response is returned immediately.
{
  "cache": {
    "mode": "simple",
    "max_age": 3600
  }
}
This mode is available in both the open-source gateway and the hosted Portkey service.

Configuration

cache.mode
string
required
Cache mode. "simple" for exact-match caching, "semantic" for embedding-based similarity matching.
cache.max_age
number
Cache TTL in seconds. After this duration, the cached entry expires and the next matching request hits the provider. Defaults to 24 hours (86400 seconds) if not set.

Usage

from portkey_ai import Portkey

client = Portkey(
    provider="openai",
    Authorization="sk-...",  # provider API key
    config={
      "cache": {
        "mode": "simple",
        "max_age": 3600
      }
    }
)

# First call — hits the provider
response1 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)

# Second identical call — served from cache
response2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)

Cache status headers

Every response includes an x-portkey-cache-status header indicating whether the response came from cache:
Value          Meaning
HIT            Response served from simple cache
SEMANTIC HIT   Response served from semantic cache
MISS           No cache entry found; response from provider
SEMANTIC MISS  Semantic search found no match; response from provider
REFRESH        Cache was bypassed due to force-refresh header
DISABLED       Caching is not enabled for this request
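When handling responses programmatically, you often want to treat both hit variants the same way. A minimal sketch (the helper name and the headers dict are illustrative; read the header from whatever HTTP client or SDK response object you use):

```python
# Status values that mean the response was served from cache,
# per the table above.
CACHE_HIT_STATUSES = {"HIT", "SEMANTIC HIT"}

def served_from_cache(headers: dict) -> bool:
    """Return True if x-portkey-cache-status indicates a cache hit."""
    status = headers.get("x-portkey-cache-status", "DISABLED")
    return status in CACHE_HIT_STATUSES
```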

Force-refreshing the cache

To bypass the cache and fetch a fresh response from the provider, include the x-portkey-cache-force-refresh header:
curl http://localhost:8787/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-cache-force-refresh: true" \
  -H "x-portkey-config: <config>" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

Persistent caching with Redis

By default, the gateway uses an in-memory cache that is lost when the process restarts. To persist the cache across restarts and share it across multiple gateway instances, configure a Redis connection:
REDIS_CONNECTION_STRING=redis://localhost:6379
Set this environment variable before starting the gateway. When Redis is available, all putInCache and getFromCache operations use Redis instead of the in-process store.
The in-memory cache does not persist across gateway restarts. Use Redis for production deployments where cache durability matters.
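For example, assuming Docker is available for running Redis and the gateway is launched via npx:

```shell
# Start a local Redis instance (assumes Docker is installed)
docker run -d --name gateway-cache -p 6379:6379 redis:7

# Point the gateway at it before launching
export REDIS_CONNECTION_STRING=redis://localhost:6379
npx @portkey-ai/gateway
```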

Caching limitations

  • Streaming responses are not cached. If stream: true is set in the request body, caching is skipped for that request.
  • Cache keys include the full request body and the provider URL. Any change to the request — model, messages, parameters — produces a different cache key.
  • The simple cache key is a SHA-256 hash of JSON.stringify(requestBody) + targetURL.
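The key derivation above can be sketched in Python. Note that the gateway serializes with JavaScript's JSON.stringify, whose key ordering and spacing differ from Python's json.dumps, so these hashes are illustrative rather than byte-identical to the gateway's:

```python
import hashlib
import json

def simple_cache_key(request_body: dict, target_url: str) -> str:
    """Sketch of the simple cache key: SHA-256 over the serialized
    request body concatenated with the target URL."""
    serialized = json.dumps(request_body, separators=(",", ":")) + target_url
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Any change to the body or URL yields a different digest, which is why even a one-character change to a message misses the simple cache.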
