Cache modes
- Simple
- Semantic
Simple caching performs exact-match lookups. The cache key is a SHA-256 hash of the serialized request body combined with the target URL. If an identical request has been made before, the cached response is returned immediately.This mode is available in both the open-source gateway and the hosted Portkey service.
Configuration
Cache mode.
"simple" for exact-match caching, "semantic" for embedding-based similarity matching.Cache TTL in seconds. After this duration, the cached entry expires and the next matching request hits the provider. Defaults to 24 hours (86400 seconds) if not set.
Usage
Cache status headers
Every response includes ax-portkey-cache-status header indicating whether the response came from cache:
| Value | Meaning |
|---|---|
HIT | Response served from simple cache |
SEMANTIC HIT | Response served from semantic cache |
MISS | No cache entry found; response from provider |
SEMANTIC MISS | Semantic search found no match; response from provider |
REFRESH | Cache was bypassed due to force-refresh header |
DISABLED | Caching is not enabled for this request |
Force-refreshing the cache
To bypass the cache and fetch a fresh response from the provider, include thex-portkey-cache-force-refresh header:
Persistent caching with Redis
By default, the gateway uses an in-memory cache that is lost when the process restarts. To persist the cache across restarts and share it across multiple gateway instances, configure a Redis connection:putInCache and getFromCache operations use Redis instead of the in-process store.
The in-memory cache does not persist across gateway restarts. Use Redis for production deployments where cache durability matters.
Caching limitations
- Streaming responses are not cached. If
stream: trueis set in the request body, caching is skipped for that request. - Cache keys include the full request body and the provider URL. Any change to the request — model, messages, parameters — produces a different cache key.
- The
simplecache key is a SHA-256 hash ofJSON.stringify(requestBody) + targetURL.