Permify uses several caching layers to deliver low-latency access control checks even under heavy distributed load. This page describes each layer, what it stores, how it is sized, and how it behaves during scale events.

Schema cache

Schema definitions are stored in an in-memory cache keyed by version string. When a request includes a schema_version in its metadata, Permify looks up that version in the in-memory cache first; on a miss, it queries the database, stores the result in the cache, and serves it. If no schema_version is provided, Permify treats versions as alphanumeric strings, sorts them, and fetches the head (latest) version, again checking the cache before the database. Configure the schema cache size in your Permify configuration:
service:
  schema:
    cache:
      number_of_counters: 1_000
      max_cost: 10MiB
The cache backend is Ristretto.
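The lookup order described above can be sketched in a few lines. This is a minimal illustration, not Permify's actual code: `cache`, `db`, `list_versions`, and `fetch` are hypothetical stand-ins for the in-memory cache and the database layer.

```python
def resolve_schema(cache, db, tenant_id, schema_version=None):
    """Cache-first schema lookup. `cache` is dict-like; `db` is any object
    with list_versions() and fetch() (illustrative stand-ins, not Permify's API)."""
    if schema_version is None:
        # No version supplied: treat versions as alphanumeric strings,
        # sort them, and take the head (latest) one.
        schema_version = max(db.list_versions(tenant_id))
    key = (tenant_id, schema_version)
    schema = cache.get(key)
    if schema is None:                # cache miss: fall through to the database
        schema = db.fetch(tenant_id, schema_version)
        cache[key] = schema           # populate for subsequent requests
    return schema
```

The important property is that the database is only consulted once per version; every later request for the same version is served from memory.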

Permission (data) cache

Permify applies MVCC (Multi-Version Concurrency Control) for Postgres. Every write and delete operation creates a new database snapshot, which both improves performance and produces a naturally consistent cache. The permission cache key encodes the tenant, schema version, snapshot token, and the full check request:
check_{tenant_id}_{schema_version}:{snapshot_token}:{check_request}
Permify hashes each incoming request and looks for a matching key. On a cache miss, it runs the check engine and writes the result to the cache under that key. Configure the permission cache:
service:
  permission:
    bulk_limit: 100
    concurrency_limit: 100
    cache:
      number_of_counters: 10_000
      max_cost: 10MiB
The cache backend is Ristretto.
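Composing the key in the format above can be sketched as follows. The shape matches the documented pattern, but the serialization and hash function here are illustrative assumptions; Permify's actual hashing may differ.

```python
import hashlib

def permission_cache_key(tenant_id, schema_version, snapshot_token, check_request):
    """Build a key in the documented shape:
    check_{tenant_id}_{schema_version}:{snapshot_token}:{check_request}.
    Hashing a deterministic serialization of the request is an assumption
    for illustration; Permify's real encoding may differ."""
    # Serialize deterministically so identical requests hash identically.
    body = "|".join(f"{k}={check_request[k]}" for k in sorted(check_request))
    digest = hashlib.sha256(body.encode()).hexdigest()[:16]
    return f"check_{tenant_id}_{schema_version}:{snapshot_token}:{digest}"
```

Because the snapshot token is part of the key, a write that produces a new snapshot yields a new key rather than overwriting an old entry, so stale results are never served.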
The MVCC pattern also enables historical data storage. However, it accumulates old relationship rows over time. Permify includes a garbage collector that removes outdated data at a configurable interval.

Cache sizing and eviction

There is no separate dedicated cache for snap tokens. The snap token is part of the permission cache key, so the same permission.cache settings govern how many snap-token-keyed entries reside in memory:
The snap-token-related settings are:
  • service.permission.cache.max_cost: Maximum memory budget (e.g. 10MiB, 256MiB). This is the effective size limit for all snap-token-keyed entries.
  • service.permission.cache.number_of_counters: Number of TinyLFU admission counters. A good rule of thumb is ~10× the expected number of unique cached items.
No TTL is configured by default. Eviction is driven purely by memory pressure against max_cost, using Ristretto’s TinyLFU admission policy combined with a SampledLFU eviction policy. Entries are evicted when new items need space and the budget is exhausted — not after a fixed time window.
If you observe high cache miss rates after a schema version change, this is expected behaviour. The schema_version component of the cache key changes, making all prior entries stale. Size your max_cost to hold a comfortable working set for the most recently active schema version.
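The ~10× rule of thumb reduces to quick arithmetic. The helper below is a sketch; the average entry size is a workload-specific estimate you would need to measure yourself.

```python
def size_permission_cache(expected_items, avg_entry_bytes):
    """Derive cache settings from the ~10x rule of thumb.
    avg_entry_bytes is an assumed, workload-specific estimate."""
    number_of_counters = 10 * expected_items            # TinyLFU admission counters
    max_cost_bytes = expected_items * avg_entry_bytes   # memory budget for the working set
    return number_of_counters, max_cost_bytes

# e.g. a working set of 100_000 hot check results at ~100 bytes each
counters, cost = size_permission_cache(100_000, 100)
```

Here `counters` would suggest number_of_counters: 1_000_000 and `cost` roughly a 10MiB max_cost budget.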

Distributed cache

When you run multiple Permify instances, Permify activates consistent hashing across instances to make efficient use of their individual in-memory caches. Consistent hashing distributes cache keys across nodes independently of the total number of nodes. When a request arrives at any instance, the consistent hash ring determines which instance owns that key. Subsequent requests with the same hash are routed to the same instance, maximising cache hit rates and acting as a natural load balancer.
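A consistent hash ring can be sketched in a few lines. This is a minimal illustration with virtual nodes, not Permify's implementation; node names and the md5-based hash are assumptions.

```python
import bisect
import hashlib

def _h(s):
    """Map a string to a point on the ring (illustrative hash choice)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring with virtual nodes (sketch only)."""
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` points for smoother balance.
        self.ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def owner(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        i = bisect.bisect(self.points, _h(key)) % len(self.ring)
        return self.ring[i][1]
```

Every instance computing `owner(key)` gets the same answer, so repeated requests for the same key land on the same instance's warm cache no matter which instance received them first.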

Single-instance behaviour

With one Permify instance, every API request stores its result in the local in-memory cache and serves future identical requests from there.

Multi-instance behaviour

With more than one instance, consistent hashing activates on API calls. Suppose a check result is stored on instance 2 — all subsequent requests with the same hash are routed to instance 2 regardless of which instance received the original call. Adding more instances automatically increases total cache capacity. Learn more: Introducing Consistent Hashing

Enabling distributed mode

Enable distributed mode in your configuration:
distributed:
  enabled: true
  address: "kubernetes:///permify.default:5000"
  port: "5000"
Consistent hashing distributes keys evenly across cache nodes, but it is the application’s responsibility to ensure the cache is used effectively — reading from and writing to it appropriately.

Scaling events: adding or removing pods

When you scale out or in on Kubernetes, two properties shape what happens at the cache level. Key rebalancing is partial, not global: the consistent hash ring updates, and only the key ranges that mapped to the affected pod move; the rest of the ring, and its cached entries, is undisturbed. Each pod's cache is local and in-memory: Permify uses Ristretto as a process-local cache, so there is no shared cache layer.
  • Scale-out (new pod joins): The new pod starts with a cold cache. Requests routed to it will miss and fall through to the database until the cache warms up. Expect a temporary increase in database load and latency after adding a pod.
  • Scale-in (pod removed): All entries cached in that pod are lost. The key range is reassigned to a remaining pod, which will see cold-cache behaviour for those keys until they warm up.
A brief hit-rate drop is normal during any scale event. Under typical read-heavy workloads, the warm-up period resolves within minutes depending on your max_cost budget and request rate. Permify also uses a circuit breaker pattern to detect and handle failures when the underlying database is unavailable, preventing unnecessary calls during outages and managing the reboot phase gracefully.
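The circuit breaker pattern mentioned above can be sketched as follows. This is a generic illustration of the pattern, not Permify's implementation; thresholds and names are assumptions.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch (illustrative, not Permify's code).
    After `max_failures` consecutive failures the circuit opens and calls
    fail fast; after `reset_after` seconds it half-opens to allow a probe."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: skip the database call entirely during the outage.
                raise RuntimeError("circuit open: skipping database call")
            self.opened_at = None            # half-open: allow one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0                    # success closes the circuit
        return result
```

Failing fast while the database is down avoids piling up doomed queries, and the half-open probe lets traffic resume gradually once the database reboots.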
