Schema cache
Schema definitions are stored in an in-memory cache keyed by version string. When a request includes a schema_version in its metadata, Permify looks up that version in the in-memory cache first. If it is not found, Permify queries the database, stores the result in the cache, and serves it. If no schema_version is provided, Permify treats versions as alphanumeric strings, sorts them in that order, and fetches the head (latest) version, again checking the cache first.
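The lookup order can be sketched as follows. The class, the database accessor, and the version names are illustrative stand-ins, not Permify's actual internals:

```python
# Sketch of the schema lookup order described above: cache first, then
# database; no version supplied means "take the alphanumeric head".
class SchemaCache:
    def __init__(self, db):
        self.db = db     # maps version string -> schema definition
        self.cache = {}  # in-memory cache keyed by version string

    def lookup(self, schema_version=None):
        if schema_version is None:
            # Alphanumeric sort; the head (max) is the latest version.
            schema_version = max(self.db)
        if schema_version in self.cache:      # cache hit
            return self.cache[schema_version]
        schema = self.db[schema_version]      # fall through to the database
        self.cache[schema_version] = schema   # store for future requests
        return schema

db = {"v1": "schema-one", "v2": "schema-two"}
cache = SchemaCache(db)
print(cache.lookup("v1"))  # miss: fetched from db, then cached
print(cache.lookup())      # no version given: head version "v2"
```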
Configure the schema cache size in your Permify configuration:
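For example (key names sketched from Permify's configuration layout; check the configuration reference for your Permify version, as exact fields and defaults may differ):

```yaml
service:
  schema:
    cache:
      number_of_counters: 1000  # TinyLFU admission counters
      max_cost: 10MiB           # memory budget for cached schema versions
```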
Permission (data) cache
Permify applies MVCC (Multi-Version Concurrency Control) for Postgres. Every write and delete operation creates a new database snapshot, which both improves performance and produces a naturally consistent cache. The permission cache key encodes the tenant, schema version, snapshot token, and the full check request.

The MVCC pattern also enables historical data storage. However, it accumulates old relationship rows over time, so Permify includes a garbage collector that removes outdated data at a configurable interval.
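A simplified view of how such a key might be composed. The field names, separator, and hashing here are illustrative, not Permify's exact encoding; the point is that changing any component (a new schema version, a new snapshot) yields a different key:

```python
import hashlib

def permission_cache_key(tenant, schema_version, snap_token, check_request):
    # Concatenate every component that affects the result. If any of them
    # changes, the key changes, and older entries simply stop being hit.
    raw = "|".join([
        tenant,
        schema_version,
        snap_token,
        repr(sorted(check_request.items())),  # deterministic request encoding
    ])
    return hashlib.sha256(raw.encode()).hexdigest()

key = permission_cache_key(
    tenant="t1",
    schema_version="v2",
    snap_token="snap-42",
    check_request={"entity": "doc:1", "permission": "read", "subject": "user:7"},
)
```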
Cache sizing and eviction
There is no separate dedicated cache for snap tokens. The snap token is part of the permission cache key, so the same permission.cache settings govern how many snap-token-keyed entries reside in memory:
| Config key | Purpose |
|---|---|
service.permission.cache.max_cost | Maximum memory budget (e.g. 10MiB, 256MiB). This is the effective size limit for all snap-token-keyed entries. |
service.permission.cache.number_of_counters | Number of TinyLFU admission counters. A good rule of thumb is ~10× the expected number of unique cached items. |
Eviction is driven by max_cost, using Ristretto’s TinyLFU admission policy combined with a SampledLFU eviction policy. Entries are evicted when new items need space and the budget is exhausted, not after a fixed time window.
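Putting the two settings together (values are examples only; the keys mirror the table above):

```yaml
service:
  permission:
    cache:
      number_of_counters: 10000  # ~10x the expected unique cached items
      max_cost: 32MiB            # total memory budget; drives eviction
```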
If you observe high cache miss rates after a schema version change, this is expected behaviour. The schema_version component of the cache key changes, making all prior entries stale. Size your max_cost to hold a comfortable working set for the most recently active schema version.

Distributed cache
When you run multiple Permify instances, Permify activates consistent hashing across instances to make efficient use of their individual in-memory caches. Consistent hashing distributes cache keys across nodes independently of the total number of nodes. When a request arrives at any instance, the consistent hash ring determines which instance owns that key. Subsequent requests with the same hash are routed to the same instance, maximising cache hit rates and acting as a natural load balancer.

Single-instance behaviour
With one Permify instance, every API request stores its result in the local in-memory cache and serves future identical requests from there.

Multi-instance behaviour
With more than one instance, consistent hashing activates on API calls. Suppose a check result is stored on instance 2: all subsequent requests with the same hash are routed to instance 2, regardless of which instance received the original call. Adding more instances automatically increases total cache capacity. Learn more: Introducing Consistent Hashing

Enabling distributed mode
Consistent hashing distributes keys evenly across cache nodes, but it is the application’s responsibility to ensure the cache is used effectively — reading from and writing to it appropriately.
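The routing property described above can be sketched with a minimal hash ring. The node names, hash function, and virtual-node count are illustrative; Permify's actual ring implementation differs:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hash ring: each key hashes to a point on the
    ring and is owned by the next node point clockwise from it."""

    def __init__(self, nodes, vnodes=50):
        # Virtual nodes smooth the key distribution across real nodes.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key):
        idx = bisect(self.points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["permify-0", "permify-1", "permify-2"])
# The same key always routes to the same instance, whichever instance
# received the original request:
assert ring.owner("check:t1:v2:doc:1:read") == ring.owner("check:t1:v2:doc:1:read")
```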
Scaling events: adding or removing pods
When you add or remove pods in Kubernetes, the following happens at the cache level:
- Key rebalancing is partial, not global. The consistent hash ring updates, and only the key ranges that mapped to the affected pod need to move; the rest of the ring, and its cached entries, is undisturbed.
- Each pod’s cache is local and in-memory. Permify uses Ristretto as a process-local cache; there is no shared cache layer.
- Scale-out (new pod joins): The new pod starts with a cold cache. Requests routed to it will miss and fall through to the database until the cache warms up. Expect a temporary increase in database load and latency after adding a pod.
- Scale-in (pod removed): All entries cached in that pod are lost. The key range is reassigned to a remaining pod, which will see cold-cache behaviour for those keys until they warm up.
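The partial-rebalancing claim can be checked directly with a toy ring: removing one node remaps only the keys that node owned, leaving every other key with its original owner. This is a self-contained sketch, not Permify code:

```python
import hashlib
from bisect import bisect

def hash_int(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=100):
    # Each node contributes `vnodes` points on the ring.
    return sorted((hash_int(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def owner(ring, key):
    points = [h for h, _ in ring]
    return ring[bisect(points, hash_int(key)) % len(ring)][1]

keys = [f"key-{i}" for i in range(2000)]
before = build_ring(["pod-0", "pod-1", "pod-2", "pod-3"])
after = build_ring(["pod-0", "pod-1", "pod-2"])  # pod-3 scaled in

# Only keys that pod-3 owned change owner: roughly 1/4 of the total.
moved = sum(owner(before, k) != owner(after, k) for k in keys)
print(f"{moved}/{len(keys)} keys remapped")
```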
How quickly each affected cache warms back up depends on your max_cost budget and request rate.
Permify also uses a circuit breaker pattern to detect and handle failures when the underlying database is unavailable, preventing unnecessary calls during outages and managing the recovery phase gracefully.
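A toy illustration of the pattern; the states, thresholds, and timings are generic, not Permify's actual breaker:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: opens after `threshold` consecutive failures,
    fails fast while open, and half-opens after `reset_after` seconds."""

    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Open: skip the database entirely instead of piling on load.
                raise RuntimeError("circuit open: skipping database call")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

While the breaker is open, callers fail fast instead of queueing behind a database that is already down; a successful trial call during the half-open window closes it again.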