SET records, but only the last one matters for the current state. Log compaction discards this history and replaces the AOF with a minimal representation of what the database looks like right now.
Why compaction is needed
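Consider a key that is updated frequently. Replaying the uncompacted log executes every one of its SET commands to produce one key with value 1003. After compaction, the log holds a single record for that key, which brings the following benefits: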
Bounded disk usage
The AOF cannot grow without limit. After compaction, file size is proportional to the number of live keys, not the total write volume.
Faster startup
A smaller AOF replays faster. A database with 10,000 keys but 1,000,000 historical writes restarts in milliseconds after compaction, not seconds.
TTL cleanup
Compaction skips expired keys. They are never written to the new AOF, so they do not consume replay time on restart.
Crash safety
The rewrite is atomic: it writes to a .tmp file first, then renames into place. A crash during rewrite leaves the old AOF intact.
The rewrite algorithm
aof_rewrite (aof.c) produces a new AOF by serializing the current in-memory state:
- Only non-expired entries are written. Dead keys vanish from the log.
- TTL keys write remaining seconds (expires_at - now), not the original TTL. This preserves the correct expiry on replay.
- fsync is called before rename to ensure the new file is fully written to disk before it becomes visible.
- rename is atomic on POSIX systems: a reader sees either the old file or the new file, never a partially-written intermediate.
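The steps above can be sketched in C roughly as follows. This is a minimal illustration, not RadishDB's aof.c: the sketch_entry struct, the sketch_rewrite function, the text record format, and the convention that expires_at == 0 means "no TTL" are all assumptions made for the example, and the AOFX1 header is left out.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical in-memory entry; RadishDB's real hash-table entry may differ. */
typedef struct {
    const char *key;
    const char *value;
    time_t expires_at;            /* 0 = no TTL (an assumption of this sketch) */
} sketch_entry;

/* Writes one record per live entry to "<path>.tmp", syncs it, then renames it
 * over <path>. Returns the number of records written, or -1 on error. */
long sketch_rewrite(const char *path, const sketch_entry *entries,
                    size_t n, time_t now)
{
    char tmp[512];
    assert(path != NULL && (entries != NULL || n == 0));

    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp) return -1;

    FILE *f = fopen(tmp, "w");
    if (f == NULL) return -1;

    long written = 0;
    for (size_t i = 0; i < n; i++) {
        const sketch_entry *e = &entries[i];
        if (e->expires_at != 0 && e->expires_at <= now)
            continue;             /* expired: never reaches the new log */
        if (e->expires_at != 0)   /* remaining seconds, not the original TTL */
            fprintf(f, "SET %s %s EX %ld\n", e->key, e->value,
                    (long)(e->expires_at - now));
        else
            fprintf(f, "SET %s %s\n", e->key, e->value);
        written++;
    }

    /* Flush stdio buffers, then force the bytes to disk before the rename. */
    int ok = (fflush(f) == 0) && (fsync(fileno(f)) == 0);
    if (fclose(f) != 0) ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);              /* the old AOF at <path> is left untouched */
        return -1;
    }
    return written;
}
```

Every failure path removes the temporary file and returns before rename, so the file at path is only ever replaced by a fully synced successor.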
AOFX1 header and base_size
The AOFX1 header written by aof_rewrite contains a base_size field.
base_size records the file size immediately after the last compaction. It serves as a watermark: any bytes beyond base_size are new mutations written since the last rewrite.
At startup and during the REPL loop, RadishDB reads base_size from the header (repl.c). If base_size is zero (the file was never rewritten, or the header is absent), the entire file is treated as growth since the last compaction.
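As a rough sketch of how such a watermark could be stored and read back with the zero fallback, consider the C below. The byte layout (five magic bytes followed by base_size as eight little-endian bytes) and the names sketch_write_header and sketch_read_base_size are hypothetical; the actual AOFX1 layout is not given here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAGIC     "AOFX1"
#define SKETCH_MAGIC_LEN 5

/* Hypothetical layout: 5 magic bytes, then base_size as 8 little-endian bytes. */
int sketch_write_header(FILE *f, uint64_t base_size)
{
    unsigned char buf[8];
    assert(f != NULL);
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(base_size >> (8 * i));
    if (fwrite(SKETCH_MAGIC, 1, SKETCH_MAGIC_LEN, f) != SKETCH_MAGIC_LEN) return -1;
    if (fwrite(buf, 1, sizeof buf, f) != sizeof buf) return -1;
    return 0;
}

/* Returns base_size from the header, or 0 when the header is absent or short,
 * so that the whole file counts as growth since the last compaction. */
uint64_t sketch_read_base_size(FILE *f)
{
    unsigned char hdr[SKETCH_MAGIC_LEN + 8];
    assert(f != NULL);
    rewind(f);
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) return 0;
    if (memcmp(hdr, SKETCH_MAGIC, SKETCH_MAGIC_LEN) != 0) return 0;

    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)hdr[SKETCH_MAGIC_LEN + i] << (8 * i);
    return v;
}
```

Encoding the integer byte by byte keeps the sketch independent of the host's endianness; a real implementation may simply write a struct.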
Trigger condition
Compaction fires automatically when the AOF has grown to more than 2× its post-compaction size; the check lives in repl.c.
The 2× ratio is hardcoded. It means RadishDB tolerates up to 100% write amplification before compacting, that is, the file may double relative to its post-compaction size. A more write-heavy workload compacts more frequently; a read-heavy workload may never compact at all.
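To make the trigger concrete, here is a small C sketch of the comparison plus a toy simulation of checking it after every command. The function names and the simplification that a rewrite always shrinks the file back to a fixed compacted_size are assumptions for illustration, not RadishDB's repl.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The trigger from the text: grown past twice the post-compaction size. */
bool sketch_should_rewrite(uint64_t aof_size, uint64_t aof_base_size)
{
    return aof_size > aof_base_size * 2;
}

/* Simulates the per-iteration check. Each command appends record_len bytes;
 * a rewrite is modelled as shrinking the file back to compacted_size.
 * Returns how many rewrites fired. */
unsigned sketch_count_rewrites(uint64_t compacted_size, uint64_t record_len,
                               unsigned commands)
{
    assert(record_len > 0);       /* a zero-length append never grows the file */
    uint64_t aof_size = compacted_size;
    uint64_t aof_base_size = compacted_size;
    unsigned rewrites = 0;

    for (unsigned i = 0; i < commands; i++) {
        aof_size += record_len;   /* the mutation is appended first */
        if (sketch_should_rewrite(aof_size, aof_base_size)) {
            aof_size = compacted_size;
            aof_base_size = aof_size;
            rewrites++;
        }
    }
    return rewrites;
}
```

With compacted_size = 100, record_len = 10 and 25 commands, the check fires on the 11th and 22nd commands (when the file reaches 210 bytes), so the function returns 2. Because the comparison is strict, a file of exactly 200 bytes does not trigger a rewrite, while a base size of zero makes any non-empty file qualify.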
Startup compaction
Compaction is also evaluated at startup, before the first prompt is displayed:
- If aof_size > aof_base_size * 2, a rewrite runs immediately.
- If the AOF has no AOFX1 header (old format or first run after upgrade), base_size is treated as zero, which always triggers a rewrite.
Rewrite output vs old AOF: a comparison
Before rewrite, the raw AOF carries the full history. After rewrite, temp is gone (it was deleted). session’s TTL reflects how many seconds remain, not how many were originally set. On replay, the hash table is identical to what it was at rewrite time.
Effect on startup time
Startup time scales with the number of records in the AOF, not the total write volume.
Without compaction:
- 1M writes to 1K unique keys = 1M replay operations
- Startup time is proportional to write volume
With compaction:
- The AOF is rewritten whenever it doubles in size
- After rewrite, AOF contains at most current_key_count records
- Startup time is proportional to current key count
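The difference in replay work can be illustrated with a toy model in C. It assumes integer keys that index directly into an array, every slot holding a live key, and last-write-wins replay; sketch_record, sketch_replay, and sketch_compact are hypothetical names, not RadishDB functions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy log record: an integer key stands in for a real string key. */
typedef struct {
    size_t key;
    long   value;
} sketch_record;

/* Applies records in order (last write wins); returns how many were applied. */
size_t sketch_replay(const sketch_record *log, size_t n,
                     long *table, size_t table_len)
{
    size_t ops = 0;
    for (size_t i = 0; i < n; i++) {
        assert(log[i].key < table_len);
        table[log[i].key] = log[i].value;
        ops++;
    }
    return ops;
}

/* Emits one record per key from the current table: the compacted log.
 * Assumes every slot in the table holds a live key. */
size_t sketch_compact(const long *table, size_t table_len, sketch_record *out)
{
    for (size_t k = 0; k < table_len; k++) {
        out[k].key = k;
        out[k].value = table[k];
    }
    return table_len;
}
```

Replaying a 12-record history spread over 3 keys costs 12 operations, while replaying the 3-record output of sketch_compact costs 3 and produces the same table. The same ratio holds at the scale quoted above: replay work follows record count, so compaction ties it to key count rather than write volume.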
RadishDB performs AOF rewrite in-process on the main thread; there is no background fork. Unlike Redis, which uses fork() + copy-on-write to rewrite in a child process, RadishDB’s rewrite briefly blocks command processing while it iterates the hash table. For datasets that fit comfortably in memory, this typically completes in milliseconds.
Summary
| Property | Value |
|---|---|
| Trigger condition | aof_size > aof_base_size * 2 |
| Trigger frequency | Checked on every REPL iteration |
| Growth ratio tolerated | 2× (100% write amplification) |
| Temp file | aof/radish.aof.tmp |
| Replacement method | rename() — atomic on POSIX |
| Expired keys in output | Excluded |
| TTL in output | Remaining seconds at time of rewrite |
| Background process | None — runs on the main thread |