All data in RadishDB lives in a single in-memory hash table. The hash table uses separate chaining to resolve collisions and automatically resizes when it grows too dense. Every key and value is a heap-allocated C string; the hash table owns these allocations.

Data structures

Entry

Each key-value pair is stored in an Entry node:
hashtable.h
typedef struct Entry {
  char    *key;         // heap-allocated (strdup)
  char    *value;       // heap-allocated (strdup)
  time_t   expires_at;  // 0 = no expiry, otherwise Unix timestamp
  struct Entry *next;   // next entry in the same bucket (chaining)
} Entry;
expires_at == 0 is the sentinel for “this key never expires”. Any non-zero value is an absolute Unix timestamp; the key is considered expired once time(NULL) >= expires_at.

HashTable

hashtable.h
typedef struct HashTable {
  Entry  **buckets;  // array of Entry* pointers
  size_t   size;     // number of buckets (always a power of two)
  int      count;    // number of live entries
  int      resizes;  // total number of resize operations performed
} HashTable;
The size field tracks the bucket array length. count tracks live entries and drives the load factor calculation. resizes is a monotonic counter exposed by the INFO command.

Hash function

RadishDB uses a djb2 variant — a multiply-and-add hash that distributes ASCII keys well in practice:
hashtable.c
unsigned long hash(const char *str) {
  unsigned long hash = 5381;
  for (int i = 0; str[i] != '\0'; i++) {
    hash = hash * 33 + str[i];
  }
  return hash;
}
The bucket index is hash(key) % ht->size. Because size is always a power of two, this modulo is equivalent to the bitwise AND hash(key) & (ht->size - 1), so index computation is cheap.
djb2 is not cryptographic. Key distribution is good for typical workloads, but a carefully crafted set of keys can force many collisions into a single bucket (hash flooding). RadishDB does not implement hash randomization.

Separate chaining

When two keys map to the same bucket, they form a singly-linked list through the next pointer. Lookup walks the chain comparing keys with strcmp until it finds a match or reaches the end.
buckets[i] → Entry{"apple", "red",  0, →} → Entry{"ant", "6", 0, NULL}
buckets[j] → Entry{"banana", "yellow", 1700000000, NULL}
buckets[k] → NULL
In the average case — low load, few collisions — lookup is O(1). Worst case is O(n) if all keys hash to the same bucket; the load factor threshold keeps chains short for typical key sets, though adversarially chosen keys (see the note on hash flooding above) can still produce long chains.

Auto-resize at 0.75 load factor

Before every insert, ht_set computes the current load factor:
hashtable.c
void ht_set(HashTable *ht, const char *key, const char *value, time_t expires_at) {
  float load = (float)ht->count / ht->size;
  if (load > 0.75f) {
    ht_resize(ht, ht->size * 2);  // double the bucket count
  }
  // ... insert with separate chaining
}
When the load factor exceeds 0.75, ht_resize allocates a new bucket array twice as large, re-hashes every existing entry into the new array, and frees the old array. The resizes counter increments once per resize. This keeps the average chain length short (the load factor drops back to roughly 0.375 immediately after a resize) and lookup time close to O(1) regardless of database size.

Memory management

All strings are heap-allocated via strdup at insert time. The hash table owns these allocations:
  • On SET: new key and value strings are strdup’d into the entry.
  • On overwrite: the old value string is free’d before the new one is stored.
  • On DEL or expiry: both key and value are free’d and the Entry itself is free’d.
  • On ht_resize: entries are re-linked into new buckets — the strings themselves are not re-allocated.
Callers must never hold a raw pointer to a value string across a SET or DEL that targets the same key. The underlying allocation may be freed.

Complexity

| Operation | Average        | Worst case             |
|-----------|----------------|------------------------|
| SET       | O(1) amortized | O(n) during resize     |
| GET       | O(1)           | O(n) — full chain scan |
| DEL       | O(1)           | O(n) — full chain scan |
| TTL       | O(1)           | O(n) — full chain scan |
| Resize    | O(n)           | O(n)                   |
Amortized O(1) for SET holds because resize work is spread across many insertions. Resize itself is O(n) but happens at most O(log n) times over n total inserts.

INFO metrics

The INFO command exposes the following hash table statistics:
| Field         | Description                                      |
|---------------|--------------------------------------------------|
| keys          | Total number of live entries                     |
| keys_with_ttl | Entries that have a non-zero expires_at          |
| expired_keys  | Cumulative count of keys deleted by expiry       |
| buckets       | Current bucket array size                        |
| load_factor   | count / size as a float                          |
| resizes       | Total number of resize operations                |
| max_chain     | Length of the longest chain in the current table |
>>> INFO
Version : 0.1
Uptime Seconds : 142

Keys : 12
Keys with TTL : 3
Expired keys : 7

Buckets : 16
Load Factor : 0.750000
Resizes : 2
Max chain : 2
