Overview
The database uses a layered architecture in which each component has a specific responsibility.
Core components
Database
The Database struct (src/core/database.rs:154) is the top-level coordinator of all subsystems. Its responsibilities are to:
- Initialize and coordinate all subsystems
- Manage collection metadata (schemas, indexes, roots)
- Provide transaction lifecycle management
- Execute automatic checkpointing
- Track database-wide metrics
The database uses Arc (atomic reference counting) to share components safely across threads, enabling concurrent read transactions.
Pager
The pager (src/core/pager.rs:145) manages page-level storage with an LRU cache:
- 4KB pages - Standard page size for efficient I/O
- LRU caching - 25,000-page default (~100MB)
- Free list management - Reuses deleted pages
- Corruption detection - Validates header on open (src/core/pager.rs:222)
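The 25,000-page default works out to 25,000 × 4,096 bytes = 102,400,000 bytes, which is the ~100MB figure above. As a rough illustration of the eviction policy, here is a minimal LRU page cache. It is a sketch, not the pager's code: the names LruPageCache and PageId are hypothetical, recency tracking uses a simple O(n) queue, and writing dirty pages back before eviction is omitted.

```rust
use std::collections::{HashMap, VecDeque};

/// Page size from the text (4KB).
const PAGE_SIZE: usize = 4096;

type PageId = u64;
type Page = Box<[u8; PAGE_SIZE]>;

/// Hypothetical LRU page cache: a HashMap for lookups plus a VecDeque that
/// records recency (front = least recently used, back = most recently used).
struct LruPageCache {
    capacity: usize,
    pages: HashMap<PageId, Page>,
    order: VecDeque<PageId>,
}

impl LruPageCache {
    fn new(capacity: usize) -> Self {
        LruPageCache { capacity, pages: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `id` as most recently used (O(n) scan; acceptable for a sketch).
    fn touch(&mut self, id: PageId) {
        if let Some(pos) = self.order.iter().position(|&p| p == id) {
            self.order.remove(pos);
        }
        self.order.push_back(id);
    }

    /// Look up a cached page and refresh its recency on a hit.
    fn get(&mut self, id: PageId) -> Option<&Page> {
        if self.pages.contains_key(&id) {
            self.touch(id);
        }
        self.pages.get(&id)
    }

    /// Insert a page, evicting the least recently used page if the cache is
    /// full. Returns the id of the evicted page, if any.
    fn put(&mut self, id: PageId, page: Page) -> Option<PageId> {
        let mut evicted = None;
        if !self.pages.contains_key(&id) && self.pages.len() >= self.capacity {
            if let Some(lru) = self.order.pop_front() {
                self.pages.remove(&lru);
                evicted = Some(lru);
            }
        }
        self.pages.insert(id, page);
        self.touch(id);
        evicted
    }
}
```

With a capacity of two, reading page 1 before inserting page 3 makes page 2 the eviction victim.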
Write-Ahead Log (WAL)
The WAL (src/core/wal.rs:52) ensures durability through append-only logging:
- Before commit: Write modified pages to WAL with checksums
- fsync WAL: Guarantee durability to disk
- Update database: Write pages to main file
- Checkpoint: Merge WAL into main file, truncate WAL
The WAL protects against:
- Crashes during commit (replay from WAL on recovery)
- Torn writes (checksums detect corruption)
- Incomplete transactions (salt values prevent stale frame replay)
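The checksum and salt checks can be sketched as follows. This is an illustration under assumptions, not the on-disk format of src/core/wal.rs: the frame fields, the use of FNV-1a as the checksum, and the rule "replay the valid prefix and stop at the first bad frame" are choices made for the example. A frame whose salt does not match the current WAL salt is treated as stale (left over from before the last checkpoint), and a checksum mismatch is treated as a torn write.

```rust
/// Hypothetical WAL frame: a full page image plus validation metadata.
struct WalFrame {
    page_id: u64,
    salt: u32,
    checksum: u64,
    data: Vec<u8>,
}

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// FNV-1a (64-bit), used here only as a stand-in checksum.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// The checksum covers the page id, the salt, and the page contents.
fn frame_checksum(page_id: u64, salt: u32, data: &[u8]) -> u64 {
    let h = fnv1a(&page_id.to_le_bytes(), FNV_OFFSET);
    let h = fnv1a(&salt.to_le_bytes(), h);
    fnv1a(data, h)
}

impl WalFrame {
    fn new(page_id: u64, salt: u32, data: Vec<u8>) -> Self {
        let checksum = frame_checksum(page_id, salt, &data);
        WalFrame { page_id, salt, checksum, data }
    }

    /// Valid only if it belongs to the current WAL generation (salt) and
    /// its contents match the stored checksum (no torn write).
    fn is_valid(&self, current_salt: u32) -> bool {
        self.salt == current_salt
            && self.checksum == frame_checksum(self.page_id, self.salt, &self.data)
    }
}

/// Recovery: return the pages to re-apply, stopping at the first invalid frame.
fn recover(frames: &[WalFrame], current_salt: u32) -> Vec<(u64, &[u8])> {
    frames
        .iter()
        .take_while(|f| f.is_valid(current_salt))
        .map(|f| (f.page_id, f.data.as_slice()))
        .collect()
}
```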
B-tree Storage Engine
Documents are stored in copy-on-write B-trees (src/core/btree.rs:84):
- When a transaction modifies a page, it allocates a new page instead of overwriting the old one
- Old pages remain untouched for concurrent readers
- On commit, the new root is atomically swapped
- Supports O(log n) document lookup, insert, and deletion
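Copy-on-write updates are commonly implemented by path copying: only the nodes on the path from the root to the modified entry are copied, and every untouched subtree is shared between the old and new versions. The sketch below shows the idea on a plain binary search tree rather than a B-tree (so there are no page splits), using Arc for shared nodes; the Node type and the insert/get functions are hypothetical.

```rust
use std::sync::Arc;

/// Node of an immutable (persistent) binary search tree. Nodes are never
/// mutated after creation; updates copy the root-to-leaf path instead.
#[derive(Debug)]
struct Node {
    key: u64,
    value: String,
    left: Option<Arc<Node>>,
    right: Option<Arc<Node>>,
}

/// Insert by path copying and return the root of the new version.
/// Subtrees off the path are shared with the old version via Arc clones.
fn insert(node: &Option<Arc<Node>>, key: u64, value: &str) -> Arc<Node> {
    match node {
        None => Arc::new(Node { key, value: value.to_string(), left: None, right: None }),
        Some(n) if key < n.key => Arc::new(Node {
            key: n.key,
            value: n.value.clone(),
            left: Some(insert(&n.left, key, value)),
            right: n.right.clone(), // shared, not copied
        }),
        Some(n) if key > n.key => Arc::new(Node {
            key: n.key,
            value: n.value.clone(),
            left: n.left.clone(), // shared, not copied
            right: Some(insert(&n.right, key, value)),
        }),
        // Same key: a fresh copy of this node with the new value.
        Some(n) => Arc::new(Node {
            key,
            value: value.to_string(),
            left: n.left.clone(),
            right: n.right.clone(),
        }),
    }
}

/// Lookup in one version of the tree; O(depth).
fn get<'a>(mut node: &'a Option<Arc<Node>>, key: u64) -> Option<&'a str> {
    while let Some(n) = node {
        if key == n.key {
            return Some(n.value.as_str());
        }
        node = if key < n.key { &n.left } else { &n.right };
    }
    None
}
```

After inserting into a version, the previous root still describes the previous contents, which is exactly what lets older readers keep using it.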
Why copy-on-write?
CoW enables snapshot isolation without readers and writers blocking each other:
- Readers use the old root pointer → never blocked by writers
- Writers build new tree versions → never blocked by readers
- Atomic commits → just swap root pointer
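The root-pointer swap can be modeled with a lock that guards only the pointer to the current version. In this sketch (a hypothetical Collection type, with a whole BTreeMap standing in for one tree version, so it copies far more than a real CoW B-tree would), a reader clones the Arc once and then reads without holding any lock, while a commit publishes a new version by replacing the pointer. It does not serialize writers or detect conflicts; the real commit path handles that separately.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// One immutable committed version of a collection (stand-in for a tree root).
type Version = Arc<BTreeMap<String, String>>;

/// Holds the pointer to the latest committed version.
struct Collection {
    root: RwLock<Version>,
}

impl Collection {
    fn new() -> Self {
        Collection { root: RwLock::new(Arc::new(BTreeMap::new())) }
    }

    /// Readers copy the root pointer once; the lock is held only for the clone.
    fn snapshot(&self) -> Version {
        Arc::clone(&*self.root.read().unwrap())
    }

    /// Publish a new version by swapping the pointer. Readers that already
    /// hold the old Arc keep seeing the old version until they drop it.
    fn commit(&self, new_version: BTreeMap<String, String>) {
        *self.root.write().unwrap() = Arc::new(new_version);
    }
}
```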
Data flow: Insert operation
Here’s how a document insert flows through the system:
- Begin transaction (src/core/transaction.rs:56)
  - Allocate MVCC transaction ID (xmin)
  - Capture snapshot ID (latest committed tx)
  - Take snapshot of all collection B-tree roots
- Insert document (src/core/tx_collection.rs)
  - Allocate new page for document
  - Write versioned document with xmin = current tx
  - Insert into CoW B-tree (may trigger splits)
  - Track document write in transaction
- Commit transaction (src/core/transaction.rs:445)
  - Conflict detection: Check whether other transactions modified the same documents
  - Acquire commit lock: Serialize commits
  - Write to WAL: Log all page changes with checksums
  - Write to pager: Update pages in cache
  - Sync WAL: fsync() for durability
  - Update metadata: Record the new B-tree roots
  - Mark committed: Update MVCC manager
The entire commit executes in ~8ms with fsync enabled (see README.md:64). The WAL and batch commit optimizations are critical for this performance.
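The Begin transaction and Write versioned document steps meet in a visibility check: a reader sees a document version only if the transaction that wrote it (its xmin) had committed when the reader's snapshot was taken. The sketch below is an illustration under assumptions, not the MVCC manager in jasonisnthappy: MvccManager, TxSnapshot, and DocVersion are hypothetical, and instead of comparing raw transaction IDs it stamps each commit with a sequence number, so a transaction that began earlier but committed after the snapshot stays hidden.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical MVCC bookkeeping.
struct MvccManager {
    next_tx_id: AtomicU64,
    /// tx id -> commit sequence number (only committed transactions appear)
    commits: Mutex<HashMap<u64, u64>>,
    last_commit_seq: AtomicU64,
}

/// Captured at begin: the transaction's own ID (xmin for its writes)
/// and the commit sequence number its snapshot reflects.
struct TxSnapshot {
    xmin: u64,
    snapshot_seq: u64,
}

/// A stored document version stamped with its writer's transaction ID.
struct DocVersion {
    xmin: u64,
    body: String,
}

impl MvccManager {
    fn new() -> Self {
        MvccManager {
            next_tx_id: AtomicU64::new(1),
            commits: Mutex::new(HashMap::new()),
            last_commit_seq: AtomicU64::new(0),
        }
    }

    /// Allocate an ID and capture the latest committed sequence number.
    fn begin(&self) -> TxSnapshot {
        let xmin = self.next_tx_id.fetch_add(1, Ordering::SeqCst);
        let snapshot_seq = self.last_commit_seq.load(Ordering::SeqCst);
        TxSnapshot { xmin, snapshot_seq }
    }

    /// Record the commit, then advance the sequence that new snapshots see.
    fn mark_committed(&self, tx_id: u64) {
        let mut commits = self.commits.lock().unwrap();
        let seq = self.last_commit_seq.load(Ordering::SeqCst) + 1;
        commits.insert(tx_id, seq);
        self.last_commit_seq.store(seq, Ordering::SeqCst);
    }

    /// Visible if we wrote it ourselves, or its writer committed at or
    /// before our snapshot.
    fn is_visible(&self, tx: &TxSnapshot, v: &DocVersion) -> bool {
        if v.xmin == tx.xmin {
            return true;
        }
        self.commits
            .lock()
            .unwrap()
            .get(&v.xmin)
            .map_or(false, |&seq| seq <= tx.snapshot_seq)
    }
}
```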
Concurrency model
jasonisnthappy uses optimistic concurrency control:
- Multiple readers: Share the same snapshot, never block each other
- Multiple writers: Build independent tree versions, serialize at commit
- Read + Write: Readers use old snapshot, writers create new version
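At commit time, each writer runs conflict detection:
- Check whether documents modified by this transaction were also changed by another transaction that committed after this transaction began
- If a conflict is detected → roll back and retry (automatic with run_transaction())
- If there is no conflict → the commit succeeds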
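A first-committer-wins check like the one described above can be sketched with a per-document record of the last commit that wrote it. Everything here is hypothetical: Store, Tx, CommitError, and the with_retry helper, which only imitates the retry behavior attributed to run_transaction() and does not reproduce its signature. A commit fails if any document in its write set was committed by someone else after its snapshot, and the helper re-runs the closure against a fresh snapshot.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum CommitError {
    Conflict,
}

/// Shared state; the Mutex doubles as the commit lock.
struct Store {
    inner: Mutex<StoreState>,
}

struct StoreState {
    commit_seq: u64,
    /// document id -> commit sequence of its most recent committed write
    last_write: HashMap<String, u64>,
}

struct Tx {
    snapshot_seq: u64,
    writes: HashSet<String>,
}

impl Store {
    fn new() -> Self {
        Store { inner: Mutex::new(StoreState { commit_seq: 0, last_write: HashMap::new() }) }
    }

    fn begin(&self) -> Tx {
        Tx { snapshot_seq: self.inner.lock().unwrap().commit_seq, writes: HashSet::new() }
    }

    /// Validate the write set under the commit lock, then publish it.
    fn commit(&self, tx: Tx) -> Result<u64, CommitError> {
        let mut state = self.inner.lock().unwrap();
        for doc in &tx.writes {
            if let Some(&seq) = state.last_write.get(doc) {
                if seq > tx.snapshot_seq {
                    return Err(CommitError::Conflict); // changed since our snapshot
                }
            }
        }
        state.commit_seq += 1;
        let seq = state.commit_seq;
        for doc in tx.writes {
            state.last_write.insert(doc, seq);
        }
        Ok(seq)
    }
}

/// Re-run `body` in a fresh transaction until it commits or attempts run out.
fn with_retry<F>(store: &Store, max_attempts: usize, mut body: F) -> Result<u64, CommitError>
where
    F: FnMut(&mut Tx),
{
    let mut last_err = CommitError::Conflict;
    for _ in 0..max_attempts {
        let mut tx = store.begin();
        body(&mut tx);
        match store.commit(tx) {
            Ok(seq) => return Ok(seq),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```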
File layout
A jasonisnthappy database consists of two files: the main database file and the write-ahead log. The main file's header contains:
- Magic number validation (JSIN)
- Version and page size
- Total page count
- Metadata page pointer
- Free page list
- Next transaction ID
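As an illustration of header validation on open, here is a sketch that encodes and decodes the fields listed above. The field order, integer widths, little-endian encoding, and 44-byte length are assumptions made for the example; only the JSIN magic and the field list come from this page.

```rust
use std::convert::TryInto;

/// Magic bytes named in the text.
const MAGIC: &[u8; 4] = b"JSIN";
/// Assumed encoded size: 4 (magic) + 4 + 4 + 4 * 8 = 44 bytes.
const HEADER_LEN: usize = 44;

/// Hypothetical in-memory form of the file header.
#[derive(Debug, PartialEq)]
struct Header {
    version: u32,
    page_size: u32,
    page_count: u64,
    metadata_page: u64,
    free_list_head: u64,
    next_tx_id: u64,
}

impl Header {
    fn encode(&self) -> [u8; HEADER_LEN] {
        let mut buf = [0u8; HEADER_LEN];
        buf[0..4].copy_from_slice(MAGIC);
        buf[4..8].copy_from_slice(&self.version.to_le_bytes());
        buf[8..12].copy_from_slice(&self.page_size.to_le_bytes());
        buf[12..20].copy_from_slice(&self.page_count.to_le_bytes());
        buf[20..28].copy_from_slice(&self.metadata_page.to_le_bytes());
        buf[28..36].copy_from_slice(&self.free_list_head.to_le_bytes());
        buf[36..44].copy_from_slice(&self.next_tx_id.to_le_bytes());
        buf
    }

    /// Decode and validate; a wrong magic number means the file is not a
    /// database or its header is corrupted.
    fn decode(buf: &[u8]) -> Result<Header, String> {
        if buf.len() < HEADER_LEN {
            return Err("header too short".to_string());
        }
        if buf[0..4] != MAGIC[..] {
            return Err("bad magic number".to_string());
        }
        let u32_at = |i: usize| u32::from_le_bytes(buf[i..i + 4].try_into().unwrap());
        let u64_at = |i: usize| u64::from_le_bytes(buf[i..i + 8].try_into().unwrap());
        Ok(Header {
            version: u32_at(4),
            page_size: u32_at(8),
            page_count: u64_at(12),
            metadata_page: u64_at(20),
            free_list_head: u64_at(28),
            next_tx_id: u64_at(36),
        })
    }
}
```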
Performance characteristics
From the README benchmarks:
- Write throughput: ~19,150 docs/sec (1000 docs per transaction)
- Read latency: 0.009ms @ 16 threads (MVCC snapshot isolation)
- Query speed: Sub-millisecond even on 2500+ documents
- Concurrent writes: Linear scaling up to core count
Key optimizations:
- LRU page cache - 100MB default cache reduces disk I/O
- Batch commits - Groups up to 32 transactions into single fsync
- Sequential WAL writes - Append-only for minimal seek overhead
- Batched checkpoints - Writes consecutive pages in single syscall
- Per-database buffer pools - Reduces allocations for B-tree operations
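The batch-commit idea reduces to simple arithmetic: N waiting commits need ceil(N / 32) fsync calls instead of N. The single-threaded sketch below (a hypothetical group_commit function) drains a queue in groups of at most 32 and calls a sync callback once per group. The cross-thread coordination a real group commit needs, where one thread performs the fsync and wakes the others, is not shown.

```rust
use std::collections::VecDeque;

/// Upper bound from the text: up to 32 transactions share one fsync.
const MAX_BATCH: usize = 32;

/// Drain pending commit requests in groups of at most MAX_BATCH, calling
/// `sync` once per group. Returns the number of sync calls made.
fn group_commit<F>(pending: &mut VecDeque<u64>, mut sync: F) -> usize
where
    F: FnMut(&[u64]),
{
    let mut syncs = 0;
    while !pending.is_empty() {
        let n = pending.len().min(MAX_BATCH);
        let batch: Vec<u64> = pending.drain(..n).collect();
        // A real system would write every frame in the batch, issue a single
        // fsync here, and then acknowledge all committers in the batch.
        sync(&batch);
        syncs += 1;
    }
    syncs
}
```

For example, 70 queued commits are flushed as batches of 32, 32, and 6, so three fsync calls replace seventy.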
Thread safety
All components use fine-grained locking:
- Arc<RwLock<T>> for read-heavy state (metadata, MVCC info)
- Arc<Mutex<T>> for exclusive access (file I/O, commit serialization)
- Lock-free atomics for counters (transaction IDs, metrics)
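The three patterns can be combined as in the sketch below, where a hypothetical Shared struct is wrapped in an Arc and cloned into each thread: an AtomicU64 hands out transaction IDs without locking, an RwLock guards read-heavy metadata, and a Mutex<()> serializes commits. The names and the run_demo function are illustrative only.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Hypothetical shared state using the three locking patterns.
struct Shared {
    /// Read-heavy: many concurrent readers, occasional writer.
    metadata: RwLock<HashMap<String, u64>>,
    /// Exclusive: one commit at a time.
    commit_lock: Mutex<()>,
    /// Lock-free counter for transaction IDs.
    next_tx_id: AtomicU64,
}

/// Spawn `threads` workers that each allocate an ID, read metadata, and
/// "commit" one entry. Returns (next unused tx id, number of entries).
fn run_demo(threads: u64) -> (u64, usize) {
    let shared = Arc::new(Shared {
        metadata: RwLock::new(HashMap::new()),
        commit_lock: Mutex::new(()),
        next_tx_id: AtomicU64::new(1),
    });
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // No lock needed to allocate an ID.
                let tx_id = shared.next_tx_id.fetch_add(1, Ordering::SeqCst);
                // Shared read lock, released at the end of this statement.
                let _count = shared.metadata.read().unwrap().len();
                // Commits are serialized; the write lock is held only briefly.
                let _guard = shared.commit_lock.lock().unwrap();
                shared.metadata.write().unwrap().insert(format!("collection{}", i), tx_id);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let next = shared.next_tx_id.load(Ordering::SeqCst);
    let entries = shared.metadata.read().unwrap().len();
    (next, entries)
}
```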
Next steps
- Transactions: Learn how ACID transactions work with commit and rollback
- MVCC: Understand snapshot isolation and version management
- Storage Engine: Deep dive into B-tree internals and copy-on-write