Overview
Indexes serve two critical purposes:
- Fast blob lookup: Map blob ID → pack file + offset
- Deduplication: Check if blob already exists before writing
Indexes exist in two forms:
- Index files stored in the backend (/index/<ID>)
- In-memory index for fast lookups during operations
Index Architecture
IndexFile Structure
Stored in the backend as JSON:
IndexPack
Describes one pack file:
IndexBlob
Describes one blob within a pack:
Indexer
The Indexer accumulates blob information and writes index files:
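The original listings for these types are not reproduced here, so the following is only a minimal sketch of how IndexFile, IndexPack, IndexBlob and an accumulating indexer could fit together. All field and method names (`packs`, `blobs`, `offset`, `length`, `add`, `finish`) are assumptions made for illustration, not the library's actual definitions.

```rust
// Hypothetical types: field and method names are assumptions for illustration.

/// A blob or pack identifier; a fixed-size hash in a real repository.
pub type Id = [u8; 32];

/// One blob within a pack: where it starts and how long it is.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexBlob {
    pub id: Id,
    pub offset: u32,
    pub length: u32,
}

/// One pack file and the blobs it contains.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexPack {
    pub id: Id,
    pub blobs: Vec<IndexBlob>,
}

/// One index file: a list of packs (stored as JSON in the backend).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct IndexFile {
    pub packs: Vec<IndexPack>,
}

/// Accumulates finished packs until the caller decides to write them out.
#[derive(Debug, Default)]
pub struct Indexer {
    pending: IndexFile,
    blob_count: usize,
}

impl Indexer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a finished pack and count its blobs.
    pub fn add(&mut self, pack: IndexPack) {
        self.blob_count += pack.blobs.len();
        self.pending.packs.push(pack);
    }

    /// Number of blobs accumulated since the last call to `finish`.
    pub fn blob_count(&self) -> usize {
        self.blob_count
    }

    /// Hand back the accumulated index file and reset the indexer.
    /// A real implementation would serialize the result and store it
    /// under /index/<ID>; that step is left out of this sketch.
    pub fn finish(&mut self) -> IndexFile {
        self.blob_count = 0;
        std::mem::take(&mut self.pending)
    }
}
```

Keeping serialization out of the indexer in this sketch makes the accumulate-then-flush behaviour easy to test on its own.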
Auto-saving
The indexer automatically saves when:
- 50,000 blobs indexed (MAX_COUNT)
- 5 minutes elapsed (MAX_AGE)
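The two thresholds above can be expressed as a small check. This is a sketch only: `SaveTracker`, `record` and `mark_saved` are hypothetical names, and treating both comparisons as inclusive (`>=`) is an assumption. The constants take their values from the list above.

```rust
use std::time::{Duration, Instant};

/// Thresholds taken from the list above.
pub const MAX_COUNT: usize = 50_000;
pub const MAX_AGE: Duration = Duration::from_secs(5 * 60);

/// Hypothetical helper: tracks blobs indexed since the last save and when
/// that save happened.
pub struct SaveTracker {
    count: usize,
    last_save: Instant,
}

impl SaveTracker {
    pub fn new(now: Instant) -> Self {
        Self { count: 0, last_save: now }
    }

    /// Record `n` newly indexed blobs; returns true if a save is now due
    /// because either the count or the age threshold has been reached.
    pub fn record(&mut self, n: usize, now: Instant) -> bool {
        self.count += n;
        self.count >= MAX_COUNT || now.duration_since(self.last_save) >= MAX_AGE
    }

    /// Call after the index file has been written to reset both counters.
    pub fn mark_saved(&mut self, now: Instant) {
        self.count = 0;
        self.last_save = now;
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the helper, keeps the age rule testable without waiting five minutes.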
In-Memory Index
For fast lookups, indexes are loaded into memory:
IndexedBackend Trait
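The methods of the actual trait are not listed in this text, so the following is a stand-in rather than its real definition. It sketches what a read-side lookup interface over an in-memory index might look like; `BlobLookup`, `IndexEntry` and `MemoryIndex` are hypothetical names.

```rust
use std::collections::HashMap;

pub type Id = [u8; 32];

/// Where a blob lives: which pack, and the byte range inside it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndexEntry {
    pub pack: Id,
    pub offset: u32,
    pub length: u32,
}

/// Hypothetical read-side interface of an in-memory index.
pub trait BlobLookup {
    /// Map a blob ID to its pack and offset, if the blob is indexed.
    fn get(&self, blob: &Id) -> Option<IndexEntry>;

    /// Existence check, derived from `get`.
    fn has(&self, blob: &Id) -> bool {
        self.get(blob).is_some()
    }
}

/// Simplest possible implementation: a hash map keyed by blob ID.
#[derive(Default)]
pub struct MemoryIndex {
    entries: HashMap<Id, IndexEntry>,
}

impl MemoryIndex {
    pub fn insert(&mut self, blob: Id, entry: IndexEntry) {
        self.entries.insert(blob, entry);
    }
}

impl BlobLookup for MemoryIndex {
    fn get(&self, blob: &Id) -> Option<IndexEntry> {
        self.entries.get(blob).copied()
    }
}
```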
Building Indexes
From Scratch
Scan all packs and rebuild the index:
Incrementally
Add new packs to existing index:
Index Operations
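Every operation on an index assumes one has been built, either by a full rescan or by an incremental update as described under Building Indexes. A minimal sketch of both paths follows, assuming each pack exposes a header listing its blobs. `PackHeader`, `add_packs` and `rebuild` are hypothetical names, and the tuple layout is chosen only to keep the example short.

```rust
use std::collections::HashMap;

pub type Id = [u8; 32];

/// (blob ID, offset, length) entries read from one pack's header.
pub type PackHeader = Vec<(Id, u32, u32)>;

/// blob ID -> (pack ID, offset, length)
pub type Index = HashMap<Id, (Id, u32, u32)>;

/// Incremental update: add the blobs of the given packs to an existing index.
pub fn add_packs(index: &mut Index, packs: &[(Id, PackHeader)]) {
    for (pack_id, header) in packs {
        for (blob_id, offset, length) in header {
            index.insert(*blob_id, (*pack_id, *offset, *length));
        }
    }
}

/// Full rebuild: start from an empty index and scan every pack.
pub fn rebuild(all_packs: &[(Id, PackHeader)]) -> Index {
    let mut index = Index::new();
    add_packs(&mut index, all_packs);
    index
}
```

In this sketch a rebuild is just the incremental path applied to an empty index, so both routes end in the same state for the same set of packs.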
Checking Blob Existence
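An existence check is what makes deduplication possible: a blob is written only if the index does not already know its ID. The sketch below illustrates that idea with a plain `HashSet`; `DedupWriter` and `write_blob` are hypothetical names, and pushing to a `Vec` stands in for appending to a pack file.

```rust
use std::collections::HashSet;

pub type Id = [u8; 32];

/// Hypothetical writer that skips blobs the index already knows.
#[derive(Default)]
pub struct DedupWriter {
    known: HashSet<Id>,
    written: Vec<Id>,
}

impl DedupWriter {
    /// Returns true if the blob was written, false if it was a duplicate.
    pub fn write_blob(&mut self, id: Id) -> bool {
        // HashSet::insert returns false when the value was already present.
        if !self.known.insert(id) {
            return false;
        }
        // Stand-in for appending the blob to a pack file.
        self.written.push(id);
        true
    }

    /// IDs of the blobs that were actually written, in order.
    pub fn written(&self) -> &[Id] {
        &self.written
    }
}
```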
Listing All Blobs
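Listing amounts to walking every pack and every blob inside it. A minimal sketch, assuming the pack and blob layout shown here (the struct fields and the `all_blobs` function are hypothetical, not the library's API):

```rust
pub type Id = [u8; 32];

/// Hypothetical blob entry: ID plus position inside its pack.
pub struct IndexBlob {
    pub id: Id,
    pub offset: u32,
    pub length: u32,
}

/// Hypothetical pack entry: ID plus the blobs it contains.
pub struct IndexPack {
    pub id: Id,
    pub blobs: Vec<IndexBlob>,
}

/// Iterate over every (pack ID, blob) pair, pack by pack, without copying.
pub fn all_blobs<'a>(
    packs: &'a [IndexPack],
) -> impl Iterator<Item = (&'a Id, &'a IndexBlob)> + 'a {
    packs
        .iter()
        .flat_map(|pack| pack.blobs.iter().map(move |blob| (&pack.id, blob)))
}
```

Returning an iterator instead of a collected `Vec` lets callers filter or sum over the entries without holding a second copy of the index in memory.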
Index Optimization
Superseding Indices
New index can replace multiple old ones:
Packs to Delete
Mark packs for deletion during prune:
Shared Indexer
For multi-threaded operations:
Binary Sorted Index
Internal implementation for fast lookups:
Index Performance
Memory usage:
- ~40 bytes per indexed blob
- 1M blobs ≈ 40 MiB memory
Lookup speed:
- O(log n) for sorted arrays
- O(1) for hash maps
Index file size:
- ~50 bytes per blob (JSON)
- Compressed when stored
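To make the O(log n) figure concrete, here is a minimal sketch of a sorted-array index queried by binary search, together with the memory arithmetic from the list above. `SortedIndex`, `SortedEntry` and `estimated_bytes` are hypothetical names, and the entry layout is illustrative only, so its size does not necessarily match the ~40 bytes quoted.

```rust
pub type Id = [u8; 32];

/// Illustrative entry layout; the real in-memory layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SortedEntry {
    pub id: Id,
    pub pack_idx: u32,
    pub offset: u32,
    pub length: u32,
}

/// Entries kept sorted by blob ID so lookups can use binary search.
pub struct SortedIndex {
    entries: Vec<SortedEntry>,
}

impl SortedIndex {
    /// Sort once up front; every later lookup is O(log n).
    pub fn new(mut entries: Vec<SortedEntry>) -> Self {
        entries.sort_unstable_by(|a, b| a.id.cmp(&b.id));
        Self { entries }
    }

    /// Binary search for a blob ID.
    pub fn get(&self, id: &Id) -> Option<&SortedEntry> {
        self.entries
            .binary_search_by(|e| e.id.cmp(id))
            .ok()
            .map(|i| &self.entries[i])
    }
}

/// Rough memory estimate using the ~40 bytes per blob figure from the text.
pub fn estimated_bytes(blob_count: u64) -> u64 {
    blob_count * 40
}
```

With this estimate, 1,000,000 blobs come to 40,000,000 bytes (about 38 MiB), which is in line with the rough 40 MiB figure given above.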
When to Rebuild
Rebuild the index when:
- Index files are corrupted or missing
- After manual pack deletion
- Index becomes fragmented (many files)
See Also
- Repository Files - IndexFile format
- Blob Types - Blob structure and locations
- Repository - Index usage in operations