Overview
The `storage` crate (`crates/storage/`) defines the storage abstraction, while backend-specific crates implement it:
- `sqlite` - SQLite backend (default for self-hosted)
- `postgres` - PostgreSQL backend
- `mysql` - MySQL/MariaDB backend
Storage abstraction
Persistence trait
The core abstraction:
Design principles
- Backend agnostic: Application code doesn’t depend on specific backend
- Async first: All operations are async for scalability
- Stream-based: Large results stream to avoid memory bloat
- Transactional: ACID guarantees at the storage layer
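A rough sketch of what such an abstraction can look like, under the principles above (hypothetical names; the real `Persistence` trait is async and stream-based, while this sketch is synchronous and in-memory):

```rust
use std::collections::BTreeMap;
use std::io;

/// Hypothetical sketch of a storage abstraction; the real trait is
/// async and streams large results, which this sketch omits.
trait Persistence {
    /// Atomically apply a batch of key-value writes.
    fn write(&mut self, batch: Vec<(Vec<u8>, Vec<u8>)>) -> io::Result<()>;
    /// Load all pairs with key in [start, end) -- a range scan.
    fn load_range(&self, start: &[u8], end: &[u8]) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>>;
}

/// Minimal in-memory backend: a BTreeMap already gives ordered range scans.
#[derive(Default)]
struct InMemory {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Persistence for InMemory {
    fn write(&mut self, batch: Vec<(Vec<u8>, Vec<u8>)>) -> io::Result<()> {
        for (k, v) in batch {
            self.data.insert(k, v);
        }
        Ok(())
    }

    fn load_range(&self, start: &[u8], end: &[u8]) -> io::Result<Vec<(Vec<u8>, Vec<u8>)>> {
        Ok(self
            .data
            .range(start.to_vec()..end.to_vec())
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect())
    }
}
```

Because application code only sees the trait, an in-memory backend like this is enough for tests while SQLite, Postgres, or MySQL back the same interface in deployments.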
Key-value model
Data layout
Convex uses a key-value storage model:
Key encoding
Keys are carefully encoded for:
- Lexicographic ordering
- Efficient range scans
- Namespace isolation
- Index co-location
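The properties above can be illustrated with a hypothetical encoding (the real one is more involved): a fixed-width big-endian table-id prefix makes byte-wise comparison sort first by table, then by user key, so one table's keys are contiguous and a prefix range scan never leaves the namespace.

```rust
/// Hypothetical order-preserving key encoding (illustrative only).
fn encode_key(table_id: u32, user_key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + user_key.len());
    out.extend_from_slice(&table_id.to_be_bytes()); // big-endian: byte order == numeric order
    out.extend_from_slice(user_key);
    out
}

/// Half-open scan range covering every key in one table
/// (ignores u32::MAX overflow in this sketch).
fn table_range(table_id: u32) -> (Vec<u8>, Vec<u8>) {
    (
        table_id.to_be_bytes().to_vec(),
        (table_id + 1).to_be_bytes().to_vec(),
    )
}
```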
Value serialization
Values are serialized using:
- Protocol Buffers for wire format
- FlexBuffers for document storage
- Compression for large values
Transaction support
Snapshot isolation
All backends provide snapshot isolation:
- Readers see consistent snapshot
- Writers don’t block readers
- Conflicts detected at commit time
- Serializable isolation level
Transaction lifecycle
- Begin: Acquire transaction ID and snapshot
- Execute: Reads see consistent snapshot, writes are buffered
- Validate: Check for conflicts with concurrent transactions
- Commit: Apply writes atomically or rollback
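The lifecycle above can be sketched with optimistic concurrency control (a hypothetical, single-threaded sketch; the real engine is async and MVCC-based):

```rust
use std::collections::HashMap;

/// Illustrative begin/execute/validate/commit lifecycle.
#[derive(Default)]
struct Store {
    /// key -> (value, timestamp of the write that produced it)
    data: HashMap<String, (String, u64)>,
    /// logical clock, advanced on every commit
    clock: u64,
}

struct Txn {
    /// snapshot timestamp taken at begin
    begin_ts: u64,
    /// keys this transaction read (its read set)
    read_set: Vec<String>,
    /// buffered writes, invisible until commit
    writes: HashMap<String, String>,
}

impl Txn {
    fn write(&mut self, key: &str, value: &str) {
        self.writes.insert(key.to_string(), value.to_string());
    }
}

impl Store {
    fn begin(&self) -> Txn {
        Txn { begin_ts: self.clock, read_set: Vec::new(), writes: HashMap::new() }
    }

    fn read(&self, txn: &mut Txn, key: &str) -> Option<String> {
        txn.read_set.push(key.to_string());
        // Read-your-writes, then fall back to the store.
        txn.writes
            .get(key)
            .cloned()
            .or_else(|| self.data.get(key).map(|(v, _)| v.clone()))
    }

    /// Validate: abort if any key in the read set was overwritten after
    /// this transaction's snapshot; otherwise apply writes atomically.
    fn commit(&mut self, txn: Txn) -> Result<(), &'static str> {
        for key in &txn.read_set {
            if let Some((_, write_ts)) = self.data.get(key) {
                if *write_ts > txn.begin_ts {
                    return Err("conflict: read key changed since snapshot");
                }
            }
        }
        self.clock += 1;
        for (k, v) in txn.writes {
            self.data.insert(k, (v, self.clock));
        }
        Ok(())
    }
}
```

The key point is that writes are buffered until commit, and validation compares each read key's last-write timestamp against the transaction's snapshot, which is exactly how a read-write conflict surfaces.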
Conflict detection
Conflicts occur when:
- Two transactions modify the same key
- Read-write conflicts in serializable mode
- Schema changes conflict with queries
SQLite backend
Implementation
Path: `crates/sqlite/`
Uses `rusqlite` for:
- Embedded database
- Local file storage
- Simple deployment
- Great for development and small deployments
Schema
Simple key-value table:
Transaction handling
SQLite transactions:
Performance tuning
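As an illustration, common SQLite tuning PRAGMAs look like the following (hypothetical; not necessarily the settings the `sqlite` crate actually applies):

```sql
PRAGMA journal_mode = WAL;    -- write-ahead log: readers don't block the writer
PRAGMA synchronous = NORMAL;  -- fewer fsyncs; still durable enough under WAL
PRAGMA cache_size = -64000;   -- negative value = size in KiB (here ~64 MB)
```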
Optimizations:
Limitations
- Single-writer concurrency
- File-based storage limits scale
- Not suitable for distributed deployments
Use cases
- Local development
- Self-hosted small deployments
- Testing and CI
- Edge deployments
PostgreSQL backend
Implementation
Path: `crates/postgres/`
Uses `tokio-postgres` for:
- Async operations
- Connection pooling
- Prepared statements
- Scalable deployments
Schema
Optimized for Postgres:
Connection pooling
Managed connection pool:
Transaction isolation
Postgres serializable transactions:
Performance features
- Prepared statement caching
- Connection pooling
- Async I/O throughout
- Efficient batch operations
Configuration
Connection string:
Scaling considerations
- Horizontal read scaling with replicas
- Connection pooling for high concurrency
- Partitioning for large datasets
- Vacuum and maintenance required
MySQL backend
Implementation
Path: `crates/mysql/`
Uses `mysql_async` for:
- Async MySQL operations
- Compatible with MySQL and MariaDB
- Wide deployment support
Schema
MySQL-optimized schema:
Transaction handling
InnoDB transactions:
Connection management
Connection pool configuration:
Performance tuning
InnoDB settings:
Storage selection
Choosing a backend
Decision factors:
| Backend | Best For | Deployment |
|---|---|---|
| SQLite | Development, small apps | Single server |
| Postgres | Production, scalability | Managed DB or cloud |
| MySQL | Existing infrastructure | Managed DB or cloud |
Configuration
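Selecting a backend from the connection string can be sketched as dispatch on the URL scheme (hypothetical scheme names and parsing, for illustration only):

```rust
/// Hypothetical sketch: pick a backend from the connection-string scheme.
#[derive(Debug, PartialEq)]
enum Backend {
    Sqlite,
    Postgres,
    Mysql,
}

fn select_backend(conn: &str) -> Result<Backend, String> {
    // Everything before "://" is treated as the scheme.
    let scheme = conn.split("://").next().unwrap_or("");
    match scheme {
        "sqlite" => Ok(Backend::Sqlite),
        "postgres" | "postgresql" => Ok(Backend::Postgres),
        "mysql" => Ok(Backend::Mysql),
        other => Err(format!("unsupported scheme: {}", other)),
    }
}
```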
The backend is selected via the connection string:
Persistence layer features
Document versioning
Each write includes:
- Timestamp of the write
- Transaction ID
- Document version
Garbage collection
Old versions are cleaned up:
- Configurable retention period
- Background GC process
- Doesn’t block reads/writes
Backup and restore
Supported operations:
- Point-in-time snapshots
- Export to JSON or CSV
- Import from external sources
- Streaming export for large datasets
Point-in-time queries
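Point-in-time reads over versioned documents might be sketched like this (hypothetical layout: every write keeps its timestamp, and a read "as of ts" returns the newest version at or before ts):

```rust
use std::collections::BTreeMap;

/// Illustrative versioned store: every version of a document is kept,
/// keyed by (key, write timestamp).
#[derive(Default)]
struct VersionedStore {
    versions: BTreeMap<(String, u64), String>,
}

impl VersionedStore {
    fn write(&mut self, key: &str, ts: u64, value: &str) {
        self.versions.insert((key.to_string(), ts), value.to_string());
    }

    /// Latest value for `key` with timestamp <= ts, if any.
    fn read_at(&self, key: &str, ts: u64) -> Option<&str> {
        self.versions
            .range((key.to_string(), 0)..=(key.to_string(), ts))
            .next_back()
            .map(|(_, v)| v.as_str())
    }
}
```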
Query historical data:
Performance characteristics
Read performance
- Single key lookup: Sub-millisecond
- Range scan: Streaming, bounded by network
- Index scan: Optimized with database indexes
Write performance
- Single write: Milliseconds
- Batch writes: More efficient per-item
- Transaction commit: Durable write to disk
Scalability limits
Typical limits:
- SQLite: ~100 concurrent readers, 1 writer
- Postgres: 1000s of connections with pooling
- MySQL: Similar to Postgres
Monitoring and observability
Metrics
Tracked metrics:
- Transaction latency
- Query performance
- Connection pool utilization
- Storage size growth
- Read/write throughput
Health checks
Persistence health:
Debugging
Slow query logging:
- Queries exceeding threshold are logged
- Stack traces for investigation
- Query plan analysis
Data integrity
Checksums
Data corruption detection:
- Checksums for stored values
- Verification on read
- Automatic repair or error reporting
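Checksum-on-read might be sketched as follows (illustrative: it uses the standard library's hasher for brevity, whereas a real storage layer would use a stable checksum such as CRC32 or xxHash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Illustrative checksum over a byte slice.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

/// Store a value together with its checksum.
fn store(value: &[u8]) -> (Vec<u8>, u64) {
    (value.to_vec(), checksum(value))
}

/// Verify on read: detect corruption before returning the value.
fn read(stored: &(Vec<u8>, u64)) -> Result<&[u8], &'static str> {
    if checksum(&stored.0) == stored.1 {
        Ok(&stored.0)
    } else {
        Err("checksum mismatch: value corrupted")
    }
}
```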
Durability guarantees
All backends ensure:
- Writes survive process crashes
- ACID compliance
- No partial writes visible
Consistency verification
Background verification:
- Index consistency checks
- Referential integrity
- Schema compliance
Testing
Backend tests
Each backend has comprehensive tests:
Consistency tests
Verify ACID properties:
- Atomicity: All-or-nothing commits
- Consistency: Constraints are maintained
- Isolation: Concurrent transactions don’t interfere
- Durability: Committed data persists
Performance benchmarks
Benchmark suite:
Migration and upgrades
Schema migrations
Managed migrations:
Data migration
Moving between backends:
- Export from source backend
- Transform data if needed
- Import to target backend
- Verify data integrity
- Switch traffic
Version compatibility
Backward compatibility:
- Old versions can read new format
- Graceful handling of unknown fields
- Migration path documented
Next steps
- Database engine component - Layer built on persistence
- Indexing system - Index storage and management
- Rust backend architecture - Overall architecture