Understanding Read Dependencies
When working with the Convex Aggregate component, it’s important to understand how read operations create dependencies that can affect write performance and cause OCC (Optimistic Concurrency Control) conflicts.
How Read Dependencies Work
The Aggregate component stores denormalized counts in an internal B-tree data structure to achieve O(log(n)) time complexity. Data points with adjacent keys often have their counts accumulated in shared internal nodes of this tree, which means operations on these points can interfere with each other.
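To make the idea of shared nodes concrete, here is a minimal sketch of a toy structure. It is not the component's implementation: the fixed `LEAF_SIZE`, the single level of leaves, and all function names are assumptions made for illustration. Each leaf caches its own count, so a total can be computed from the cached counts, and keys that sort next to each other land in the same leaf.

```typescript
// Toy model only: one level of fixed-size leaves over sorted keys, each leaf
// caching its own count. The component's real node layout is not shown here.
const LEAF_SIZE = 4; // hypothetical fan-out, chosen for illustration

interface Leaf {
  keys: string[];
  count: number; // denormalized count for this leaf
}

// Split an already sorted key list into fixed-size leaves.
function buildLeaves(sortedKeys: string[]): Leaf[] {
  const result: Leaf[] = [];
  for (let i = 0; i < sortedKeys.length; i += LEAF_SIZE) {
    const keys = sortedKeys.slice(i, i + LEAF_SIZE);
    result.push({ keys, count: keys.length });
  }
  return result;
}

// Index of the leaf that stores `key`, or -1 if the key is absent.
function leafIndexOf(all: Leaf[], key: string): number {
  return all.findIndex((leaf) => leaf.keys.includes(key));
}

// Total count computed from the cached per-leaf counts, not from the keys.
function totalCount(all: Leaf[]): number {
  return all.reduce((sum, leaf) => sum + leaf.count, 0);
}

const usernames = ["Aaron", "Bea", "Carl", "Dana", "Laura", "Lauren", "Zed", "Zoe"];
const leaves = buildLeaves([...usernames].sort());

const lauraLeaf = leafIndexOf(leaves, "Laura");   // 1
const laurenLeaf = leafIndexOf(leaves, "Lauren"); // 1, the same leaf as Laura
const aaronLeaf = leafIndexOf(leaves, "Aaron");   // 0
const total = totalCount(leaves);                 // 8
```

In this toy model, any write to "Lauren" changes the cached count of the leaf that a read of "Laura" also touches, which is the kind of overlap the rest of this page is about.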
Impact on Queries
When a query calls `await aggregate.count(ctx)`, it depends on the entire aggregate data structure. This has important implications:
- Reactivity: When any mutation changes the data structure via `insert`, `delete`, or `replace`, the query automatically reruns and sends new results to the frontend
- Function call usage: Frequent updates can cause large function call and bandwidth usage on Convex
- Spurious reruns: Queries may rerun even when their results don’t change
Example: Adjacent Keys
Imagine a leaderboard aggregate with `Key: [username, score]`. Users “Laura” and “Lauren” have adjacent keys, so their counts are accumulated in a shared internal node.
- When Laura queries her own high score, she reads from the internal node shared with Lauren
- When Lauren gets a new high score, Laura’s query reruns (even though her result doesn’t change)
- The shared internal node creates a dependency between these two users
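The rerun behavior can be sketched as a read-set check. This is a simplified model under stated assumptions, not how Convex tracks dependencies internally: the `nodeOfKey` mapping, the node ids, and `mustRerun` are all hypothetical. A query remembers which nodes it read, and a write to any of those nodes forces a rerun, whether or not the query's result changes.

```typescript
type NodeId = number;

// Hypothetical placement of keys into nodes (toy model, not real layout).
const nodeOfKey = new Map<string, NodeId>([
  ["Aaron", 0],
  ["Laura", 1],
  ["Lauren", 1], // adjacent key, same node as Laura
  ["Zoe", 2],
]);

function nodeFor(key: string): NodeId {
  const id = nodeOfKey.get(key);
  if (id === undefined) throw new Error(`unknown key: ${key}`);
  return id;
}

// A query must rerun if a write touched any node in its read set.
function mustRerun(readSet: Set<NodeId>, writtenNode: NodeId): boolean {
  return readSet.has(writtenNode);
}

// Laura's query for her own score reads only the node holding her key.
const lauraReadSet = new Set<NodeId>([nodeFor("Laura")]);
const rerunAfterLauren = mustRerun(lauraReadSet, nodeFor("Lauren")); // true
const rerunAfterZoe = mustRerun(lauraReadSet, nodeFor("Zoe"));       // false

// An unbounded count reads every node, so any write triggers a rerun.
const fullReadSet = new Set<NodeId>(Array.from(nodeOfKey.values()));
const fullRerunAfterZoe = mustRerun(fullReadSet, nodeFor("Zoe"));    // true
```

The first result is the spurious rerun from the example: Laura's score is unchanged, but her query shares a node with Lauren's write.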
Impact on Mutations
When a mutation calls `await aggregate.count(ctx)`, it needs to run transactionally relative to other mutations. Another mutation performing an insert, delete, or replace can cause an OCC conflict.
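As a rough illustration of what an OCC conflict means, the sketch below models a single shared node with a version number. The `VersionedNode` class and its methods are hypothetical and exist only for this example; they are not part of Convex or the Aggregate component. Two transactions read the same version, the first commit wins, and the second has to start over from a fresh read.

```typescript
// Toy optimistic concurrency control over one shared value.
class VersionedNode {
  private value = 0;
  private version = 0;

  // Take a snapshot of the current value and version.
  read(): { value: number; version: number } {
    return { value: this.value, version: this.version };
  }

  // Commit only if nothing else committed since the snapshot was taken.
  tryCommit(snapshotVersion: number, newValue: number): boolean {
    if (snapshotVersion !== this.version) return false; // conflict
    this.value = newValue;
    this.version += 1;
    return true;
  }
}

const sharedNode = new VersionedNode();

// Two mutations read the same snapshot before either commits.
const snapA = sharedNode.read();
const snapB = sharedNode.read();

const committedA = sharedNode.tryCommit(snapA.version, snapA.value + 1); // true
const committedB = sharedNode.tryCommit(snapB.version, snapB.value + 1); // false, stale snapshot

// The losing mutation retries against a fresh snapshot.
const retryB = sharedNode.read();
const retriedB = sharedNode.tryCommit(retryB.version, retryB.value + 1); // true
const finalValue = sharedNode.read().value; // 2
```

The retry succeeds, but the work done for the first attempt of the second mutation is wasted, which is why frequent conflicts hurt write throughput.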
Sequential Key Problem
A particularly problematic pattern occurs when using sequential keys like `_creationTime`:
- Each new data point is added to the same part of the data structure (the end)
- Since `_creationTime` keeps increasing, all inserts target the same internal nodes
- All inserts wait for each other, preventing parallel execution
- No mutations can run concurrently
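The difference between sequential and spread-out keys can be seen in a small sketch. The leaf boundaries in `upperBounds` and the sample keys are made-up values for illustration, and the single-level layout is an assumption rather than the component's actual structure. The sketch counts how many distinct leaves a batch of inserts would touch.

```typescript
// Hypothetical leaf boundaries: leaf i holds keys below upperBounds[i],
// and one final leaf holds everything at or above the last boundary.
const upperBounds = [1000, 2000, 3000];

// Leaf that an insert with this key would land in.
function targetLeaf(key: number): number {
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Number of distinct leaves touched by a batch of inserts.
function distinctLeaves(keys: number[]): number {
  return new Set(keys.map(targetLeaf)).size;
}

const sequentialKeys = [3001, 3002, 3003, 3004]; // ever-increasing, like timestamps
const spreadKeys = [150, 1200, 2500, 3400];      // spread across the key space

const sequentialLeaves = distinctLeaves(sequentialKeys); // 1
const spreadLeaves = distinctLeaves(spreadKeys);         // 4
```

With sequential keys every insert contends for the same leaf, while the spread-out keys touch four different leaves and, in this toy model, would not block each other.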
Namespacing as a Solution
Namespaces provide isolation by giving each namespace its own data structure:
- Each namespace has its own data structure with no overlap in internal nodes
- “Laura” and “Lauren” never have contention, even with similar usernames
- Writes to different namespaces can execute in parallel
- Queries on one namespace don’t rerun when other namespaces change
Use namespacing when you have natural partitions in your data and don’t need to aggregate across those partitions.
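A minimal sketch of this isolation, assuming a toy model in which each namespace has its own version counter: the `NamespacedCounts` class and its methods are hypothetical and only illustrate the idea, they are not the component's API. Writes conflict only when they target the same namespace with a stale snapshot.

```typescript
// Toy model: every namespace gets its own count and its own version.
class NamespacedCounts {
  private versions = new Map<string, number>();
  private counts = new Map<string, number>();

  // Version of one namespace at the time a transaction starts.
  snapshot(namespace: string): number {
    return this.versions.get(namespace) ?? 0;
  }

  count(namespace: string): number {
    return this.counts.get(namespace) ?? 0;
  }

  // Succeeds only if this namespace is unchanged since the snapshot.
  tryInsert(namespace: string, snapshotVersion: number): boolean {
    if (this.snapshot(namespace) !== snapshotVersion) return false;
    this.counts.set(namespace, this.count(namespace) + 1);
    this.versions.set(namespace, snapshotVersion + 1);
    return true;
  }
}

const agg = new NamespacedCounts();

// Both writers take their snapshots before either one commits.
const lauraSnap = agg.snapshot("Laura");
const laurenSnap = agg.snapshot("Lauren");

const lauraInserted = agg.tryInsert("Laura", lauraSnap);    // true
const laurenInserted = agg.tryInsert("Lauren", laurenSnap); // true, different namespace

// Reusing the stale snapshot within the same namespace still conflicts.
const staleInsert = agg.tryInsert("Laura", lauraSnap);      // false
```

Here "Laura" and "Lauren" are separate namespaces, so their similar names no longer matter; contention remains only between writers inside one namespace.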
Trade-offs: Namespacing vs. Bounds
| Approach | Pros | Cons |
|---|---|---|
| Namespace-based partitioning | No write contention between namespaces; Maximum write throughput | Cannot aggregate across namespaces; Must always specify namespace |
| Bounds-based filtering | Can aggregate globally; Flexible querying | Write contention for nearby keys; May need careful key design |
Best Practices
- Choose the right key structure: Consider how your queries and writes will interact when designing your sort keys
- Use namespaces for high-throughput writes: When data is naturally partitioned and you don’t need global aggregation
- Profile your workload: Monitor OCC conflicts and query reruns to identify problematic patterns
- Combine strategies: Use namespaces for the primary partition and bounds for secondary filtering. For example, namespacing a leaderboard by game and filtering by user with bounds provides:
  - Isolation between different games (via namespace)
  - Ability to filter by user within a game (via bounds)
  - Good write throughput as different games don’t interfere
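The combined approach can be sketched with plain data structures. This is an illustrative model, not the component's API: the `byGame` map, the `ScoreKey` tuple, and the helper functions are assumptions, and the per-user filter here is a linear scan standing in for a bounded range lookup.

```typescript
// Key within one game: [username, score].
type ScoreKey = [string, number];

// One independent key list per game (the namespace).
const byGame = new Map<string, ScoreKey[]>();

function insertScore(gameId: string, key: ScoreKey): void {
  const keys = byGame.get(gameId) ?? [];
  keys.push(key);
  byGame.set(gameId, keys);
}

// Count all scores in one game; other games are never read.
function countForGame(gameId: string): number {
  return (byGame.get(gameId) ?? []).length;
}

// Count one user's scores in one game, standing in for a bounds filter
// on the username part of the key.
function countForUser(gameId: string, username: string): number {
  return (byGame.get(gameId) ?? []).filter(([user]) => user === username).length;
}

insertScore("chess", ["Laura", 10]);
insertScore("chess", ["Laura", 25]);
insertScore("chess", ["Lauren", 30]);
insertScore("go", ["Laura", 5]);

const chessTotal = countForGame("chess");          // 3
const lauraChess = countForUser("chess", "Laura"); // 2
const lauraGo = countForUser("go", "Laura");       // 1
```

Writes to "chess" and "go" touch separate lists, while users inside one game still share a structure and can be filtered within it.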
See Also
- Lazy Aggregation - Reduce write contention with lazy root nodes
- Optimizing Throughput - Strategies for maximizing write performance