## What is Capacity Reservation?

Capacity reservation allows you to "book" rate limit capacity for future use. When you reserve capacity, you receive a `retryAfter` delay (in milliseconds) after which you can execute your operation without re-checking the rate limit.

This prevents starvation of larger requests and enables fair queueing of operations.
## The Problem: Starvation

Without reservations, large requests can be repeatedly blocked:

```typescript
// Without reservation: this might fail repeatedly
const status = await rateLimiter.limit(ctx, "llmTokens", { count: 1000 });
if (!status.ok) {
  // By the time we retry, capacity might be consumed by other requests
  return { error: "Rate limited" };
}
```
Smaller requests can continuously consume available capacity, preventing large requests from ever succeeding.
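The failure mode is easy to reproduce in a toy simulation (illustrative only, not the component's API — the bucket sizes and tick loop here are made up for the demonstration):

```typescript
// Illustrative simulation: a 20-token bucket refilling 10 tokens per tick.
// A stream of small 10-token requests keeps draining the refill, so a
// 15-token request never finds enough capacity at once.
const CAPACITY = 20;
const REFILL_PER_TICK = 10;
let tokens = 10;
let largeRequestRan = false;

for (let tick = 0; tick < 100; tick++) {
  // A small request arrives every tick and consumes the available refill.
  if (tokens >= 10) tokens -= 10;
  // The large request retries, but 15 tokens are never available at once.
  if (tokens >= 15) {
    largeRequestRan = true;
    break;
  }
  tokens = Math.min(CAPACITY, tokens + REFILL_PER_TICK);
}

console.log(largeRequestRan); // false — the large request starves
```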
## The Solution: Reserve Capacity

With the `reserve` parameter, you can guarantee future execution:

```typescript
const status = await rateLimiter.limit(ctx, "llmTokens", {
  count: 1000,
  reserve: true,
});

if (status.retryAfter) {
  // Capacity is reserved! We can run after exactly this delay
  await ctx.scheduler.runAfter(status.retryAfter, internal.ai.processRequest, {
    skipCheck: true, // We've already reserved capacity
  });
}
```
## How Reservations Work

1. **Check available capacity**: The rate limiter checks if there's enough capacity now or in the future.
2. **Reserve tokens**: If capacity isn't immediately available, it reserves capacity at a future time.
3. **Return `retryAfter`**: You receive the exact delay after which your operation can run.
4. **Execute without re-checking**: At that time, skip the rate limit check.
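The steps above can be modeled with a small token-bucket sketch. This is a simplified, self-contained model, not the component's actual implementation — `limitWithReserve` and its state shape are illustrative (the real component persists state transactionally):

```typescript
// Simplified model of reservation against a token bucket.
type BucketState = { value: number; ts: number };
type Config = { rate: number; period: number; maxReserved?: number };

function limitWithReserve(
  state: BucketState,
  now: number,
  count: number,
  { rate, period, maxReserved }: Config,
): { ok: boolean; retryAfter?: number } {
  // Step 1: refill tokens accrued since the last check, capped at `rate`.
  const value = Math.min(rate, state.value + ((now - state.ts) * rate) / period);
  state.ts = now;
  if (value >= count) {
    state.value = value - count; // immediate capacity: consume it now
    return { ok: true };
  }
  const deficit = count - value;
  if (maxReserved !== undefined && deficit > maxReserved) {
    return { ok: false }; // reservation refused: too much already queued
  }
  // Steps 2-3: drive the balance negative (the reservation) and report
  // how long until the deficit refills.
  state.value = value - count;
  return { ok: true, retryAfter: (deficit * period) / rate };
}

// Example: a full 100-token bucket refilling 100 tokens/sec; a 150-token
// request is 50 tokens short, so it reserves and waits 500 ms.
const state: BucketState = { value: 100, ts: 0 };
limitWithReserve(state, 0, 150, { rate: 100, period: 1000 });
// → { ok: true, retryAfter: 500 }
```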
## Complete Pattern with Scheduler

Here's the recommended pattern for using reservations with `ctx.scheduler`:

```typescript
import { internalAction } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";
// `rateLimiter` is your configured RateLimiter instance, defined elsewhere
// in your Convex project.

export const processLLMRequest = internalAction({
  args: {
    prompt: v.string(),
    tokens: v.number(),
    skipCheck: v.optional(v.boolean()),
  },
  handler: async (ctx, args) => {
    if (!args.skipCheck) {
      // Reserve future capacity instead of just failing now
      const status = await rateLimiter.limit(ctx, "llmRequests", {
        count: args.tokens,
        reserve: true,
      });
      if (status.retryAfter) {
        // Schedule for future execution with reserved capacity
        await ctx.scheduler.runAfter(
          status.retryAfter,
          internal.ai.processLLMRequest,
          {
            ...args,
            // When we run in the future, skip the rate limit check
            // since we've already reserved that capacity
            skipCheck: true,
          },
        );
        return { scheduled: true, executeAt: Date.now() + status.retryAfter };
      }
    }
    // Either we had immediate capacity, or this is our reserved execution
    const result = await callLLMAPI(args.prompt);
    return { result };
  },
});
```
## The skipCheck Pattern

The `skipCheck` parameter is crucial:

```typescript
args: {
  skipCheck: v.optional(v.boolean()),
}
```

```typescript
if (!args.skipCheck) {
  // Only check the rate limit if this isn't a reserved execution
  const status = await rateLimiter.limit(ctx, "name", { reserve: true });
  // ...
}
```

**Important**: Always use `skipCheck` for scheduled executions after reserving capacity. Otherwise, you'll check the rate limit twice and consume double the tokens.
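To see why, here's a minimal accounting sketch (the `limit` helper here is hypothetical, not the component's API): every check deducts `count` tokens, so re-checking at the scheduled time charges the same request twice.

```typescript
// Minimal accounting sketch: every limit() call deducts tokens.
let balance = 100;
const limit = (count: number) => {
  balance -= count;
  return { ok: balance >= 0 };
};

limit(60); // reserve at request time: balance is now 40
// ...scheduled execution runs later, WITHOUT skipCheck:
limit(60); // the same 60 tokens are charged again
console.log(balance); // -20 — double-charged
```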
## Preventing Starvation

Reservations ensure fairness by queueing operations:

```typescript
// Three requests arrive at the same time

// Request 1: 100 tokens
const r1 = await rateLimiter.limit(ctx, "api", { count: 100, reserve: true });
// executes immediately (ok: true, no retryAfter)

// Request 2: 100 tokens
const r2 = await rateLimiter.limit(ctx, "api", { count: 100, reserve: true });
// scheduled for the future (ok: true, retryAfter: 1000)

// Request 3: 100 tokens
const r3 = await rateLimiter.limit(ctx, "api", { count: 100, reserve: true });
// scheduled further out (ok: true, retryAfter: 2000)
```

Each request gets a guaranteed execution time, preventing starvation.
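The delays above follow directly from the refill math. Assuming a bucket that refills 100 tokens per second (consistent with the `retryAfter` values shown — the actual rate depends on your config), each reservation waits for its outstanding deficit to refill:

```typescript
// retryAfter = time for the reserved deficit to refill,
// at `rate` tokens per `period` milliseconds.
const retryAfterMs = (deficit: number, rate: number, period: number) =>
  (deficit * period) / rate;

// Three 100-token requests against a full 100-token bucket (100 tokens/sec):
retryAfterMs(0, 100, 1000);   // request 1: 0 ms — runs immediately
retryAfterMs(100, 100, 1000); // request 2: 1000 ms
retryAfterMs(200, 100, 1000); // request 3: 2000 ms
```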
## Maximum Reservations

You can limit how far ahead capacity can be reserved:

```typescript
const config = {
  kind: "token bucket",
  rate: 1000,
  period: MINUTE, // MINUTE = 60 * 1000 ms
  maxReserved: 2000, // can reserve up to 2x the rate
};
```
When `maxReserved` is exceeded, `ok` will be `false`:

```typescript
const status = await rateLimiter.limit(ctx, "api", {
  count: 5000,
  reserve: true,
});

if (!status.ok) {
  // maxReserved exceeded - reject the request
  throw new Error("Request too large or queue too full");
}
```
## Use Cases

### 1. Large Batch Operations

```typescript
export const processBatch = internalAction({
  args: { items: v.array(v.any()), skipCheck: v.optional(v.boolean()) },
  handler: async (ctx, args) => {
    if (!args.skipCheck) {
      const status = await rateLimiter.limit(ctx, "batchAPI", {
        count: args.items.length,
        reserve: true,
      });
      if (status.retryAfter) {
        await ctx.scheduler.runAfter(
          status.retryAfter,
          internal.batch.processBatch,
          { ...args, skipCheck: true },
        );
        return;
      }
    }
    // Process the batch
  },
});
```
### 2. Fair Queueing

```typescript
export const queuedOperation = mutation({
  args: { userId: v.id("users"), data: v.any() },
  handler: async (ctx, args) => {
    const status = await rateLimiter.limit(ctx, "userOperations", {
      key: args.userId,
      reserve: true,
    });
    if (status.retryAfter) {
      await ctx.scheduler.runAfter(
        status.retryAfter,
        internal.operations.executeOperation,
        { userId: args.userId, data: args.data, skipCheck: true },
      );
      return { queued: true, executeAt: Date.now() + status.retryAfter };
    }
    return { immediate: true };
  },
});
```
### 3. LLM API Rate Limiting

```typescript
export const generateText = internalAction({
  args: {
    prompt: v.string(),
    estimatedTokens: v.number(),
    skipCheck: v.optional(v.boolean()),
  },
  handler: async (ctx, args) => {
    if (!args.skipCheck) {
      const status = await rateLimiter.limit(ctx, "llmTokens", {
        count: args.estimatedTokens,
        reserve: true,
      });
      if (status.retryAfter) {
        await ctx.scheduler.runAfter(
          status.retryAfter,
          internal.llm.generateText,
          { ...args, skipCheck: true },
        );
        return { status: "scheduled", executeAt: Date.now() + status.retryAfter };
      }
    }
    const result = await callOpenAI(args.prompt);
    return { status: "complete", result };
  },
});
```
## Best Practices

- **Always use `skipCheck`**: When executing with reserved capacity, always pass `skipCheck: true`.
- **Estimate conservatively**: Reserve slightly more capacity than you think you'll need.
- **Set `maxReserved`**: Prevent unbounded queueing with `maxReserved`.
- **Handle scheduling failures**: Check whether scheduling succeeded and handle errors.
- **Use with actions**: Reservations work best with `internalAction` for async operations.
Reservations are particularly useful for preventing thundering herd problems. See the Jitter guide for more techniques to handle burst traffic.