Overview

By default, each call to limit() consumes exactly 1 token. But many use cases require consuming different amounts based on the operation being performed. The count parameter lets you consume multiple tokens in a single call.

Using the count Parameter

const status = await rateLimiter.limit(ctx, "llmTokens", { 
  count: tokens 
});

Example: LLM Token Consumption

When calling an LLM API, you want to limit based on tokens consumed, not number of requests:
import { RateLimiter, MINUTE } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";
import { action } from "./_generated/server";
import { v } from "convex/values";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Allow 40,000 tokens per minute across all requests
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
});

export const generateText = action({
  args: { prompt: v.string() },
  handler: async (ctx, args) => {
    // Estimate token count (4 chars ≈ 1 token)
    const estimatedTokens = Math.ceil(args.prompt.length / 4);
    
    // Consume the estimated tokens; throws if the quota is exceeded
    await rateLimiter.limit(ctx, "llmTokens", { 
      count: estimatedTokens,
      throws: true,
    });
    
    // Call the LLM API (assumes a configured `openai` client)
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: args.prompt }],
    });
    
    return response.choices[0].message.content;
  },
});
From the README: “Consume multiple in one request to prevent rate limits on an LLM API.”

Example: File Size Limits

Rate limit file uploads based on file size rather than number of uploads:
import { RateLimiter, HOUR } from "@convex-dev/rate-limiter";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Allow 100MB per hour per user
  uploadBandwidth: { kind: "token bucket", rate: 100_000_000, period: HOUR },
});

export const uploadFile = mutation({
  args: { 
    userId: v.string(),
    fileSizeBytes: v.number(),
  },
  handler: async (ctx, args) => {
    const { ok, retryAfter } = await rateLimiter.limit(ctx, "uploadBandwidth", {
      key: args.userId,
      count: args.fileSizeBytes,
    });
    
    if (!ok) {
      throw new Error(`Upload quota exceeded. Try again in ${Math.ceil(retryAfter! / 1000)}s`);
    }
    
    // Process the file upload
  },
});

Example: Batch Operations

Consume tokens proportional to batch size:
const rateLimiter = new RateLimiter(components.rateLimiter, {
  batchInsert: { kind: "token bucket", rate: 1000, period: MINUTE },
});

export const insertDocuments = mutation({
  args: { documents: v.array(v.object({ name: v.string() })) },
  handler: async (ctx, args) => {
    // Consume tokens based on batch size
    await rateLimiter.limit(ctx, "batchInsert", {
      count: args.documents.length,
      throws: true,
    });
    
    // Insert all documents
    for (const doc of args.documents) {
      await ctx.db.insert("documents", doc);
    }
  },
});

Real Example from Source Code

From example/convex/example.ts:
export const consumeTokens = mutation({
  args: {
    count: v.optional(v.number()),
  },
  handler: async (ctx, args) => {
    const user = await ctx.auth.getUserIdentity();
    const key = user?.subject ?? "anonymous";
    
    return rateLimiter.limit(ctx, "demoLimit", {
      count: args.count || 1,
      key,
    });
  },
});

When to Use Custom Counts

Variable Cost Operations

When different requests have different “costs”:
  • LLM API calls (token usage)
  • Image generation (resolution/quality)
  • Database queries (complexity)

Resource Consumption

When limiting based on resource usage:
  • File upload bandwidth
  • Storage space
  • API credits

Batch Operations

When processing multiple items:
  • Bulk inserts
  • Batch exports
  • Multiple file uploads

Tiered Usage

When requests have different weights:
  • Premium vs free features
  • Expensive vs cheap operations
  • Priority queues
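One way to express tiered weights is a simple cost table that maps each operation to the count it should consume. The operation names and weights below are illustrative, not part of the library:

```typescript
// Hypothetical cost table: heavier operations consume more of the shared limit.
const OPERATION_COST: Record<string, number> = {
  freeSearch: 1, // cheap, free-tier feature
  premiumSearch: 5, // heavier ranking pipeline
  bulkExport: 20, // expensive batch operation
};

// Resolve the count to pass to rateLimiter.limit(ctx, name, { count: ... })
function costFor(op: string): number {
  return OPERATION_COST[op] ?? 1; // default weight for unknown operations
}
```

The handler then calls `rateLimiter.limit(ctx, "apiCredits", { count: costFor(op) })`, so all tiers draw from one shared budget at different rates.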

Combining with Per-User Limits

Custom counts combine naturally with per-user rate limiting: pass both key and count in the same call:
const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Each user gets 40,000 tokens per minute
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE },
});

export const chat = action({
  args: { 
    userId: v.string(),
    message: v.string(),
  },
  handler: async (ctx, args) => {
    const tokenCount = estimateTokens(args.message);
    
    // Per-user token limit
    await rateLimiter.limit(ctx, "llmTokens", {
      key: args.userId,
      count: tokenCount,
      throws: true,
    });
    
    // Make API call
  },
});
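The estimateTokens helper used above is not part of the rate limiter. A minimal sketch, assuming the same ~4-characters-per-token heuristic from the earlier example plus a safety buffer, might look like:

```typescript
// Hypothetical helper: rough token estimate (~4 chars per token),
// padded by 20% so we overestimate rather than under-count quota usage.
function estimateTokens(text: string): number {
  return Math.ceil((text.length / 4) * 1.2);
}
```

For production use, a real tokenizer for your model (e.g. a tiktoken-style library) gives more accurate counts than a character heuristic.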

Fractional Counts

Counts can be fractional (floating-point numbers):
// Consume 0.5 tokens for lightweight operations
await rateLimiter.limit(ctx, "apiRequest", { count: 0.5 });

// Consume 2.5 tokens for medium operations
await rateLimiter.limit(ctx, "apiRequest", { count: 2.5 });

Best Practices

When estimating costs (like LLM tokens), err on the side of overestimating to avoid hitting external API limits:
// Add 20% buffer for safety
const estimatedTokens = Math.ceil(prompt.length / 4 * 1.2);
Match your rate limits to the metric you’re counting:
// For tokens: higher rate, longer period
llmTokens: { rate: 40000, period: MINUTE }

// For bytes: very high rate
uploadBandwidth: { rate: 100_000_000, period: HOUR }

// For count of items: moderate rate
batchOperations: { rate: 1000, period: MINUTE }
If the count comes from user input, validate it:
if (args.count < 1 || args.count > 10000) {
  throw new Error("Invalid count");
}
For expensive operations, you can check availability first with check(), which is read-only and does not consume tokens:
// Check if we have enough tokens
const check = await rateLimiter.check(ctx, "llmTokens", { 
  count: estimatedTokens 
});

if (!check.ok) {
  return { error: "Insufficient quota", retryAfter: check.retryAfter };
}

// Now consume and proceed
await rateLimiter.limit(ctx, "llmTokens", { 
  count: estimatedTokens,
  throws: true,
});
