Overview

By default, each call to limit() consumes exactly 1 token. But many use cases require consuming different amounts based on the operation being performed. The count parameter lets you consume multiple tokens in a single call.

Using the count Parameter

const status = await rateLimiter.limit(ctx, "llmTokens", { 
  count: tokens 
});

Example: LLM Token Consumption

When calling an LLM API, you want to limit based on tokens consumed, not number of requests:
import { RateLimiter, MINUTE } from "@convex-dev/rate-limiter";
import { components } from "./_generated/api";
import { action } from "./_generated/server";
import { v } from "convex/values";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Allow 40,000 tokens per minute across all requests
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE, shards: 10 },
});

export const generateText = action({
  args: { prompt: v.string() },
  handler: async (ctx, args) => {
    // Estimate token count (4 chars ≈ 1 token)
    const estimatedTokens = Math.ceil(args.prompt.length / 4);
    
    // Consume the estimated tokens; throws if the quota is exceeded
    await rateLimiter.limit(ctx, "llmTokens", { 
      count: estimatedTokens,
      throws: true,
    });
    
    // Call the LLM API (assumes a configured `openai` client)
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: args.prompt }],
    });
    
    return response.choices[0].message.content;
  },
});
From the README: “Consume multiple in one request to prevent rate limits on an LLM API.”

Example: File Size Limits

Rate limit file uploads based on file size rather than number of uploads:
import { RateLimiter, HOUR } from "@convex-dev/rate-limiter";

const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Allow 100MB per hour per user
  uploadBandwidth: { kind: "token bucket", rate: 100_000_000, period: HOUR },
});

export const uploadFile = mutation({
  args: { 
    userId: v.string(),
    fileSizeBytes: v.number(),
  },
  handler: async (ctx, args) => {
    const { ok, retryAfter } = await rateLimiter.limit(ctx, "uploadBandwidth", {
      key: args.userId,
      count: args.fileSizeBytes,
    });
    
    if (!ok) {
      throw new Error(`Upload quota exceeded. Try again in ${Math.ceil(retryAfter! / 1000)}s`);
    }
    
    // Process the file upload
  },
});

Example: Batch Operations

Consume tokens proportional to batch size:
const rateLimiter = new RateLimiter(components.rateLimiter, {
  batchInsert: { kind: "token bucket", rate: 1000, period: MINUTE },
});

export const insertDocuments = mutation({
  args: { documents: v.array(v.object({ name: v.string() })) },
  handler: async (ctx, args) => {
    // Consume tokens based on batch size
    await rateLimiter.limit(ctx, "batchInsert", {
      count: args.documents.length,
      throws: true,
    });
    
    // Insert all documents
    for (const doc of args.documents) {
      await ctx.db.insert("documents", doc);
    }
  },
});

Real Example from Source Code

From example/convex/example.ts:
export const consumeTokens = mutation({
  args: {
    count: v.optional(v.number()),
  },
  handler: async (ctx, args) => {
    const user = await ctx.auth.getUserIdentity();
    const key = user?.subject ?? "anonymous";
    
    return rateLimiter.limit(ctx, "demoLimit", {
      count: args.count || 1,
      key,
    });
  },
});

When to Use Custom Counts

Variable Cost Operations

When different requests have different “costs”:
  • LLM API calls (token usage)
  • Image generation (resolution/quality)
  • Database queries (complexity)

Resource Consumption

When limiting based on resource usage:
  • File upload bandwidth
  • Storage space
  • API credits

Batch Operations

When processing multiple items:
  • Bulk inserts
  • Batch exports
  • Multiple file uploads

Tiered Usage

When requests have different weights:
  • Premium vs free features
  • Expensive vs cheap operations
  • Priority queues
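One way to express tiered weights is a simple cost table that maps each operation to the count it should consume. The operation names and weights below are illustrative, not part of the library:

```typescript
// Hypothetical cost table: heavier operations consume more of the shared limit.
const OPERATION_COST: Record<string, number> = {
  freeSearch: 1, // cheap, free-tier feature
  premiumSearch: 5, // heavier ranking pipeline
  bulkExport: 20, // expensive batch operation
};

// Resolve the count to pass to rateLimiter.limit(ctx, name, { count: ... })
function costFor(op: string): number {
  return OPERATION_COST[op] ?? 1; // default weight for unknown operations
}
```

The handler then calls `rateLimiter.limit(ctx, "apiCredits", { count: costFor(op) })`, so all tiers draw from one shared budget at different rates.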

Combining with Per-User Limits

Custom counts combine naturally with per-user rate limiting: pass both key and count in the same call:
const rateLimiter = new RateLimiter(components.rateLimiter, {
  // Each user gets 40,000 tokens per minute
  llmTokens: { kind: "token bucket", rate: 40000, period: MINUTE },
});

export const chat = action({
  args: { 
    userId: v.string(),
    message: v.string(),
  },
  handler: async (ctx, args) => {
    const tokenCount = estimateTokens(args.message);
    
    // Per-user token limit
    await rateLimiter.limit(ctx, "llmTokens", {
      key: args.userId,
      count: tokenCount,
      throws: true,
    });
    
    // Make API call
  },
});
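The estimateTokens helper used above is not part of the rate limiter. A minimal sketch, assuming the same ~4-characters-per-token heuristic from the earlier example plus a safety buffer, might look like:

```typescript
// Hypothetical helper: rough token estimate (~4 chars per token),
// padded by 20% so we overestimate rather than under-count quota usage.
function estimateTokens(text: string): number {
  return Math.ceil((text.length / 4) * 1.2);
}
```

For production use, a real tokenizer for your model (e.g. a tiktoken-style library) gives more accurate counts than a character heuristic.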

Fractional Counts

Counts can be fractional (floating-point numbers):
// Consume 0.5 tokens for lightweight operations
await rateLimiter.limit(ctx, "apiRequest", { count: 0.5 });

// Consume 2.5 tokens for medium operations
await rateLimiter.limit(ctx, "apiRequest", { count: 2.5 });

Best Practices

When estimating costs (like LLM tokens), err on the side of overestimating to avoid hitting external API limits:
// Add 20% buffer for safety
const estimatedTokens = Math.ceil(prompt.length / 4 * 1.2);
Match your rate limits to the metric you’re counting:
// For tokens: higher rate, longer period
llmTokens: { rate: 40000, period: MINUTE }

// For bytes: very high rate
uploadBandwidth: { rate: 100_000_000, period: HOUR }

// For count of items: moderate rate
batchOperations: { rate: 1000, period: MINUTE }
If the count comes from user input, validate it:
if (args.count < 1 || args.count > 10000) {
  throw new Error("Invalid count");
}
For expensive operations, you can check availability first with check(), which is read-only and does not consume tokens:
// Check if we have enough tokens
const check = await rateLimiter.check(ctx, "llmTokens", { 
  count: estimatedTokens 
});

if (!check.ok) {
  return { error: "Insufficient quota", retryAfter: check.retryAfter };
}

// Now consume and proceed
await rateLimiter.limit(ctx, "llmTokens", { 
  count: estimatedTokens,
  throws: true,
});
