
Overview

The Gima AI Chatbot implements in-memory rate limiting using a sliding window algorithm to prevent API abuse and ensure fair resource distribution across all users.

Rate Limit Configuration

windowMs (number, default: 60000)
Time window in milliseconds. Defaults to 60,000 ms (1 minute).

maxRequests (number, default: 20)
Maximum number of requests allowed per window. Defaults to 20 requests per minute.

How It Works

Sliding Window Algorithm

The rate limiter uses a sliding window approach (a code sketch follows the list below):
  1. Each request from a client IP is timestamped
  2. Before processing a new request, old timestamps outside the current window are removed
  3. If the count of valid timestamps has already reached the limit, the request is rejected
  4. Otherwise, the current timestamp is added and the request proceeds
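
In code, the core bookkeeping might look like the following sketch. This is a simplified illustration of the approach, not the actual Gima source; the Map-based storage and names are assumptions:

// Simplified sliding-window check (illustrative only)
const timestamps = new Map<string, number[]>(); // identifier -> request times

function isAllowed(identifier: string, windowMs: number, maxRequests: number): boolean {
  const now = Date.now();
  const windowStart = now - windowMs;

  // Keep only timestamps inside the current window (step 2)
  const recent = (timestamps.get(identifier) ?? []).filter(t => t > windowStart);

  // Reject if the window is already full (step 3)
  if (recent.length >= maxRequests) {
    timestamps.set(identifier, recent);
    return false;
  }

  // Record this request and allow it (step 4)
  recent.push(now);
  timestamps.set(identifier, recent);
  return true;
}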

Example Timeline

Time: 0s  - Request 1 ✅ (1/20)
Time: 5s  - Request 2 ✅ (2/20)
Time: 10s - Request 3 ✅ (3/20)
...
Time: 50s - Request 20 ✅ (20/20)
Time: 55s - Request 21 ❌ (Rate limit exceeded)
Time: 61s - Request 22 ✅ (20/20) - Request 1 (t=0s) expired, freeing one slot

Rate Limit Responses

When Limit is Exceeded

Clients receive a 429 Too Many Requests response:
{
  "error": "Too Many Requests",
  "message": "Has excedido el límite de solicitudes. Intenta nuevamente en unos segundos.",
  "retryAfter": 30
}
In English, the message reads: "You have exceeded the request limit. Try again in a few seconds."

Response Headers

Retry-After (string)
Number of seconds to wait before retrying, based on when the oldest request in the current window will expire.

X-RateLimit-Remaining (string)
Number of requests remaining in the current window. Set to "0" when the limit is exceeded.

Implementation Details

Identifier

Rate limiting is applied per client IP address, extracted from:
  1. X-Forwarded-For header (first IP)
  2. X-Real-IP header
  3. Socket remote address
In development mode, localhost connections are allowed.
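
A helper that resolves this precedence order might look like the sketch below. It assumes Node's IncomingMessage; the actual Gima helper may differ:

import type { IncomingMessage } from 'http';

// Illustrative sketch of the IP resolution order described above
function getClientIp(req: IncomingMessage): string {
  // 1. X-Forwarded-For: take the first (client-most) IP in the chain
  const forwarded = req.headers['x-forwarded-for'];
  const first = Array.isArray(forwarded) ? forwarded[0] : forwarded;
  if (first) return first.split(',')[0].trim();

  // 2. X-Real-IP: set by some reverse proxies
  const realIp = req.headers['x-real-ip'];
  if (typeof realIp === 'string' && realIp) return realIp;

  // 3. Fall back to the socket's remote address
  return req.socket.remoteAddress ?? 'unknown';
}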

Memory Management

The rate limiter includes automatic cleanup (a code sketch follows this list):
  • Cleanup Interval: Every 60 seconds
  • Cleanup Action: Removes expired timestamps and empty records
  • Memory Efficiency: Prevents memory leaks in long-running processes
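
The cleanup pass could be structured roughly as follows. This sketch reuses the Map-based storage idea from the sliding-window sketch above; the names and interval wiring are assumptions, not the Gima source:

// Illustrative periodic cleanup: prune expired timestamps and empty records
const store = new Map<string, number[]>(); // identifier -> request timestamps
const windowMs = 60 * 1000;

setInterval(() => {
  const windowStart = Date.now() - windowMs;
  for (const [identifier, times] of store) {
    const recent = times.filter(t => t > windowStart);
    if (recent.length === 0) {
      store.delete(identifier); // drop empty records entirely
    } else {
      store.set(identifier, recent); // keep only in-window timestamps
    }
  }
}, 60_000); // cleanup interval: every 60 seconds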

Global Instance

A single global rate limiter instance (chatRateLimiter) is shared across all chat API requests:
export const chatRateLimiter = new RateLimiter({
  windowMs: 60 * 1000, // 1 minute
  maxRequests: 20,     // 20 requests per minute
});
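
For context, a chat route might consume the shared instance roughly like this. The sketch assumes a Next.js App Router handler and inlines the IP extraction; the real route code may differ:

import { chatRateLimiter } from '@/app/lib/rate-limiter';

export async function POST(request: Request) {
  // Resolve the client IP (first entry of X-Forwarded-For; see "Identifier" above)
  const ip = request.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'unknown';

  if (!chatRateLimiter.checkLimit(ip)) {
    const retryAfter = chatRateLimiter.getRetryAfter(ip);
    return new Response(
      JSON.stringify({ error: 'Too Many Requests', retryAfter }),
      {
        status: 429,
        headers: {
          'Content-Type': 'application/json',
          'Retry-After': String(retryAfter),
          'X-RateLimit-Remaining': '0',
        },
      }
    );
  }

  // ... normal chat handling goes here ...
  return new Response(JSON.stringify({ ok: true }), {
    headers: {
      'Content-Type': 'application/json',
      'X-RateLimit-Remaining': String(chatRateLimiter.getRemaining(ip)),
    },
  });
}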

Client Best Practices

1. Respect Retry-After Header

Always check and honor the Retry-After header when receiving a 429 response:
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload)
});

if (response.status === 429) {
  // Prefer the standard Retry-After header; fall back to the JSON body
  const headerRetry = Number(response.headers.get('Retry-After'));
  const data = await response.json();
  const retryAfter = headerRetry || data.retryAfter || 60;

  console.log(`Rate limited. Retry in ${retryAfter} seconds`);

  // Wait before retrying
  await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
}

2. Implement Exponential Backoff

For repeated failures, use exponential backoff:
let retryCount = 0;
const maxRetries = 3;

while (retryCount < maxRetries) {
  const response = await fetch('/api/chat', options);

  if (response.status === 429) {
    // Double the delay each attempt: 1s, 2s, 4s...
    // Adding random jitter here helps avoid synchronized retries from many clients.
    const delay = Math.pow(2, retryCount) * 1000;
    await new Promise(resolve => setTimeout(resolve, delay));
    retryCount++;
  } else {
    break; // success or a non-rate-limit error; stop retrying
  }
}

3. Request Batching

Reduce request frequency by batching multiple messages into a single request:
// ❌ Multiple requests
await fetch('/api/chat', { body: JSON.stringify({ messages: [msg1] }) });
await fetch('/api/chat', { body: JSON.stringify({ messages: [msg2] }) });

// ✅ Single request with conversation history
await fetch('/api/chat', { 
  body: JSON.stringify({ 
    messages: [msg1, response1, msg2] 
  }) 
});

4. Monitor Remaining Quota

Implement client-side tracking to avoid hitting limits:
// Approximate client-side tracking: each request "expires" one window later
let requestsInWindow = 0;
const maxRequests = 20;
const windowMs = 60000;

function canMakeRequest() {
  if (requestsInWindow >= maxRequests) {
    console.warn('Client-side rate limit reached');
    return false;
  }
  return true;
}

function trackRequest() {
  requestsInWindow++;
  setTimeout(() => requestsInWindow--, windowMs);
}

Rate Limit Methods

checkLimit(identifier: string)

Checks whether a request from the identifier is allowed under the current window; when allowed, the current timestamp is recorded (step 4 of the algorithm above). Parameters:
  • identifier (string): Usually a client IP address
Returns:
  • true: Request is within limits and has been counted
  • false: Rate limit exceeded

getRemaining(identifier: string)

Returns the number of requests remaining in the current window. Parameters:
  • identifier (string): Usually a client IP address
Returns:
  • number: Requests remaining (0 or more)

getRetryAfter(identifier: string)

Calculates how many seconds to wait until the next request is allowed. Parameters:
  • identifier (string): Usually a client IP address
Returns:
  • number: Seconds until the oldest request in the window expires (0 if the identifier is not currently rate limited)
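
Putting the three methods together, illustrative usage based on the signatures above might look like this (the IP is an example value):

const ip = '203.0.113.7'; // example identifier

if (chatRateLimiter.checkLimit(ip)) {
  console.log(`Allowed; ${chatRateLimiter.getRemaining(ip)} requests left in this window`);
} else {
  console.log(`Blocked; retry in ${chatRateLimiter.getRetryAfter(ip)} seconds`);
}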

Testing Rate Limits

Local Testing

To test rate limiting locally:
# Send 21 sequential requests (exceeds the default 20/min limit)
for i in {1..21}; do
  curl -s -o /dev/null -w "Request $i: %{http_code}\n" \
    -X POST http://localhost:3000/api/chat \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Test '$i'"}]}'
done
The first 20 requests should succeed, and the 21st should print a 429 status.

Custom Rate Limits

For testing or special use cases, you can create a custom rate limiter:
import { RateLimiter } from '@/app/lib/rate-limiter';

const customLimiter = new RateLimiter({
  windowMs: 30 * 1000,  // 30 seconds
  maxRequests: 5,       // 5 requests
});

Production Considerations

Distributed Systems

The current implementation uses in-memory storage, which has limitations:
  • ⚠️ Not shared across multiple servers: Each instance maintains separate counters
  • ⚠️ Lost on restart: Rate limit data is not persisted
For production deployments with multiple instances, consider:
  • Redis-based rate limiting: Shared state across all servers (sketched below)
  • API Gateway rate limiting: Cloudflare, AWS API Gateway, or Kong
  • Edge middleware: Vercel Edge Functions or Cloudflare Workers
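
As an illustration of the Redis option, a sliding window can be built on a sorted set. This sketch uses ioredis; the key naming and client setup are assumptions, not part of the Gima codebase:

import Redis from 'ioredis';

const redis = new Redis(); // connects to localhost:6379 by default

// Sliding window via a sorted set: member = unique id, score = timestamp
async function allowRequest(identifier: string, windowMs: number, maxRequests: number): Promise<boolean> {
  const key = `ratelimit:${identifier}`;
  const now = Date.now();

  // Drop timestamps that fell out of the window
  await redis.zremrangebyscore(key, 0, now - windowMs);

  // Count what remains; reject if the window is full
  const count = await redis.zcard(key);
  if (count >= maxRequests) return false;

  // Record this request and keep the key from living forever
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.pexpire(key, windowMs);
  return true;
}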

Monitoring

Track rate limit metrics (a minimal counter sketch follows the list):
  • Number of 429 responses per hour
  • Most frequently rate-limited IPs
  • Average retry-after duration
  • Rate limit hit rate percentage
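
A minimal way to start collecting these numbers is an in-process counter around the limit check. This is only a sketch; a production setup would typically export these values to a metrics system instead:

// Illustrative in-process counters for the metrics above
const metrics = {
  rejected429: 0,
  rejectionsByIp: new Map<string, number>(),
};

// Call this wherever a 429 is returned (e.g., the route handler shown earlier)
function recordRejection(ip: string): void {
  metrics.rejected429 += 1;
  metrics.rejectionsByIp.set(ip, (metrics.rejectionsByIp.get(ip) ?? 0) + 1);
}

// Log and reset an hourly summary
setInterval(() => {
  console.log(`429 responses this hour: ${metrics.rejected429}`);
  metrics.rejected429 = 0;
}, 60 * 60 * 1000);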
