Overview
The Gima AI Chatbot implements in-memory rate limiting using a sliding window algorithm to prevent API abuse and ensure fair resource distribution across all users.
Rate Limit Configuration
- Time window in milliseconds. Default: 60,000 ms (1 minute)
- Maximum requests allowed per window. Default: 20 requests per minute
How It Works
Sliding Window Algorithm
The rate limiter uses a sliding window approach:
- Each request from a client IP is timestamped
- Before processing a new request, old timestamps outside the current window are removed
- If the count of valid timestamps exceeds the limit, the request is rejected
- Otherwise, the current timestamp is added and the request proceeds
Example Timeline
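As a minimal sketch (using a smaller 3-requests-per-10-second window for readability; the documented defaults are 20 requests per 60,000 ms), requests slide in and out of the window like this:

```typescript
// Illustrative only: a 3-requests-per-10s window instead of the real 20/60s.
const windowMs = 10_000;
const maxRequests = 3;
const timestamps: number[] = [];

function allow(now: number): boolean {
  // Drop timestamps that have slid out of the window, then check the count.
  while (timestamps.length > 0 && timestamps[0] <= now - windowMs) {
    timestamps.shift();
  }
  if (timestamps.length >= maxRequests) return false;
  timestamps.push(now);
  return true;
}

// t=0s, 1s, 2s are allowed; t=3s is rejected (three requests already in the
// window); t=11s is allowed again because the t=0s entry has expired.
const results = [0, 1_000, 2_000, 3_000, 11_000].map(allow);
console.log(results); // → [ true, true, true, false, true ]
```

Note that the window slides continuously: the request at t=11s succeeds not because a new window started, but because the oldest timestamp aged out.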
Rate Limit Responses
When Limit is Exceeded
Clients receive a 429 Too Many Requests response:
Response Headers
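As a sketch of how such a response might be constructed (assuming the Web-standard Response API; the function name and error message text are illustrative):

```typescript
// Hypothetical 429 builder; the real handler's wording may differ.
function rateLimitResponse(retryAfterSeconds: number): Response {
  return new Response(
    JSON.stringify({ error: "Too many requests. Please try again later." }),
    {
      status: 429,
      headers: {
        "Content-Type": "application/json",
        "Retry-After": String(retryAfterSeconds),
      },
    }
  );
}
```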
- Retry-After: Number of seconds to wait before retrying, based on when the oldest request in the current window will expire.
- Number of requests remaining in the current window; set to "0" when the limit is exceeded.
Implementation Details
Identifier
Rate limiting is applied per client IP address, extracted from (in order of precedence):
- X-Forwarded-For header (first IP)
- X-Real-IP header
- Socket remote address
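A minimal sketch of that lookup order, assuming Fetch-style Headers (the function name getClientIp is illustrative):

```typescript
// The first entry of X-Forwarded-For is used because each proxy in the
// chain appends its upstream address; the first is the original client.
function getClientIp(headers: Headers, socketAddress?: string): string {
  const forwarded = headers.get("x-forwarded-for");
  if (forwarded) return forwarded.split(",")[0].trim();
  return headers.get("x-real-ip") ?? socketAddress ?? "unknown";
}
```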
Memory Management
The rate limiter includes automatic cleanup:
- Cleanup Interval: Every 60 seconds
- Cleanup Action: Removes expired timestamps and empty records
- Memory Efficiency: Prevents memory leaks in long-running processes
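A sketch of what that cleanup pass might look like, assuming timestamps are kept in a Map keyed by identifier (names here are illustrative):

```typescript
// Hypothetical cleanup pass: prune expired timestamps, drop empty records.
const WINDOW_MS = 60_000;
const records = new Map<string, number[]>();

function cleanup(now: number = Date.now()): void {
  for (const [id, times] of records) {
    const fresh = times.filter((t) => t > now - WINDOW_MS);
    if (fresh.length === 0) {
      records.delete(id); // remove the record entirely so the Map cannot grow unbounded
    } else {
      records.set(id, fresh);
    }
  }
}

// In the real limiter this would run on a timer, e.g.:
// setInterval(cleanup, 60_000);
```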
Global Instance
A single global rate limiter instance (chatRateLimiter) is shared across all chat API requests:
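A hypothetical shape for that module, assuming the constructor takes the window and request limit described above (the parameter names windowMs and maxRequests are assumptions):

```typescript
// Sketch only: one module-level instance means every route that imports it
// shares the same counters (within a single process).
class RateLimiter {
  constructor(public windowMs: number, public maxRequests: number) {}
  // checkLimit / getRemaining / getRetryAfter elided for brevity
}

export const chatRateLimiter = new RateLimiter(60_000, 20);
```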
Client Best Practices
1. Respect Retry-After Header
Always check and honor the Retry-After header when receiving a 429 response:
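For example, a client helper along these lines (endpoint path and payload shape are assumptions) retries once after the advised delay:

```typescript
// Hypothetical client helper: on 429, wait for Retry-After seconds, then retry.
async function sendChat(message: string): Promise<Response> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (res.status === 429) {
    const waitSeconds = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, waitSeconds * 1000));
    return sendChat(message); // retry after the advised delay
  }
  return res;
}
```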
2. Implement Exponential Backoff
For repeated failures, use exponential backoff:
3. Request Batching
Reduce request frequency by batching multiple messages into a single request:
4. Monitor Remaining Quota
Implement client-side tracking to avoid hitting limits:
Rate Limit Methods
checkLimit(identifier: string)
Checks if an identifier has exceeded the rate limit.
Parameters:
- identifier (string): Usually a client IP address
Returns:
- true: Request is within limits
- false: Rate limit exceeded
getRemaining(identifier: string)
Returns the number of requests remaining in the current window.
Parameters:
- identifier (string): Usually a client IP address
Returns:
- number: Requests remaining (0 or more)
getRetryAfter(identifier: string)
Calculates seconds until the next request is allowed.
Parameters:
- identifier (string): Usually a client IP address
Returns:
- number: Seconds until the next request is allowed (0 if the identifier is currently within limits)
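Putting the three methods together, a self-contained sketch consistent with the descriptions above (the class name and internals are assumptions, not the actual implementation) could look like:

```typescript
// Hypothetical sliding-window limiter exposing the documented interface.
class SlidingWindowLimiter {
  private records = new Map<string, number[]>();
  constructor(private windowMs = 60_000, private maxRequests = 20) {}

  // Drop timestamps outside the window and return the surviving ones.
  private prune(identifier: string, now: number): number[] {
    const times = (this.records.get(identifier) ?? []).filter(
      (t) => t > now - this.windowMs
    );
    this.records.set(identifier, times);
    return times;
  }

  checkLimit(identifier: string, now = Date.now()): boolean {
    const times = this.prune(identifier, now);
    if (times.length >= this.maxRequests) return false; // over the limit
    times.push(now); // count this request against the window
    return true;
  }

  getRemaining(identifier: string, now = Date.now()): number {
    return Math.max(0, this.maxRequests - this.prune(identifier, now).length);
  }

  getRetryAfter(identifier: string, now = Date.now()): number {
    const times = this.prune(identifier, now);
    if (times.length < this.maxRequests) return 0;
    // Seconds until the oldest request slides out of the window.
    return Math.ceil((times[0] + this.windowMs - now) / 1000);
  }
}
```

Returning whole seconds from getRetryAfter keeps it directly usable as the Retry-After header value.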
Testing Rate Limits
Local Testing
To test rate limiting locally:
Custom Rate Limits
For testing or special use cases, you can create a custom rate limiter:
Production Considerations
Distributed Systems
The current implementation uses in-memory storage, which has limitations:
- ⚠️ Not shared across multiple servers: Each instance maintains separate counters
- ⚠️ Lost on restart: Rate limit data is not persisted
For multi-server production deployments, consider:
- Redis-based rate limiting: Shared state across all servers
- API Gateway rate limiting: Cloudflare, AWS API Gateway, or Kong
- Edge middleware: Vercel Edge Functions or Cloudflare Workers
Monitoring
Track rate limit metrics:
- Number of 429 responses per hour
- Most frequently rate-limited IPs
- Average retry-after duration
- Rate limit hit rate percentage
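A minimal in-process sketch of the last of these metrics (counter and function names are illustrative; a real deployment would export these to a metrics backend):

```typescript
// Hypothetical counters for computing the rate limit hit rate.
const metrics = { total: 0, rejected: 0 };

function record(allowed: boolean): void {
  metrics.total++;
  if (!allowed) metrics.rejected++;
}

function hitRatePercent(): number {
  return metrics.total === 0 ? 0 : (100 * metrics.rejected) / metrics.total;
}
```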
Related Resources
- POST /api/chat - Main chat endpoint documentation
- Configuration - Configure rate limiting and timeouts
- Testing - Test rate limiting behavior