## Overview
Iris implements per-IP rate limiting using the governor crate to prevent abuse, ensure fair resource allocation, and protect against denial-of-service attacks. The rate limiter operates entirely in memory with no persistent storage.
## Rate Limit Configuration

### Default Limits
The API enforces the following limits per IP address:
- Sustained rate: 5 requests per second
- Burst capacity: Up to 10 requests
- Rejection response: HTTP 429 Too Many Requests
```rust
// From main.rs:127-130
// 5 requests/second per IP, burst up to 10
let quota = Quota::per_second(NonZeroU32::new(5).unwrap())
    .allow_burst(NonZeroU32::new(10).unwrap());
let limiter: SharedRateLimiter = Arc::new(RateLimiter::keyed(quota));
```
The burst capacity allows for occasional spikes in traffic while maintaining an average rate of 5 req/s over time.
## How It Works

### Token Bucket Algorithm

Iris uses the token bucket algorithm implemented by the governor crate:
- Each IP address starts with 10 tokens (burst capacity)
- Tokens refill at a rate of 5 per second
- Each request consumes 1 token
- When tokens are exhausted, requests are rejected with HTTP 429
### Middleware Implementation

Rate limiting is enforced via Axum middleware that runs before request handlers:
```rust
// From main.rs:35-45
async fn rate_limit_middleware(
    State(state): State<AppState>,
    ConnectInfo(addr): ConnectInfo<SocketAddr>,
    request: Request,
    next: Next,
) -> Response {
    if state.limiter.check_key(&addr.ip()).is_err() {
        return StatusCode::TOO_MANY_REQUESTS.into_response();
    }
    next.run(request).await
}
```
Key characteristics:
- Runs before any request processing
- Extracts client IP from socket connection
- Checks rate limit atomically
- Returns 429 immediately if limit exceeded
- Passes request to handler if within limits
### Application Integration
The rate limiter is applied globally to all routes:
```rust
// From main.rs:143-149
let app = Router::new()
    .route("/compare", post(handle_compare))
    .route("/stats", get(handle_stats))
    .route("/health", get(|| async { "OK" }))
    .layer(middleware::from_fn_with_state(state.clone(), rate_limit_middleware))
    .layer(cors)
    .with_state(state);
```
All endpoints including /compare, /stats, and /health are subject to rate limiting.
## Rate Limit Behavior

### Successful Request Flow
```bash
# First request (burst available)
curl -X POST http://localhost:8080/compare \
  -H "Content-Type: application/json" \
  -d '{...}'
# Response: 200 OK

# Immediate second request (burst available)
curl -X POST http://localhost:8080/compare \
  -H "Content-Type: application/json" \
  -d '{...}'
# Response: 200 OK
```
### Rate Limit Exceeded
```bash
# After exhausting burst capacity (10+ rapid requests)
curl -X POST http://localhost:8080/compare \
  -H "Content-Type: application/json" \
  -d '{...}'
# Response: 429 Too Many Requests
```
The 429 response carries no body and no Retry-After header, so clients are not told how long to wait; they should implement exponential backoff.
## Rate Limit State Management
The rate limiter is stored in shared application state:
```rust
// From main.rs:26-33
type SharedRateLimiter = Arc<RateLimiter<IpAddr, DefaultKeyedStateStore<IpAddr>, DefaultClock>>;

#[derive(Clone)]
struct AppState {
    engine: Arc<Mutex<FaceEngine>>,
    limiter: SharedRateLimiter, // Shared across all request handlers
    stats: RequestStats,
}
```
Thread safety:

- Arc provides shared ownership across async tasks
- governor uses lock-free algorithms internally
- Multiple requests can check limits concurrently
- State is keyed by IpAddr for per-IP tracking
### IP Address Extraction

The client IP is extracted from the TCP connection:

```rust
ConnectInfo(addr): ConnectInfo<SocketAddr>
// addr.ip() returns the IpAddr (IPv4 or IPv6)
```
Proxy Considerations: If Iris is deployed behind a reverse proxy (nginx, Cloudflare, etc.), the rate limiter will see the proxy’s IP address, not the client’s IP. See Proxy Configuration below.
## Configuring Custom Rate Limits

To modify the rate limits, edit the quota configuration in main.rs.

### Change Sustained Rate
```rust
// Allow 10 requests per second instead of 5
let quota = Quota::per_second(NonZeroU32::new(10).unwrap())
    .allow_burst(NonZeroU32::new(20).unwrap());
```
### Change Time Window

```rust
// Allow 60 requests per minute (1 per second sustained)
let quota = Quota::per_minute(NonZeroU32::new(60).unwrap())
    .allow_burst(NonZeroU32::new(10).unwrap());
```
### Remove Burst Capacity

```rust
// Strict 5 req/s with no explicit burst
let quota = Quota::per_second(NonZeroU32::new(5).unwrap());
// Without allow_burst(), the burst size defaults to the sustained rate (5)
```
### Per-Route Rate Limits

To apply different limits to different endpoints, build separate routers and merge them. (An Axum layer applies to every route registered before it on the same router, so chaining both middleware layers onto one router would stack them on the earlier routes.)

```rust
// Create separate limiters
let strict_limiter = Arc::new(RateLimiter::keyed(
    Quota::per_second(NonZeroU32::new(1).unwrap()),
));
let lenient_limiter = Arc::new(RateLimiter::keyed(
    Quota::per_second(NonZeroU32::new(100).unwrap()),
));

// Give each group of routes its own middleware, then merge
// (strict_state / lenient_state hold the corresponding limiter)
let strict_routes = Router::new()
    .route("/compare", post(handle_compare))
    .layer(middleware::from_fn_with_state(strict_state, rate_limit_middleware));
let lenient_routes = Router::new()
    .route("/health", get(health))
    .layer(middleware::from_fn_with_state(lenient_state, rate_limit_middleware));
let app = strict_routes.merge(lenient_routes);
```
## Proxy Configuration

When deployed behind a reverse proxy, configure the proxy to forward the real client IP:

### Nginx
```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```
Then modify Iris to read from the X-Forwarded-For header instead of ConnectInfo:
```rust
use axum::http::HeaderMap;

async fn rate_limit_middleware(
    State(state): State<AppState>,
    headers: HeaderMap,
    request: Request,
    next: Next,
) -> Response {
    let ip = headers
        .get("X-Forwarded-For")
        .and_then(|h| h.to_str().ok())
        .and_then(|s| s.split(',').next())
        .and_then(|s| s.trim().parse::<IpAddr>().ok())
        .unwrap_or_else(|| "127.0.0.1".parse().unwrap());
    if state.limiter.check_key(&ip).is_err() {
        return StatusCode::TOO_MANY_REQUESTS.into_response();
    }
    next.run(request).await
}
```
Security Risk: Only read X-Forwarded-For if you trust the proxy. Malicious clients can spoof this header to bypass rate limiting.
### Cloudflare Integration
If using Cloudflare, use the CF-Connecting-IP header:
```rust
let ip = headers
    .get("CF-Connecting-IP")
    .and_then(|h| h.to_str().ok())
    .and_then(|s| s.parse::<IpAddr>().ok())
    .unwrap_or_else(|| "127.0.0.1".parse().unwrap());
```
## Monitoring Rate Limits

### Check Current Rate Limit Status
The rate limiter state is in-memory only. To observe rate limiting:
- Monitor 429 responses in your reverse proxy logs
- Track response status codes in your application metrics
- Implement custom middleware to expose rate limit headers
### Exposing Rate Limit Headers

To help clients understand their limits, add response headers:
```rust
async fn rate_limit_middleware(
    State(state): State<AppState>,
    ConnectInfo(addr): ConnectInfo<SocketAddr>,
    request: Request,
    next: Next,
) -> Response {
    match state.limiter.check_key(&addr.ip()) {
        Ok(_) => {
            let mut response = next.run(request).await;
            response.headers_mut().insert(
                "X-RateLimit-Limit",
                "5".parse().unwrap(),
            );
            response
        }
        Err(_) => {
            let mut response = StatusCode::TOO_MANY_REQUESTS.into_response();
            response.headers_mut().insert(
                "Retry-After",
                "1".parse().unwrap(),
            );
            response
        }
    }
}
```
## Performance Characteristics

- Memory usage: O(n) where n = number of unique IPs in the time window
- Lookup speed: O(1) constant time per request
- Lock contention: Minimal (lock-free algorithm)
- Overhead: ~10-50 microseconds per request
Note that governor's keyed store does not prune stale entries on its own, so memory grows with the number of distinct IPs seen. governor provides RateLimiter::retain_recent() for dropping keys whose state no longer matters; calling it periodically (e.g., from a background task) keeps memory bounded even with millions of IPs over time.
## Testing Rate Limits

### Load Testing Script
```bash
#!/bin/bash
# test-rate-limit.sh
for i in {1..15}; do
  echo "Request $i"
  curl -w "Status: %{http_code}\n" \
    -X POST http://localhost:8080/compare \
    -H "Content-Type: application/json" \
    -d '{
      "target_url": "https://example.com/face.jpg",
      "people": []
    }' &
done
wait
```
Expected output:
- First 10 requests: HTTP 200 (burst capacity)
- Remaining 5 requests: HTTP 429 (rate limited)
### Apache Bench

```bash
ab -n 100 -c 10 -p request.json -T application/json \
  http://localhost:8080/compare
```
Monitor the number of Non-2xx responses to see rate limiting in action.
## Comparison with Alternative Approaches

| Approach | Pros | Cons |
|---|---|---|
| IP-based (current) | Simple, no client setup, works immediately | Can’t distinguish users behind NAT |
| API key-based | Per-user limits, better for commercial use | Requires authentication layer |
| Token bucket (current) | Allows bursts, fair over time | Complex to explain to users |
| Fixed window | Easy to understand | Vulnerable to burst attacks at window boundaries |
| Sliding window | More accurate rate enforcement | Higher memory usage |
## Frequently Asked Questions
Q: Why am I getting 429 errors when I’m only making a few requests?
A: If behind a proxy/NAT, multiple users may share the same IP address. Consider implementing API key-based rate limiting.
Q: Can I disable rate limiting for testing?
A: Yes, set an extremely high quota:

```rust
let quota = Quota::per_second(NonZeroU32::new(1_000_000).unwrap());
```
Q: Does rate limiting persist across server restarts?
A: No. Rate limit state is in-memory only and resets when the server restarts.
Q: How do I implement whitelisted IPs?
A: Add an early return at the top of the middleware, before the rate limit check:

```rust
let whitelisted: [IpAddr; 1] = ["192.168.1.100".parse().unwrap()];
if whitelisted.contains(&addr.ip()) {
    return next.run(request).await;
}
```
Q: What happens to the rate limit when I scale horizontally?
A: Each instance has independent rate limit state. Consider using Redis-based rate limiting for distributed deployments.