Rate limiting controls how many requests a client can make to your API within a given time window. Without it, a single misbehaving client can exhaust your server resources, degrade service for everyone else, and drive up infrastructure costs. By enforcing a per-user cap, you protect against abuse, ensure fair use across all clients, and keep operating costs predictable.

How RLaaS enforces rate limits

RLaaS intercepts every incoming HTTP request using a servlet filter — specifically, a Spring OncePerRequestFilter — that runs before any controller logic executes. The filter reads the userId query parameter, checks the current request count against the configured limit in Redis, and either allows the request through or returns a 429 Too Many Requests response immediately.
Filter decision flow
Incoming request
        │
        ▼
RateLimiterFilter (OncePerRequestFilter)
        │
        ├─ Extract userId from query param
        │
        ├─ Call rateLimitingAlgorithm.allowRequest(userId)
        │       │
        │       ├─ allowed=true  ──► filterChain.doFilter() ──► Controller
        │       │
        │       └─ allowed=false ──► HTTP 429 "Try after X seconds"
        │
        └─ Response returned to client
Because the check happens in the filter layer, rate-limited requests never reach your business logic. This keeps your controllers clean and ensures consistent enforcement regardless of which endpoint is called.
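The filter's decision boils down to a per-user counter check. Here is a minimal in-memory sketch of that logic with the Spring servlet plumbing stripped away — the class and method names are illustrative, and in RLaaS the counter lives in Redis rather than in a local map:

```java
import java.util.HashMap;
import java.util.Map;

public class RateLimiterSketch {
    private final int maxRequests;
    // Per-user request counts for the current window. RLaaS stores these
    // in Redis with a TTL; a plain map stands in for Redis here.
    private final Map<String, Integer> counts = new HashMap<>();

    public RateLimiterSketch(int maxRequests) {
        this.maxRequests = maxRequests;
    }

    // Mirrors the filter's decision: return 200 if the request may proceed
    // to the controller, or 429 if the user's quota is exhausted.
    public int handle(String userId) {
        int used = counts.merge(userId, 1, Integer::sum);
        return used <= maxRequests ? 200 : 429;
    }
}
```

Note that each userId accumulates its own count, so one user hitting 429 has no effect on another user's requests — the same isolation property described below.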

Per-user isolation

Each userId gets its own independent Redis key. For the fixed window algorithm, keys follow the pattern rlaas:rate_limit:{userId}:{window}; for the sliding window, a single ZSET key rlaas:rate_limit:{userId} is used per user. This means Alice’s request count has no effect on Bob’s. One user exhausting their quota does not slow down or block other users. The isolation is enforced at the Redis key level — there is no shared counter.
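The key patterns above can be sketched as plain string construction. The helper names here are assumptions for illustration, not part of the RLaaS API; the window number for the fixed window is derived by integer-dividing the epoch time by the window size, so every request in the same window maps to the same key:

```java
public class RateLimitKeys {
    static final String PREFIX = "rlaas:rate_limit";

    // Fixed window: one key per user per window. Requests at epoch seconds
    // 0..59 share window 0, 60..119 share window 1, and so on (for a 60s window).
    static String fixedWindowKey(String userId, long epochSeconds, long windowSizeSeconds) {
        long window = epochSeconds / windowSizeSeconds;
        return PREFIX + ":" + userId + ":" + window;
    }

    // Sliding window: a single ZSET key per user; request timestamps are
    // stored as ZSET entries rather than being encoded into the key.
    static String slidingWindowKey(String userId) {
        return PREFIX + ":" + userId;
    }
}
```

Because the userId is embedded in every key, Alice and Bob can never touch the same counter.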

What happens when a limit is exceeded

When a user exceeds their configured limit, RLaaS returns:
  • HTTP status: 429 Too Many Requests
  • Response body: User is not allowed...Try after X seconds.
The value X is the TTL of the Redis key for that user’s current window — the number of seconds until their quota resets. Clients can read this value to implement retry logic with an appropriate backoff.
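A client can recover X from the response body to schedule its retry. A minimal sketch, assuming the body follows the "Try after X seconds" wording shown above (the parser class and its fallback value are illustrative, not part of any RLaaS client library):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetryAfterParser {
    private static final Pattern RETRY_AFTER =
            Pattern.compile("Try after (\\d+) seconds");

    // Extracts the number of seconds until the quota resets from a 429
    // response body, or -1 if the body doesn't match, in which case the
    // caller should fall back to a default backoff.
    static long parseRetryAfterSeconds(String body) {
        Matcher m = RETRY_AFTER.matcher(body);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }
}
```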

Choosing a window size and limit

You configure rate limiting behavior with two properties:
rate-limiter:
  window-size: 60       # seconds
  max-requests: 100
A few starting points:
  • Public API, moderate traffic: 100 requests per 60 seconds
  • Stricter abuse prevention: 20 requests per 60 seconds, or 100 requests per 300 seconds
  • Internal service-to-service: 1000 requests per 60 seconds
Reduce max-requests to tighten limits without changing the window. Reduce window-size to make quotas reset more frequently — but note that shrinking the window without also shrinking max-requests raises the allowed average rate (100 requests per 30 seconds permits twice the throughput of 100 per 60 seconds). For the smoothest enforcement with no boundary bursts, use the sliding window algorithm.
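The boundary-burst effect mentioned above is easy to demonstrate: under a fixed window, a client can spend its full quota at the end of one window and again at the start of the next. This pure in-memory simulation of the fixed-window check (the real counters live in Redis) shows 200 requests being accepted in a two-second span under a nominal 100-per-60s limit:

```java
import java.util.HashMap;
import java.util.Map;

public class FixedWindowBurstDemo {
    // Replays a sequence of request timestamps (epoch seconds) against a
    // fixed-window counter and returns how many requests were accepted.
    static int accepted(long[] timestamps, int maxRequests, long windowSize) {
        Map<Long, Integer> counts = new HashMap<>();
        int allowed = 0;
        for (long t : timestamps) {
            long window = t / windowSize; // same arithmetic as the window key
            int used = counts.getOrDefault(window, 0);
            if (used < maxRequests) {
                counts.put(window, used + 1);
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // 100 requests at t=59s (window 0) and 100 more at t=61s (window 1):
        // every request passes, doubling the nominal limit across the boundary.
        long[] burst = new long[200];
        for (int i = 0; i < 100; i++) {
            burst[i] = 59;
            burst[100 + i] = 61;
        }
        System.out.println(accepted(burst, 100, 60)); // prints 200
    }
}
```

A sliding window rejects the second batch because it looks back over the trailing 60 seconds rather than resetting the count at the window boundary.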

Fixed Window vs Sliding Window

Compare the two rate-limiting algorithms and their trade-offs.

Check endpoint

See the API reference for the rate limit check endpoint.
