The Blog API enforces rate limits on every endpoint to prevent excessive load on the server and the upstream data source. Rate limiting is handled by slowapi, a FastAPI-compatible wrapper around the limits library. Each limit is applied per client IP address, so different callers do not share a quota.
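The application's own source is not shown on this page, but a typical slowapi wiring looks like the following minimal sketch. The route body and module layout are assumptions; only the slowapi calls, the per-IP key function, and the /blogs path and limit come from this documentation.

```python
# Illustrative sketch of a slowapi-limited FastAPI route (assumed layout).
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # track windows per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/blogs")
@limiter.limit("5/minute")  # the limit documented for GET /blogs
async def get_blogs(request: Request):
    # slowapi requires the Request parameter to resolve the client address.
    return []  # placeholder; the real handler serves the in-memory cache
```

The `5/minute` string is the limits-library notation for the quota shown in the table below.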

Limits per endpoint

| Endpoint | Method | Limit |
| --- | --- | --- |
| `/blogs` | GET | 5 requests / minute |
| `/blogs/latest` | GET | 5 requests / minute |
| `/blogs/search` | GET | 5 requests / minute |
| `/blogs/cache` | POST | 1 request / minute |

The read endpoints share the same generous limit because they only serve data from the in-memory cache. The cache refresh endpoint is restricted more tightly because it triggers a live HTTP request to the upstream source.

How the limit key works

The API uses get_remote_address as the key function, which means the rate limit window is tracked per client IP address. Each unique IP gets its own independent counter for each endpoint. If you are behind a proxy or NAT, all requests from that shared IP will count against the same quota.
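To make the per-IP behavior concrete, here is a toy fixed-window counter keyed the same way. This is an illustration only, not slowapi's actual implementation; the limits library it delegates to supports more robust strategies.

```python
import time
from collections import defaultdict
from typing import Optional

class FixedWindowLimiter:
    """Toy per-key fixed-window rate limiter (illustration only)."""

    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # key -> [window_start_time, request_count]
        self.counters = defaultdict(lambda: [0.0, 0])

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window_start, count = self.counters[key]
        if now - window_start >= self.window_seconds:
            # The window has expired: start a fresh one for this key.
            self.counters[key] = [now, 1]
            return True
        if count < self.max_requests:
            self.counters[key][1] += 1
            return True
        return False

limiter = FixedWindowLimiter(max_requests=5, window_seconds=60)
for _ in range(5):
    assert limiter.allow("198.51.100.7", now=0)   # first five succeed
assert not limiter.allow("198.51.100.7", now=1)   # sixth in the window: denied
assert limiter.allow("203.0.113.9", now=1)        # different IP: its own quota
```

Because the counter dictionary is keyed by client address, exhausting one IP's quota has no effect on any other caller, which mirrors the behavior described above.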

What happens when you exceed a limit

When you exceed the allowed number of requests, the API returns an HTTP 429 Too Many Requests response. This is handled automatically by slowapi’s built-in _rate_limit_exceeded_handler, which returns a JSON body describing the limit that was exceeded.
429 response
{
  "error": "Rate limit exceeded: 5 per 1 minute"
}
Your client should treat a 429 as a signal to back off, not retry immediately.
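A client can separate throttling from other errors by checking the status code and, optionally, reading the JSON body. The helper name below is an assumption, and the fallback branch covers customized handlers that may return plain text instead of slowapi's default JSON.

```python
import json

def describe_throttle(status_code: int, body: str) -> str:
    """Return the throttle message for a 429 response, or '' otherwise."""
    if status_code != 429:
        return ""
    try:
        # slowapi's default handler puts the message under the "error" key.
        return json.loads(body).get("error", "rate limit exceeded")
    except (ValueError, AttributeError):
        # Plain-text or malformed body: fall back to the raw text.
        return body or "rate limit exceeded"

print(describe_throttle(429, '{"error": "Rate limit exceeded: 5 per 1 minute"}'))
# prints: Rate limit exceeded: 5 per 1 minute
```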

Handling 429 responses in your client

The recommended approach is exponential backoff with jitter: wait a short interval after the first failure, double the wait on each subsequent failure, and add a small random offset to avoid synchronized retries from multiple clients.
exponential backoff example
import time
import random
import requests

def get_blogs_with_retry(base_url: str, max_retries: int = 4) -> list:
    """Fetch /blogs, backing off exponentially on 429 responses."""
    url = f"{base_url}/blogs"
    wait = 2  # initial backoff in seconds

    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            # Jitter keeps concurrent clients from retrying in lockstep.
            jitter = random.uniform(0, 1)
            sleep_time = wait + jitter
            print(f"Rate limited. Retrying in {sleep_time:.1f}s (attempt {attempt + 1})")
            time.sleep(sleep_time)
            wait *= 2  # double the wait each time
        else:
            # Any other error status is unexpected; surface it immediately.
            response.raise_for_status()

    raise RuntimeError(f"Failed after {max_retries} attempts due to rate limiting")
This pattern applies equally to any of the rate-limited endpoints. For POST /blogs/cache, where the limit is 1 request per minute, set your initial wait to at least 60 seconds.
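The resulting delay schedule can be computed up front, which is useful for logging or for capping total wait time. The helper below is an illustrative sketch (the function name is an assumption), shown with the 60-second base suggested for the cache endpoint.

```python
def backoff_delays(base: float, retries: int) -> list:
    """Exponential delay schedule (jitter omitted): base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

print(backoff_delays(60, 3))  # [60, 120, 240] -- suited to POST /blogs/cache
```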
If you are polling the API periodically — for example, to check for new posts — space your calls so they sit well within the per-minute limit rather than sending requests in rapid bursts. Calling GET /blogs once every 15 seconds comfortably stays within the 5-per-minute ceiling.
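The spacing arithmetic above can be expressed as a small helper that derives a polling interval from a limit. The function name and the 0.8 safety factor are assumptions, chosen so that a 5-per-minute limit yields the 15-second interval mentioned here.

```python
def safe_poll_interval(limit_per_minute: int, safety_factor: float = 0.8) -> float:
    """Seconds between polls so throughput stays at limit * safety_factor."""
    return 60.0 / (limit_per_minute * safety_factor)

interval = safe_poll_interval(5)  # ~15 seconds -> about 4 requests/minute
```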
