
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Project516/BlogAPI/llms.txt

Use this file to discover all available pages before exploring further.

The Blog API uses a two-layer caching strategy to serve blog post data quickly without hitting the upstream source on every request. When the server starts, it attempts to load previously cached data from /tmp/cache.json into memory. If that file does not exist, the in-memory cache starts empty. The only way to populate or refresh the cache — whether on a fresh start or after new posts are published — is to call POST /blogs/cache.

How the cache is populated

The cache is populated through two mechanisms. On startup, the server reads /tmp/cache.json if it exists and loads its contents into the in-memory cache list:
startup
import json

try:
    with open("/tmp/cache.json", "r") as file:
        cache = json.load(file)
except FileNotFoundError:
    cache = []
On demand, calling POST /blogs/cache triggers a live scrape, overwrites the in-memory cache, and writes the result back to /tmp/cache.json:
on-demand refresh
cache = scrape_blogs(
    "https://raw.githubusercontent.com/Project516/project516.github.io/refs/heads/master/blog.html"
)
with open("/tmp/cache.json", "w") as file:
    json.dump(cache, file)

How the scraper works

When POST /blogs/cache is called, the scraper fetches the raw HTML of the upstream blog index from GitHub and parses it with BeautifulSoup. It locates every <article> element in the document, then extracts three pieces of data from each one:
  • The text content of the <h2> tag as the post title
  • The href attribute of the first <a> tag, prefixed with https://project516.dev/
  • The datetime attribute of the <time> tag as the post date
Any article that does not contain an <a> tag is skipped.
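The extraction steps above can be sketched with BeautifulSoup as follows. This is a minimal illustration, not the server's actual code; the function name scrape_articles is ours, and it assumes the upstream HTML has already been fetched into a string:

```python
from bs4 import BeautifulSoup

def scrape_articles(html: str) -> list[dict]:
    """Parse blog index HTML into a list of post dicts."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for article in soup.find_all("article"):
        link = article.find("a")
        if link is None:
            continue  # articles without an <a> tag are skipped
        posts.append({
            "title": article.find("h2").get_text(strip=True),
            "link": "https://project516.dev/" + link["href"],
            "date": article.find("time")["datetime"],
        })
    return posts
```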

Blog post data shape

Each cached blog post is a JSON object with three fields:
blog post object
{
  "title": "My blog post title",
  "link": "https://project516.dev/posts/my-blog-post",
  "date": "2024-11-01"
}
title
string
required
The text content of the post’s <h2> element, with surrounding whitespace stripped.
link
string
required
The absolute URL to the blog post, constructed by prepending https://project516.dev/ to the href found in the article’s <a> tag.
date
string
required
The datetime attribute value from the post’s <time> element, typically in YYYY-MM-DD format.
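If you consume this data, a small shape check can catch malformed entries early. The helper below is illustrative only (validate_post is not part of the API) and simply verifies that all three required string fields are present:

```python
def validate_post(post: dict) -> bool:
    """Return True if the dict matches the cached blog post shape."""
    required = {"title": str, "link": str, "date": str}
    return all(isinstance(post.get(field), expected)
               for field, expected in required.items())

validate_post({"title": "My blog post title",
               "link": "https://project516.dev/posts/my-blog-post",
               "date": "2024-11-01"})  # True
```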

When to refresh the cache

You must call POST /blogs/cache to pick up any new blog posts published to the upstream source. The read endpoints (GET /blogs, GET /blogs/latest, GET /blogs/search) all read directly from the in-memory cache and never trigger a scrape themselves.
refresh the cache
curl -X POST http://localhost:8000/blogs/cache
A successful refresh returns:
success response
{
  "message": "Blogs cached successfully"
}
If the server restarts and /tmp/cache.json is not present — for example, after a system reboot that clears /tmp — the in-memory cache will be empty and all read endpoints will return no data until you call POST /blogs/cache again.
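You can reproduce this cold-start behaviour in isolation. The sketch below mimics the startup logic against a throwaway path that does not exist, standing in for /tmp/cache.json after a reboot:

```python
import json
import os
import tempfile

# A path that does not exist yet, simulating /tmp cleared by a reboot.
path = os.path.join(tempfile.mkdtemp(), "cache.json")

try:
    with open(path, "r") as file:
        cache = json.load(file)
except FileNotFoundError:
    cache = []  # read endpoints will serve no data until a refresh

print(cache)  # → []
```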
POST /blogs/cache is rate-limited to 1 request per minute per IP address. If you need to trigger multiple refreshes in quick succession during testing, wait at least 60 seconds between calls.
If the upstream GitHub URL is unreachable when you call POST /blogs/cache, the scraper raises an exception and the server returns an HTTP 500 error. The in-memory cache is not modified — the exception is thrown before the cache variable is reassigned, so existing cached data is preserved. Verify connectivity to raw.githubusercontent.com if you receive a 500 response.
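The preservation guarantee follows from Python assignment semantics: the right-hand side of `cache = scrape_blogs(...)` raises before any reassignment occurs. The stand-in scrape_blogs below simulates an unreachable upstream to demonstrate this:

```python
def scrape_blogs(url):
    """Stand-in that simulates an unreachable upstream source."""
    raise ConnectionError("upstream unreachable")

cache = [{"title": "old post"}]  # existing cached data

try:
    cache = scrape_blogs(
        "https://raw.githubusercontent.com/Project516/project516.github.io/refs/heads/master/blog.html"
    )
except ConnectionError:
    pass  # the server would return HTTP 500 here

# The exception fired before assignment, so the old data survives.
print(cache)  # → [{'title': 'old post'}]
```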
