BCycle Map Architecture: Cloudflare Workers & R2

BCycle Map is built entirely on Cloudflare primitives: three Workers handle polling, HTTP serving, and daily smoke testing; Cloudflare KV gives the live map sub-second reads of the latest station snapshot; Cloudflare R2 stores a growing parquet archive for historical analysis; GitHub Actions runs the compaction job (every 3 hours) that seals KV buffers into R2 parquet; and a React + Vite frontend renders everything in the browser. No dedicated servers, no managed databases, no egress fees.

System Overview

┌─────────────────────────────────────┐
│   Cloudflare Pages                  │
│   React 18 + Vite + TypeScript      │
│                                     │
│   /         Live Map (MapLibre)     │ ──► Read API Worker ──► KV (latest snapshot)
│   /flow      Flow Map (Deck.gl)     │ ──► Read API Worker ──► R2 (activity log)
│   /explore   Explore (DuckDB-WASM)  │ ──► R2 parquet (direct HTTP fetch)
└─────────────────────────────────────┘
                                              ▲                ▲
                                              │                │
┌─────────────────────────────────────┐       │                │
│   bcycle-map-poller (cron Worker)   │       │                │
│   Fires every 5 minutes             │       │                │
│                                     │       │                │
│   1. Fetch gbfs.json (discovery)    │       │                │
│   2. Fetch station_information,     │       │                │
│      station_status, system_info    │       │                │
│   3. normalize() → internal shape   │ ─write─► KV            │
│   4. Append to intra-hour buffer    │ ─write──────────────► R2 (activity)
└─────────────────────────────────────┘
                                                               ▲
┌─────────────────────────────────────┐                        │
│   GitHub Actions (.github/          │                        │
│   workflows/compact.yml)            │                        │
│   Runs every 3 hours                │                        │
│                                     │                        │
│   Read KV buffer → parquet-wasm +   │ ─write──────────────► R2 parquet
│   apache-arrow → seal parquet       │   gbfs/<id>/station_status/
│   Delete sealed KV buffer key       │   dt=YYYY-MM-DD/<HH>.parquet
└─────────────────────────────────────┘

Hot Path vs. Cold Path

The architecture is deliberately split into two asymmetric read paths based on what each view needs from the data.

Hot Path — Live Map

The live map needs the current state of every station, as fresh as possible, served in milliseconds. KV is the right store for exactly this pattern: one key, one value, one fast read.

Browser
  → GET /api/systems/bcycle_santabarbara/current
  → bcycle-map-read-api Worker
  → env.GBFS_KV.get("system:bcycle_santabarbara:latest")
  → JSON (~50 KB for 85 stations) back to browser
  → MapLibre re-renders station markers
  → Browser polls again in 60 seconds

The read-api Worker sets Cache-Control: max-age=60 so the Cloudflare edge caches responses between frontend polling cycles, keeping per-request KV reads to a minimum.
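A minimal sketch of this read path, under stated assumptions: `KVLike` stands in for the Workers KV binding, `PlainResponse` is a hypothetical plain-object substitute for the `Response` a real Worker would return, and the 404 branch is a guess at how a missing key might be handled.

```typescript
// Minimal stand-in for the KV binding used by the read-api Worker.
interface KVLike {
  get(key: string): Promise<string | null>;
}

// Hypothetical plain-object response, so the sketch runs outside a Worker.
interface PlainResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Key pattern taken from the flow above: system:<id>:latest
function latestKey(systemId: string): string {
  return `system:${systemId}:latest`;
}

async function handleCurrent(kv: KVLike, systemId: string): Promise<PlainResponse> {
  const raw = await kv.get(latestKey(systemId));
  if (raw === null) {
    // Assumed behaviour for an unknown or not-yet-polled system.
    return {
      status: 404,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ error: "no snapshot for system" }),
    };
  }
  return {
    status: 200,
    headers: {
      "Content-Type": "application/json",
      // Lets the edge reuse the response between 60-second frontend polls.
      "Cache-Control": "max-age=60",
    },
    body: raw, // the stored value is already JSON, so it is passed through as-is
  };
}
```

Passing the stored string through untouched avoids a parse and re-serialize step on every request.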

Cold Path — Explore View

The Explore view needs weeks or months of data for trend analysis. Sending that through a Worker API would be slow and expensive. Instead, the browser fetches parquet files from R2 directly and runs SQL queries locally using DuckDB-WASM — no server-side database at all.

Browser loads /explore
  → DuckDB-WASM boots in a Web Worker
  → User selects a date range
  → DuckDB executes:
       SELECT … FROM 'https://<r2-bucket>.r2.dev/gbfs/bcycle_santabarbara/
                       station_status/dt=2026-05-*/*.parquet'
  → R2 streams parquet bytes (range requests, column pruning)
  → DuckDB executes in-browser, hands rows to the visualization layer
  → Deck.gl renders the result

Because R2 has no egress fees and GBFS data is publicly redistributable, the R2 bucket is configured for public read access. DuckDB-WASM’s HTTP range requests make columnar parquet fetches highly efficient — only the columns you query are transferred.
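As an illustration, the query text could be assembled in the browser from an explicit list of partition URLs passed to DuckDB's `read_parquet`, which accepts a list of files. This is a sketch, not the project's code: `R2_BASE` is a hypothetical bucket host (the real one is elided above), the helper names are invented, and zero-padded hour file names are assumed.

```typescript
// Hypothetical public bucket base URL; substitute the real r2.dev host.
const R2_BASE = "https://example-bucket.r2.dev";

// One hourly partition, following gbfs/<id>/station_status/dt=YYYY-MM-DD/<HH>.parquet.
function partitionUrl(base: string, systemId: string, dt: string, hour: number): string {
  const hh = String(hour).padStart(2, "0"); // assumes two-digit hour file names
  return `${base}/gbfs/${systemId}/station_status/dt=${dt}/${hh}.parquet`;
}

// Build a DuckDB query over an explicit list of partition URLs.
// Only station_id and num_bikes_available are referenced, so only those
// columns need to be fetched from each file.
function avgBikesSql(urls: string[]): string {
  const fileList = urls.map((u) => `'${u}'`).join(", ");
  return [
    "SELECT station_id, avg(num_bikes_available) AS avg_bikes",
    `FROM read_parquet([${fileList}])`,
    "GROUP BY station_id",
    "ORDER BY station_id",
  ].join("\n");
}

const urls = [13, 14].map((h) => partitionUrl(R2_BASE, "bcycle_santabarbara", "2026-05-13", h));
const sql = avgBikesSql(urls);
```

The resulting `sql` string would then be handed to the DuckDB-WASM connection running in the Web Worker.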

Typical Profiles — Station Details View

The Station Details view shows a per-station typical availability chart (bikes by hour of day, optionally split by day of week). These profiles are too expensive to compute on-the-fly from raw parquet on every request, so the compute-popularity GitHub Action pre-computes them and writes one JSON file per station to R2. The read-api Worker serves them through a dedicated endpoint:

Browser opens Station Details for a station
  → GET /api/systems/bcycle_santabarbara/stations/<stationId>/recent
  → bcycle-map-read-api Worker
  → env.GBFS_R2.get("gbfs/bcycle_santabarbara/typicals/<stationId>.json")
  → Returns { stationId, hours[], currentHour, currentDow, daysCovered, isDowFiltered, label, timezone }
  → Station Details view renders the hourly availability chart

If the typical profile file doesn’t yet exist (e.g. for a freshly added system), the endpoint returns a well-formed 24-hour shape filled with zeros so the frontend can always render the chart skeleton. The response is cached for 5 minutes (Cache-Control: max-age=300).

The day-of-week filter activates only once a system has at least 21 days of history (daysCovered >= 21). Below that threshold all-days averages are shown because per-day-of-week samples are too sparse to be meaningful.
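The fallback shape and the 21-day threshold might look roughly like the sketch below. The top-level field names come from the response shape listed above; the element type of `hours[]` is not specified there, so `HourBucket`, the `label` text, and the helper names are assumptions.

```typescript
// Assumed element shape for hours[]; the real one is not documented here.
interface HourBucket {
  hour: number;
  avgBikes: number;
}

interface TypicalProfile {
  stationId: string;
  hours: HourBucket[];
  currentHour: number;
  currentDow: number;
  daysCovered: number;
  isDowFiltered: boolean;
  label: string;
  timezone: string;
}

const DOW_MIN_DAYS = 21;

// Per-day-of-week averages are only shown once enough history exists.
function shouldFilterByDow(daysCovered: number): boolean {
  return daysCovered >= DOW_MIN_DAYS;
}

// Zero-filled 24-hour profile returned when no typicals file exists yet.
function emptyProfile(
  stationId: string,
  timezone: string,
  now: { hour: number; dow: number },
): TypicalProfile {
  const hours: HourBucket[] = [];
  for (let hour = 0; hour < 24; hour++) {
    hours.push({ hour, avgBikes: 0 });
  }
  return {
    stationId,
    hours,
    currentHour: now.hour,
    currentDow: now.dow,
    daysCovered: 0,
    isDowFiltered: shouldFilterByDow(0),
    label: "No history yet", // placeholder text, not the project's wording
    timezone,
  };
}
```

Returning a complete shape keeps the frontend free of null checks: the chart renders 24 empty bars until real data arrives.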

Three Workers

Each Worker has a dedicated wrangler*.toml config, its own name, and its own trigger type. They share the same KV namespace binding and R2 bucket binding so they can read and write each other’s data.

Worker	Config file	Trigger	Job
`bcycle-map-poller`	`wrangler.toml`	Cron every 5 minutes (`*/5 * * * *`)	Fetch GBFS feeds → `normalize()` → write KV latest + KV buffer
`bcycle-map-read-api`	`wrangler.read-api.toml`	HTTP (fetch handler)	Serve KV snapshots, R2 activity logs, parquet partition lists, trip inference, analytics
`bcycle-map-smoke`	`wrangler.smoke.toml`	Daily cron at 09:00 UTC (`0 9 * * *`)	Fetch the real GBFS feed, run `normalize()`, file a GitHub Issue if the shape check fails

The smoke Worker is a canary for upstream schema changes. If BCycle silently changes their GBFS payload shape, normalize() will throw, the smoke Worker catches it, and a labeled GitHub Issue is filed automatically — before the prod poller silently drops cycles.
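The canary logic can be sketched as two small functions: one that runs a normalizer and captures any throw, and one that turns a failure into an issue payload. The `title`, `body`, and `labels` fields match the request body of GitHub's "create an issue" REST endpoint; the label name, the wording, and the function names are hypothetical.

```typescript
interface IssuePayload {
  title: string;
  body: string;
  labels: string[];
}

type ShapeResult = { ok: true } | { ok: false; reason: string };

// Run a normalizer against a raw feed and report whether it held.
function checkShape(raw: unknown, normalize: (raw: unknown) => unknown): ShapeResult {
  try {
    normalize(raw);
    return { ok: true };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}

// Build the JSON body for the issue that reports the failure.
function buildIssue(systemId: string, reason: string, checkedAt: Date): IssuePayload {
  return {
    title: `GBFS shape check failed for ${systemId}`,
    body: `normalize() threw at ${checkedAt.toISOString()}:\n\n${reason}`,
    labels: ["smoke-failure"], // hypothetical label name
  };
}
```

Keeping the check separate from the issue-filing call means the same `checkShape` can run in unit tests against captured fixtures.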

Storage

Cloudflare KV

KV is the hot-path store. Two key patterns are maintained per system:

Key pattern	Contents	Written by	Read by
`system:<id>:latest`	Full `KVValue` JSON: `system`, `snapshot_ts`, all station snapshots, `max_bikes_ever`, `recent24h` sparkline data, `last_total_changed_ts`	Poller (every 5 min)	Read-api Worker (live map)
`system:<id>:buffer:<YYYY-MM-DD-HH>`	Array of `BufferEntry` objects — lightweight per-tick records (station IDs + availability counts only) accumulated throughout the hour	Poller (every 5 min, append)	GitHub Actions compaction (read → seal → delete)

The buffer key is deleted after the compaction job seals it into R2 parquet. If compaction is missed — for example due to a failed GitHub Actions run — the next compaction job finds the orphaned buffer key and self-heals by sealing it retroactively.
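A sketch of how the hourly buffer key and the self-healing check could work. It assumes the `YYYY-MM-DD-HH` bucket is expressed in UTC; the function names are illustrative rather than taken from the codebase.

```typescript
function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Hour bucket formatted as YYYY-MM-DD-HH (UTC is an assumption).
function hourBucket(ts: Date): string {
  const date = `${ts.getUTCFullYear()}-${pad2(ts.getUTCMonth() + 1)}-${pad2(ts.getUTCDate())}`;
  return `${date}-${pad2(ts.getUTCHours())}`;
}

// Key the poller appends to during a given hour.
function bufferKey(systemId: string, ts: Date): string {
  return `system:${systemId}:buffer:${hourBucket(ts)}`;
}

// A buffer can be sealed once its hour is over. Because every field in the
// bucket is zero-padded and fixed-width, plain string comparison orders
// buckets chronologically, so orphaned keys from missed runs also qualify.
function isSealable(key: string, now: Date): boolean {
  const bucket = key.slice(key.lastIndexOf(":") + 1);
  return bucket < hourBucket(now);
}
```

Because the check is "strictly older than the current hour" rather than "exactly the previous hour", a buffer left behind by a failed run is picked up by whichever compaction run comes next.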

The Workers free tier allows 1 000 KV puts per day. A single active system at the 5-minute poll interval consumes 576 puts/day (288 ticks × 2 puts: one for :latest, one for :buffer), which leaves 424 puts/day of headroom for manual KV operations. Two active systems would need 1 152 puts/day and exceed the cap, so use the enabled: false flag in systems.json to pause a system without losing its history.

Cloudflare R2

R2 is the cold-path store. Objects are organized into two categories.

Parquet partitions (written by GitHub Actions compaction):

gbfs/<system_id>/station_status/dt=YYYY-MM-DD/<HH>.parquet

For example, gbfs/bcycle_santabarbara/station_status/dt=2026-05-13/14.parquet holds all snapshots from the 14:00 UTC hour on 2026-05-13. Each parquet file contains flattened station rows with these columns: snapshot_ts, station_id, num_bikes_available, num_docks_available, bikes_electric, bikes_classic, bikes_smart, is_installed, is_renting, is_returning, last_reported.

Operational objects (written by the poller and compute scripts):

R2 key	Contents	Written by
`gbfs/<id>/activity.json`	`ActivityLog` — departure/arrival events and inferred trips, capped to the 50 most recent entries	Poller
`gbfs/<id>/travel-times.json`	Station-to-station travel-time matrix used for greedy trip inference	`compute-routes` npm script (run via GitHub Action)
`gbfs/<id>/typicals/<station_id>.json`	Pre-computed typical availability profiles (by hour, optionally by day-of-week)	`compute-popularity` npm script (run via GitHub Action)
`gbfs/systems-index.json`	System metadata list served by `GET /api/systems`	`corridors` GitHub Action
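The 50-entry cap on the activity log can be illustrated with a small helper. This is a sketch that assumes entries are stored oldest-first; the function name and the generic entry type are not from the project.

```typescript
const ACTIVITY_CAP = 50;

// Append new events and keep only the most recent `cap` entries.
// Assumes the log is ordered oldest-first, so trimming drops from the front.
function appendCapped<T>(log: readonly T[], incoming: readonly T[], cap: number = ACTIVITY_CAP): T[] {
  const merged = [...log, ...incoming];
  return merged.slice(Math.max(0, merged.length - cap));
}
```

Capping on every write keeps the object small enough to rewrite in full on each poller tick.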

Compaction

The intra-hour KV buffer is compact by design: each BufferEntry stores only the dynamic fields (availability counts and flags), not the full station metadata. On each run (every 3 hours), the GitHub Actions compaction workflow:

1. Lists all system:<id>:buffer:<YYYY-MM-DD-HH> keys older than the current hour
2. Reads each buffer from KV via the Cloudflare KV REST API
3. Joins the dynamic buffer entries against the latest station metadata to reconstruct full rows
4. Encodes rows as columnar parquet using parquet-wasm + apache-arrow in Node
5. Writes the sealed parquet file to R2 at the hive-partitioned path
6. Deletes the KV buffer key
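Two of the steps above, deriving the R2 path from a buffer key and flattening buffered ticks into rows, can be sketched as follows. The `BufferEntry` layout shown here (one record per tick holding a list of stations) is an assumption, the field list is shortened, and the metadata join and parquet encoding are left out.

```typescript
// Assumed per-station record inside one buffered tick (fields abbreviated).
interface TickStation {
  station_id: string;
  num_bikes_available: number;
  num_docks_available: number;
}

// Assumed layout of one buffered tick.
interface BufferEntry {
  snapshot_ts: number;
  stations: TickStation[];
}

interface Row extends TickStation {
  snapshot_ts: number;
}

// system:<id>:buffer:YYYY-MM-DD-HH → gbfs/<id>/station_status/dt=YYYY-MM-DD/HH.parquet
function parquetPathFor(bufferKey: string): string {
  const match = /^system:(.+):buffer:(\d{4}-\d{2}-\d{2})-(\d{2})$/.exec(bufferKey);
  if (match === null) {
    throw new Error(`not a buffer key: ${bufferKey}`);
  }
  const [, systemId, dt, hh] = match;
  return `gbfs/${systemId}/station_status/dt=${dt}/${hh}.parquet`;
}

// One output row per (tick, station) pair, stamped with the tick's timestamp.
function flatten(entries: readonly BufferEntry[]): Row[] {
  const rows: Row[] = [];
  for (const entry of entries) {
    for (const station of entry.stations) {
      rows.push({ snapshot_ts: entry.snapshot_ts, ...station });
    }
  }
  return rows;
}
```

Deriving the path purely from the key means a retroactively sealed buffer lands in the partition for its own hour, not the hour in which compaction happened to run.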

Compaction intentionally runs in GitHub Actions, not in a Worker. The parquet-wasm + apache-arrow dependency bundle exceeds the Cloudflare Workers 1 MiB script size limit. Running compaction in GitHub Actions sidesteps this constraint without requiring a Workers Paid plan, and GitHub Actions free-tier minutes are more than sufficient for the 3-hour cadence (5 */3 * * * cron).

The `normalize()` Anti-Corruption Layer

All GBFS version-specific parsing lives inside normalize() in src/shared/normalize.ts. The three entry points — normalizeStationInformation(), normalizeStationStatus(), and normalizeSystemInformation() — accept raw GBFS JSON and return the project’s internal typed shapes (StationStatic[], StationDynamic[], SystemInfo). Everything downstream — the KV writer, the parquet encoder, the frontend, the tests — works exclusively on these internal types. This means:

Adding GBFS v2.x support later means adding normalizers inside normalize.ts. Nothing else changes.
The smoke Worker catches upstream schema changes by running normalize() against the live feed daily.
Test fixtures are captured real GBFS responses. The unit tests exercise normalize() directly against those fixtures, giving high confidence that the anti-corruption layer holds.

// src/shared/types.ts (simplified)
export type StationDynamic = {
  station_id: string
  num_bikes_available: number
  num_docks_available: number
  bikes_electric: number
  bikes_classic: number
  bikes_smart: number
  is_installed: boolean
  is_renting: boolean
  is_returning: boolean
  last_reported: number
}
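To make the anti-corruption idea concrete, here is a hedged sketch of what a station-status normalizer could look like. It is not the contents of src/shared/normalize.ts: the function name, the error messages, the acceptance of both 0/1 and boolean flags, and the use of a `num_bikes_available_types` object as the source of the per-type counts are all assumptions.

```typescript
type StationDynamic = {
  station_id: string;
  num_bikes_available: number;
  num_docks_available: number;
  bikes_electric: number;
  bikes_classic: number;
  bikes_smart: number;
  is_installed: boolean;
  is_renting: boolean;
  is_returning: boolean;
  last_reported: number;
};

type RawRecord = Record<string, unknown>;

// Throw with a descriptive message when a required numeric field is missing.
function requireNumber(rec: RawRecord, field: string): number {
  const value = rec[field];
  if (typeof value !== "number") {
    throw new Error(`station_status: ${field} is not a number`);
  }
  return value;
}

// Accept flags published either as 0/1 integers or as booleans.
function requireFlag(rec: RawRecord, field: string): boolean {
  const value = rec[field];
  if (typeof value === "boolean") return value;
  if (value === 0 || value === 1) return value === 1;
  throw new Error(`station_status: ${field} is not a flag`);
}

function normalizeStationStatusSketch(raw: unknown): StationDynamic[] {
  const stations = (raw as { data?: { stations?: unknown } } | null)?.data?.stations;
  if (!Array.isArray(stations)) {
    throw new Error("station_status: data.stations is not an array");
  }
  return stations.map((item: unknown): StationDynamic => {
    if (typeof item !== "object" || item === null) {
      throw new Error("station_status: station entry is not an object");
    }
    const rec = item as RawRecord;
    const id = rec["station_id"];
    if (typeof id !== "string") {
      throw new Error("station_status: station_id is not a string");
    }
    // Assumed source of the per-type counts; absent types default to 0.
    const types = (rec["num_bikes_available_types"] ?? {}) as RawRecord;
    const count = (kind: string): number => {
      const value = types[kind];
      return typeof value === "number" ? value : 0;
    };
    return {
      station_id: id,
      num_bikes_available: requireNumber(rec, "num_bikes_available"),
      num_docks_available: requireNumber(rec, "num_docks_available"),
      bikes_electric: count("electric"),
      bikes_classic: count("classic"),
      bikes_smart: count("smart"),
      is_installed: requireFlag(rec, "is_installed"),
      is_renting: requireFlag(rec, "is_renting"),
      is_returning: requireFlag(rec, "is_returning"),
      last_reported: requireNumber(rec, "last_reported"),
    };
  });
}
```

Throwing on any unexpected shape is the property the smoke Worker relies on: a silent default would hide an upstream schema change instead of surfacing it.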

Poll Cadence and Free-Tier Budget

The poller cron expression is */5 * * * * — every 5 minutes. This cadence is intentional:

5 min interval
  × 288 ticks per day
  × 2 mandatory KV puts per tick (system:<id>:latest + system:<id>:buffer:<YYYY-MM-DD-HH>)
= 576 KV puts/day

Workers free tier: 1 000 KV puts/day
Remaining headroom: ~424 puts/day

The 424-put headroom absorbs manual KV writes and the occasional manually triggered workflow without threatening the daily cap for a single active system. Activity log writes go to R2, so they do not count against the KV quota.

The earlier design doc specified a 2-minute poll interval (720 ticks/day), which at 2 KV puts per tick would need 1 440 puts/day and exceed the free-tier cap on its own. The cron was widened to 5 minutes before launch to fit the free tier with headroom while keeping the historical resolution practical for the trip-inference and analytics use cases BCycle Map actually serves.
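The budget arithmetic in this section can be checked with a few lines of code. The constants are the figures quoted in this document; the function names are illustrative.

```typescript
const FREE_TIER_KV_PUTS_PER_DAY = 1000; // cap quoted in this document
const PUTS_PER_TICK = 2; // one put for :latest, one for :buffer

function ticksPerDay(intervalMinutes: number): number {
  return Math.floor((24 * 60) / intervalMinutes);
}

function kvPutsPerDay(intervalMinutes: number, activeSystems: number): number {
  return ticksPerDay(intervalMinutes) * PUTS_PER_TICK * activeSystems;
}

// Negative headroom means the configuration exceeds the daily cap.
function headroom(intervalMinutes: number, activeSystems: number): number {
  return FREE_TIER_KV_PUTS_PER_DAY - kvPutsPerDay(intervalMinutes, activeSystems);
}

const current = headroom(5, 1); // one system, 5-minute cron
const twoSystems = headroom(5, 2); // two systems, 5-minute cron
const earlierDesign = headroom(2, 1); // one system, 2-minute cron
```

Only the first configuration stays under the cap: `current` is positive, while `twoSystems` and `earlierDesign` are both negative.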

Related Documentation

API Reference

Every endpoint the read-api Worker exposes: /current, /activity, /trips, /snapshots, /partitions, and more.

Managing Systems

How to add, enable, or disable a GBFS system in systems.json and what downstream jobs need to re-run.

Compaction Pipeline

Deep-dive into the GitHub Actions workflow that seals KV buffers into hive-partitioned R2 parquet.

Quickstart

Clone, test, and deploy BCycle Map from scratch in a single guided walkthrough.
