Understanding Pingora’s internals helps you reason about performance, tune thread counts, and integrate background services correctly. The framework is built around a clear ownership model: a single Server coordinates multiple independent Service instances, each of which owns its own Tokio runtime and threadpool. This design avoids cross-service thread contention and enables per-service tuning without any shared executor overhead.
Some advanced topics, particularly the proxy phase chart and cache integration callbacks, are still a work in progress in the official Pingora documentation. The information on this page is sourced from the upstream docs/user_guide/internals.md and from reading the source code directly.
Starting the Server
A Pingora application starts by creating and running a Server. The Server is responsible for spawning all registered Service instances and listening for termination signals (SIGTERM, SIGQUIT on Unix).
Once running, the Server blocks waiting for a shutdown signal and propagates it to all running services when one arrives. This enables graceful draining without dropped connections.
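The start-then-propagate-shutdown flow described above can be illustrated with a small standard-library model. This is only a sketch: `run_service` and `run_server` are hypothetical names, the services are plain threads rather than Tokio runtimes, and the `trigger` closure stands in for waiting on SIGTERM or SIGQUIT. It is not Pingora's API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a service: runs until the shared shutdown
/// flag is set, then returns its name.
fn run_service(name: &'static str, shutdown: Arc<AtomicBool>) -> thread::JoinHandle<&'static str> {
    thread::spawn(move || {
        while !shutdown.load(Ordering::Acquire) {
            // A real service would accept and serve connections here.
            thread::sleep(Duration::from_millis(1));
        }
        // A graceful drain of in-flight work would happen here.
        name
    })
}

/// Hypothetical stand-in for the server: starts every service, waits for
/// a shutdown trigger, propagates it, and joins all services in order.
fn run_server(names: &[&'static str], trigger: impl FnOnce()) -> Vec<&'static str> {
    let shutdown = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = names
        .iter()
        .map(|&n| run_service(n, Arc::clone(&shutdown)))
        .collect();

    // Stands in for blocking until a termination signal arrives.
    trigger();
    // Propagate the shutdown to every running service.
    shutdown.store(true, Ordering::Release);

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_server(&["proxy", "admin"], || {})` starts two model services, signals shutdown immediately, and returns once both have exited.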
The Service Model
Each Service instance encapsulates:
- A set of listening endpoints (Listeners): TCP sockets, Unix domain sockets, etc., each optionally with TLS.
- An application (A: ServerApp): the logic that handles each accepted connection.
- Its own Tokio runtime with a dedicated threadpool.
Threading Model
The threads configuration option
Each service has an independent thread count. The global default is set in ServerConf.
A service configured with threads: N gets a Tokio runtime with N worker threads. If N is 1, the runtime is effectively a single-threaded executor.
Work-stealing vs. isolated runtimes
The work_stealing configuration option (default: true) controls how the multi-threaded executor is structured:
- With work_stealing: true, the service uses a single multi-threaded Tokio runtime with N worker threads and work-stealing. Tasks are automatically rebalanced across idle threads. This is the standard Tokio rt-multi-thread runtime and is the most efficient choice for most workloads.
- With work_stealing: false, the service uses N independent single-threaded Tokio runtimes, one per thread. Tasks are pinned to their originating thread and never migrate. This eliminates work-stealing overhead and improves CPU cache locality, at the cost of potential load imbalance if some threads become hot while others idle.
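The load-imbalance trade-off can be made concrete with a toy calculation. The sketch below is a deliberately simplified model, not Tokio's scheduler: `pinned_makespan` assigns tasks round-robin to threads and never moves them, while `balanced_makespan` hands each task to whichever thread frees up first, which only approximates the effect of work stealing. Both function names are hypothetical, and `workers` is assumed to be at least 1.

```rust
/// Total finish time when tasks are pinned round-robin to `workers`
/// threads and never migrate (rough model of `work_stealing: false`).
fn pinned_makespan(costs: &[u64], workers: usize) -> u64 {
    let mut busy_until = vec![0u64; workers];
    for (i, cost) in costs.iter().enumerate() {
        busy_until[i % workers] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}

/// Total finish time when each task goes to the thread that frees up
/// first (rough model of the rebalancing that work stealing provides).
fn balanced_makespan(costs: &[u64], workers: usize) -> u64 {
    let mut busy_until = vec![0u64; workers];
    for cost in costs {
        // Pick the least-loaded worker; ties go to the lowest index.
        let idx = (0..workers).min_by_key(|&w| busy_until[w]).unwrap();
        busy_until[idx] += cost;
    }
    busy_until.into_iter().max().unwrap_or(0)
}
```

With task costs `[8, 1, 1, 1, 8, 1, 1, 1]` on two threads, the pinned model finishes at 18 time units because both expensive tasks land on the same thread, while the balanced model finishes at 11. The model ignores the scheduling overhead and cache effects that favour isolated runtimes.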
Service Listeners and TransportStack
At startup, each service’s endpoints are built into TransportStack objects. Each TransportStack bundles a listening socket, an optional TLS acceptor, and upgrade file descriptors (for zero-downtime upgrades). One async task is spawned per TransportStack within the service’s executor.
Downstream Connection Lifecycle
Each accepted TCP connection is processed in its own Tokio task. The lifecycle follows these steps:
1. UninitStream::handshake(): TLS handshake (if applicable) and protocol detection.
2. Service::handle_event(): Route the connection to the appropriate application handler.
3. App::process_new(): Handle the first request/event on the connection.
4. The task loops at step 3 while the connection is being reused (HTTP keep-alive, HTTP/2 multiplexing).
5. When the connection is closed, the task ends.
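The reuse loop in the steps above can be sketched with a synchronous model. Everything here is a hypothetical simplification: `Stream`, `App`, `EchoApp`, `stream_of`, and `serve_connection` are illustrative stand-ins, and the real handler is asynchronous with a different signature. The one idea carried over is that the handler hands the stream back when the connection can be reused.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an accepted connection: a queue of requests.
struct Stream {
    pending: VecDeque<&'static str>,
}

/// Helper that builds a model connection from a list of requests.
fn stream_of(reqs: &[&'static str]) -> Stream {
    Stream { pending: reqs.iter().copied().collect() }
}

/// Simplified, synchronous model of an application handler.
/// Returning `Some(stream)` means the connection can be reused.
trait App {
    fn process_new(&mut self, stream: Stream) -> Option<Stream>;
}

struct EchoApp {
    handled: Vec<String>,
}

impl App for EchoApp {
    fn process_new(&mut self, mut stream: Stream) -> Option<Stream> {
        // Handle one request; an empty queue means the peer has closed.
        let req = stream.pending.pop_front()?;
        self.handled.push(req.to_uppercase());
        // Hand the stream back only if more requests are queued.
        if stream.pending.is_empty() { None } else { Some(stream) }
    }
}

/// One task per connection: loop while the handler returns the stream.
fn serve_connection(app: &mut impl App, raw: Stream) {
    // Steps 1 and 2 (handshake, routing) would happen before this loop.
    let mut stream = Some(raw);
    while let Some(s) = stream {
        // Steps 3 and 4: process a request, repeat while reusable.
        stream = app.process_new(s);
    }
    // Step 5: the connection is closed and the task ends.
}
```

Serving `stream_of(&["get /a", "get /b"])` with an `EchoApp` calls `process_new` twice on the same connection before the task ends.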
What is a Proxy?
The Server has no built-in notion of a proxy. It operates purely in terms of Service<A> where A implements ServerApp. The pingora-proxy crate layers HTTP proxy semantics on top by providing HttpProxy<CTX>, which implements HttpServerApp, which in turn implements ServerApp.
The HttpProxy struct drives the high-level proxy workflow and exposes customisation points via the ProxyHttp trait. Implementing ProxyHttp lets you hook into each phase of request processing: request_filter, upstream_peer, upstream_request_filter, upstream_response_filter, response_filter, logging, and more.
Managing Upstream Connections: Connectors
Connections to upstream peers are managed by Connectors. A Connector is not a single type but a pattern: it handles establishing a connection to a Peer, maintaining a connection pool for reuse across requests, measuring connection health (H2 pings), and handling protocol-specific concerns like H2 multiplexing and compression.
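The pooling part of this pattern can be sketched as an idle-connection map keyed by peer address. This is a minimal standard-library illustration under stated assumptions: `Conn` and `Pool` are hypothetical types, "dialing" just allocates an id, and health checks, timeouts, and H2 multiplexing are left out. It does not reflect Pingora's actual connector types.

```rust
use std::collections::HashMap;

/// Hypothetical connection handle; `id` makes reuse observable.
#[derive(Debug, PartialEq)]
struct Conn {
    peer: String,
    id: u32,
}

/// Minimal idle-connection pool keyed by peer address.
#[derive(Default)]
struct Pool {
    idle: HashMap<String, Vec<Conn>>,
    next_id: u32,
}

impl Pool {
    /// Reuse an idle connection to `peer` if one exists, otherwise
    /// "dial" a new one. The boolean reports whether it was reused.
    fn get(&mut self, peer: &str) -> (Conn, bool) {
        if let Some(conn) = self.idle.get_mut(peer).and_then(|v| v.pop()) {
            return (conn, true);
        }
        self.next_id += 1;
        (Conn { peer: peer.to_string(), id: self.next_id }, false)
    }

    /// Return a connection to the pool once the request has finished.
    fn release(&mut self, conn: Conn) {
        self.idle.entry(conn.peer.clone()).or_default().push(conn);
    }
}
```

A second `get` for the same peer after a `release` returns the same connection id, while a `get` for a different peer dials a fresh connection.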
Peer selection — choosing which upstream to connect to — is handled one level above, in the upstream_peer() method of the ProxyHttp trait. The LoadBalancer from pingora-load-balancing is typically used there.
Background Services
BackgroundService is an interface for long-running tasks that exist outside the request/response lifecycle, such as service discovery, health checks, and metrics export. A background service is wrapped with background_service() and added to the Server just like any other service.
The BackgroundService trait provides two entry points: start and start_with_ready_notifier.
The start_with_ready_notifier variant is useful when downstream services must wait for the background task to finish its first discovery or health-check cycle before they start accepting traffic.
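The ready-notification idea can be illustrated with a channel: the background task completes its first cycle, signals readiness, and only then does the dependent code proceed. The sketch below is a standard-library model with hypothetical names (`spawn_discovery`, `wait_until_ready`) and made-up backend addresses. It uses threads and `mpsc` rather than Pingora's own notifier type, whose exact shape is not shown here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical background task: performs one discovery cycle, signals
/// readiness, and would then keep refreshing until shutdown.
fn spawn_discovery(ready: mpsc::Sender<Vec<&'static str>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // First cycle: pretend to discover upstream backends.
        let backends = vec!["10.0.0.1:80", "10.0.0.2:80"];
        // Notify dependents that the initial state is available.
        let _ = ready.send(backends);
        // Later refresh cycles would run here until shutdown.
    })
}

/// A dependent service blocks until the first cycle has completed.
fn wait_until_ready() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    let task = spawn_discovery(tx);
    let backends = rx
        .recv()
        .expect("background task exited before becoming ready");
    task.join().unwrap();
    backends
}
```

Without the readiness signal, a dependent service could start accepting traffic before any backend is known.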
Per-Service Thread Count
Each listening service has a threads field that overrides the global default for that service alone.
Runtime options can also be overridden per service through the set_runtime_opts_override method, which accepts a RuntimeOptsOverride callback (Arc<dyn Fn(&RuntimeOpts) -> Option<RuntimeOpts> + Send + Sync>). Returning None from the callback applies the global settings unchanged.
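The callback semantics can be sketched in isolation. In the model below, the `RuntimeOptsOverride` alias copies the callback shape quoted above, but the fields of `RuntimeOpts` (`threads`, `work_stealing`) and the helpers `resolve`, `double_threads`, and `keep_global` are assumptions made for illustration and may not match the real definitions.

```rust
use std::sync::Arc;

/// Hypothetical shape of the runtime options; field names are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct RuntimeOpts {
    threads: usize,
    work_stealing: bool,
}

/// Same callback shape as the one described in the text.
type RuntimeOptsOverride = Arc<dyn Fn(&RuntimeOpts) -> Option<RuntimeOpts> + Send + Sync>;

/// Use the override's result if it returns `Some`; otherwise fall back
/// to the global options unchanged.
fn resolve(global: &RuntimeOpts, over: Option<&RuntimeOptsOverride>) -> RuntimeOpts {
    over.and_then(|f| f(global)).unwrap_or_else(|| global.clone())
}

/// Example override: double the thread count and keep everything else.
fn double_threads() -> RuntimeOptsOverride {
    Arc::new(|g: &RuntimeOpts| Some(RuntimeOpts { threads: g.threads * 2, ..g.clone() }))
}

/// Example override that defers to the global settings.
fn keep_global() -> RuntimeOptsOverride {
    Arc::new(|_: &RuntimeOpts| None)
}
```

Because the callback receives the global options, an override can be expressed relative to them (for example "twice the global thread count") rather than as a fixed value.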
Zero-Downtime Upgrades
Pingora supports graceful upgrades on Unix systems using file-descriptor passing. When a new process is started alongside a running one, the old process passes its listening socket FDs to the new process. The new process begins accepting new connections while the old process continues to drain existing ones. The UpgradeFDs mechanism in TransportStack handles this transfer transparently.
Configuration for grace period behaviour is exposed through ServerConf.
