TiDB is designed so that no single node failure interrupts service. Data is replicated across multiple nodes using the Raft consensus protocol, leaders are elected automatically when a node goes down, and the TiDB compute layer is stateless so you can add or remove servers without downtime.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/pingcap/tidb/llms.txt
Use this file to discover all available pages before exploring further.
Raft consensus protocol
TiDB stores data in TiKV, which uses Raft to maintain consistency across replicas. Raft is a consensus algorithm that guarantees:- A write is committed only after a majority of replicas acknowledge it.
- All replicas apply writes in the same order.
- If the current leader fails, the remaining replicas elect a new leader automatically without external coordination.
Region-based data distribution
TiKV splits data into contiguous key ranges called Regions. Each Region is approximately 96 MB by default. Regions are the unit of replication and scheduling.- Every Region has its own Raft group with its own leader and followers.
- Regions split automatically as data grows and merge when data shrinks.
- PD (Placement Driver) balances Regions across TiKV nodes to distribute load evenly.
The default replication factor is 3 replicas per Region. You can increase this to 5 for higher fault tolerance, at the cost of additional storage and write amplification.
Automatic leader election and failover
When a TiKV leader node becomes unavailable, Raft triggers an election among the followers for that Region. The election completes in seconds (typically under 10 seconds by default, configurable withraft-election-timeout).
During the election window, writes to the affected Regions are briefly unavailable. Once the new leader is elected, writes resume automatically. Reads can often continue from followers using Follower Read or Stale Read during this period.
PD: Placement Driver
PD is the cluster’s control plane. It:- Allocates timestamps — PD is the centralized timestamp oracle. Every transaction gets a globally monotonic start timestamp from PD, enabling consistent snapshot reads across the entire cluster.
- Schedules Regions — PD balances Regions across TiKV nodes, moves leaders away from hot nodes, and triggers splits and merges.
- Stores cluster metadata — PD holds the mapping from key ranges to TiKV nodes, which TiDB servers query to route requests.
Stateless TiDB servers
The TiDB server layer (which handles SQL parsing, planning, and execution) is stateless. It holds no data and no session-critical persistent state. This means you can:- Add TiDB server nodes at any time without downtime.
- Remove TiDB server nodes at any time without migrating data.
- Place a load balancer in front of multiple TiDB servers for horizontal scale-out and request-level failover.
Geographic placement and disaster recovery
For disaster recovery across data centers or regions, TiDB supports Placement Policies that control where replicas are located.Multi-zone deployment
Place one replica per availability zone. A zone failure leaves two replicas available, allowing the Raft majority to continue committing writes.Multi-region deployment
For disaster tolerance across geographic regions, you can place replicas in separate regions. Because the Raft leader must wait for a majority of replicas to acknowledge each write, cross-region deployments increase write latency proportional to the round-trip time between regions.Five-replica deployment for higher fault tolerance
Increasing the replica count to 5 allows the cluster to survive two simultaneous node failures:Recovery time objectives
| Failure scenario | Typical recovery time |
|---|---|
| Single TiDB server failure | Immediate (load balancer reroutes) |
| Single TiKV node failure | Seconds (Raft leader election per Region) |
| Single TiKV disk failure | Seconds (Raft election) + minutes (replica rebalancing) |
| Complete availability zone failure | Seconds (Raft election on surviving nodes) |
| PD node failure (3-node PD cluster) | Seconds (etcd leader election) |
Deployment patterns
Single data center (3 nodes per component)
Single data center (3 nodes per component)
The minimum production configuration. Run 3 TiKV nodes, 3 PD nodes, and at least 2 TiDB servers behind a load balancer. This tolerates one TiKV node failure and one PD node failure simultaneously.
Multi-zone (single region)
Multi-zone (single region)
Spread TiKV and PD nodes across three availability zones. A zone failure does not interrupt writes because two zones still form a Raft majority.
Multi-region (geographic disaster recovery)
Multi-region (geographic disaster recovery)
Place two replicas in the primary region and one in a secondary region for disaster recovery with acceptable write latency. Use Placement Policies to pin the Raft leader to the primary region.For zero-RPO across regions, use a 3-region active-active setup with one replica per region.