TiDB is a distributed database built from four components. The compute layer (TiDB Server) is stateless and can scale independently from the storage layer (TiKV and TiFlash). PD (Placement Driver) acts as the cluster brain, managing metadata and scheduling data movement.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/pingcap/tidb/llms.txt
Use this file to discover all available pages before exploring further.
Components
TiDB Server
The stateless SQL layer. Parses queries, builds and optimizes execution plans, and coordinates execution across TiKV and TiFlash. Multiple TiDB Server nodes can run behind a load balancer; they share no state with each other.
TiKV
The distributed row-based storage engine. Data is divided into Regions (default 96 MB each). Each Region is replicated across three TiKV nodes using the Raft consensus protocol. TiKV handles all transactional (OLTP) reads and writes.
TiFlash
The columnar storage engine for analytical queries. TiFlash nodes receive real-time data replication from TiKV using the Multi-Raft Learner protocol. TiDB Server chooses TiFlash automatically for queries that benefit from columnar access, using MPP (Massively Parallel Processing) execution.
PD (Placement Driver)
The cluster management component. PD stores cluster metadata in etcd, allocates globally unique timestamps for transactions, monitors Region health, and schedules Region movement between TiKV nodes to balance load and recover from failures.
How SQL queries flow through the system
Client sends SQL
A client connects to any TiDB Server node on port 4000 using the MySQL protocol and sends a SQL statement.
TiDB Server parses and plans
TiDB Server parses the SQL into an AST, runs the optimizer to build an execution plan, and determines which storage engines to read from (TiKV, TiFlash, or both).
Execution is dispatched to storage
For point queries and small range scans, TiDB sends coprocessor requests to TiKV nodes that own the relevant Regions. For large analytical queries, TiDB dispatches MPP tasks to TiFlash nodes and collects the results.
Distributed transactions
TiDB implements distributed transactions using a two-phase commit (2PC) protocol. When a transaction writes data across multiple Regions:- Prewrite phase — TiDB writes locks and data to all affected Regions.
- Commit phase — After a majority of replicas acknowledge the prewrite, TiDB commits the transaction and removes the locks.
Transactions in TiDB are ACID-compliant. Strong consistency is guaranteed even when some replicas fail, because writes are only committed after the Raft majority acknowledges them.
Raft consensus and data replication
TiKV divides all data into contiguous key ranges called Regions. Each Region is approximately 96 MB and has three replicas by default, each on a different TiKV node. The replicas for a Region form a Raft group.- One replica is the leader and handles all reads and writes for that Region.
- The other two replicas are followers and apply log entries replicated from the leader.
- If the leader fails, Raft elects a new leader from the followers automatically.
Region-based data sharding
TiDB does not use hash-based or range-based sharding that you configure manually. Instead, TiKV automatically splits Regions when they grow too large and merges them when they shrink. PD monitors Region sizes and schedules splits, merges, and migrations to keep data evenly distributed across TiKV nodes. From the SQL layer, this is transparent: you interact with a single logical table and TiDB handles all Region routing internally.Separation of compute and storage
Because TiDB Server nodes are stateless, you can:- Scale compute independently by adding or removing TiDB Server nodes with no data movement.
- Scale storage independently by adding TiKV or TiFlash nodes; PD automatically rebalances Regions onto the new nodes.
TiFlash and HTAP
TiFlash uses the Multi-Raft Learner protocol to replicate data from TiKV in real time. A Raft Learner receives log entries from TiKV leaders but does not participate in leader elections, so it adds no overhead to the OLTP write path. TiDB Server tracks which tables have TiFlash replicas. For a query involving a table with a TiFlash replica, the optimizer can choose to:- Push the full query to TiFlash (columnar scan, aggregation via MPP).
- Split execution: scan from TiFlash for the analytical part, fetch small lookup results from TiKV.
Next steps
Quick start
Deploy a local cluster and connect with a MySQL client.
Introduction
Read the full feature overview and supported use cases.