Traditionally, organizations maintain separate systems for transactional workloads (OLTP) and analytical workloads (OLAP), with ETL pipelines to move data between them. This introduces data staleness, operational complexity, and additional cost. TiDB's HTAP architecture eliminates that separation. The same cluster stores your data in two formats (row-oriented in TiKV for transactions, columnar in TiFlash for analytics) and keeps both in sync automatically.
Storage engines
TiKV — row storage
The primary storage engine. Data is organized in rows, making it efficient for point lookups and transactional writes. TiKV uses the Raft consensus protocol to maintain multiple replicas and guarantee strong consistency.
TiFlash — columnar storage
The analytical storage engine. Data is organized in columns, making it efficient for aggregations, range scans, and analytical queries that touch many rows but few columns.
Data replication from TiKV to TiFlash
TiFlash receives data through the Multi-Raft Learner protocol. A TiFlash replica is a Raft Learner: it receives all committed Raft log entries from the TiKV leader, applies them in order, and maintains a columnar copy of the data. Key properties of this replication:
- Consistency: TiFlash only exposes data that has been committed in the Raft log. A query against TiFlash always reads committed data; it never sees dirty writes or partially committed transactions.
- Asynchronous replication: TiFlash applies the Raft log asynchronously after TiKV commits. There is a small propagation lag (typically milliseconds), but TiDB accounts for it: a read on TiFlash waits until the replica has applied the log up to the point the query's snapshot requires, so the query still sees a consistent snapshot.
- No writes to TiFlash directly: All writes go to TiKV first and propagate to TiFlash via Raft. TiFlash is never the write path.
How the optimizer chooses a storage engine
When you run a query, the TiDB cost-based optimizer automatically decides whether to read from TiKV or TiFlash based on the query shape and available statistics:
- Point lookups and small range scans are routed to TiKV, where row storage excels.
- Full table scans, aggregations, and joins over large datasets are routed to TiFlash, where columnar storage is faster.
- Mixed queries may read from both engines for different parts of the execution plan.
To see which engine the optimizer chose for a query, run it with EXPLAIN and look for TableFullScan with store_type: tiflash to confirm TiFlash is being used.
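As a rough illustration of the routing rules above, the two statements below contrast a point lookup with a large aggregation. The orders table and its columns are hypothetical, and the plan you actually get depends on statistics, replica availability, and TiDB version, so treat the comments as expectations rather than guaranteed output.

```sql
-- Hypothetical table: orders(id PRIMARY KEY, customer_id, amount, created_at),
-- assumed to have a TiFlash replica.

-- Point lookup by primary key: expected to be served from TiKV.
EXPLAIN SELECT * FROM orders WHERE id = 42;

-- Aggregation over the whole table: a candidate for TiFlash.
EXPLAIN SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```

In many TiDB versions the engine also appears in the task column of the plan, with values such as cop[tiflash] or mpp[tiflash] for operators that run on TiFlash.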
Adding a TiFlash replica
TiFlash replicas are configured per table. You opt a table into columnar storage with a single SQL statement, then check replication status in information_schema.tiflash_replica. When AVAILABLE is 1 and PROGRESS is 1.0, TiFlash is ready to serve queries on that table.
Adding a TiFlash replica triggers a background data copy from TiKV to TiFlash. For large tables this can take minutes to hours. The table remains fully operational on TiKV throughout the process.
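A minimal sketch of adding a replica and checking its status, assuming a hypothetical orders table in a hypothetical shop schema. The replica count of 1 is only an example; production clusters often use 2 or more for availability.

```sql
-- Create one TiFlash replica of the (hypothetical) shop.orders table.
ALTER TABLE shop.orders SET TIFLASH REPLICA 1;

-- Check replication status for that table.
-- AVAILABLE = 1 and PROGRESS = 1.0 mean the replica is ready to serve queries.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'shop'
  AND TABLE_NAME = 'orders';
```

Setting the replica count back to 0 with the same ALTER TABLE statement removes the columnar copy.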
Running analytical queries
Once a table has TiFlash replicas, the optimizer routes eligible queries automatically. You can also force TiFlash with an optimizer hint.
Automatic routing
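With automatic routing, nothing in the query text is specific to TiFlash. A minimal sketch, assuming a hypothetical orders table that already has an available TiFlash replica:

```sql
-- Plain analytical query; the optimizer decides whether to read from TiFlash.
-- Table and column names are hypothetical.
SELECT DATE(created_at) AS order_day,
       COUNT(*)         AS order_count,
       SUM(amount)      AS revenue
FROM orders
GROUP BY DATE(created_at)
ORDER BY order_day;
```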
Forcing TiFlash with a hint
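The READ_FROM_STORAGE hint tells the optimizer which engine to read a given table from. The sketch below assumes the same hypothetical orders table; if the query gives the table an alias, the hint must name the alias instead.

```sql
-- Ask the optimizer to read the (hypothetical) orders table from TiFlash.
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
       customer_id,
       SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```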
Verifying TiFlash usage
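One way to verify is to prefix the query with EXPLAIN. The sketch below reuses the hypothetical orders table; the exact operators and their order vary with TiDB version and table statistics.

```sql
-- Show the execution plan without running the query.
EXPLAIN
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
       customer_id,
       SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```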
In the EXPLAIN output, look for ExchangeSender and TableFullScan operators with store_type: tiflash, which confirm that the query runs on TiFlash.
Isolation between workloads
TiFlash nodes are separate processes from TiKV nodes. Analytical queries running on TiFlash do not consume TiKV resources, so a large analytical scan does not interfere with ongoing transactional workloads. This isolation is a key architectural benefit of the HTAP design.
Common HTAP patterns
Real-time dashboard over live transactional data
Add TiFlash replicas to your core transactional tables. Dashboard queries read from TiFlash (fast columnar aggregations) while your application writes to TiKV without interference. Data on the dashboard lags by milliseconds, not hours.
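One possible way to set this up, sketched under the assumption that the dashboard uses its own database session and that the hypothetical orders table has an available TiFlash replica. The tidb_isolation_read_engines session variable restricts which engines the session may read from; with TiKV excluded, queries on tables that have no TiFlash replica fail instead of falling back to TiKV.

```sql
-- Dashboard session only: read from TiFlash (plus TiDB's own in-memory tables).
SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';

-- Revenue per minute over the last hour; table and columns are hypothetical.
SELECT DATE_FORMAT(created_at, '%Y-%m-%d %H:%i') AS minute_bucket,
       SUM(amount)                               AS revenue
FROM orders
WHERE created_at >= NOW() - INTERVAL 1 HOUR
GROUP BY minute_bucket
ORDER BY minute_bucket;
```

Application sessions keep the default setting, so transactional reads and writes continue to go to TiKV.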
Eliminating a separate data warehouse for reporting
Instead of replicating data from your transactional database into a warehouse, enable TiFlash on the tables your reports need. Reports run directly against the live database using columnar storage, with no ETL pipeline to maintain.
Mixed OLTP/OLAP transactions
A single transaction can combine transactional writes with analytical subqueries. TiDB routes each part of the query to the appropriate engine.