Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/pingcap/tidb/llms.txt

Use this file to discover all available pages before exploring further.

Traditionally, organizations maintain separate systems for transactional workloads (OLTP) and analytical workloads (OLAP), with ETL pipelines to move data between them. This introduces data staleness, operational complexity, and additional cost. TiDB’s HTAP architecture eliminates that separation. The same cluster stores your data in two formats — row-oriented in TiKV for transactions, and columnar in TiFlash for analytics — and keeps both in sync automatically.

Storage engines

TiKV — row storage

The primary storage engine. Data is organized in rows, making it efficient for point lookups and transactional writes. TiKV uses the Raft consensus protocol to maintain multiple replicas and guarantee strong consistency.

TiFlash — columnar storage

The analytical storage engine. Data is organized in columns, making it efficient for aggregations, range scans, and analytical queries that touch many rows but few columns.
Both engines serve the same data. You do not need to manage two separate databases or run ETL jobs.

Data replication from TiKV to TiFlash

TiFlash receives data through the Multi-Raft Learner protocol. A TiFlash replica is a Raft Learner — it receives all committed Raft log entries from the TiKV leader, applies them in order, and maintains a fully up-to-date columnar copy of the data. Key properties of this replication:
  • Consistency: TiFlash only exposes data that has been committed in the Raft log. A query against TiFlash always reads committed data; it never sees dirty writes or partially committed transactions.
  • Asynchronous replication: TiFlash applies the Raft log asynchronously after TiKV commits. There is a small propagation lag (typically milliseconds), but TiDB’s query layer accounts for this and ensures your query reads a consistent snapshot.
  • No writes to TiFlash directly: All writes go to TiKV first and propagate to TiFlash via Raft. TiFlash is never the write path.
Your application
       │ writes

   TiDB Server ──► TiKV (row storage, Raft leader)

                        │ Raft Learner replication

                   TiFlash (columnar storage)

How the optimizer chooses a storage engine

When you run a query, the TiDB cost-based optimizer automatically decides whether to read from TiKV or TiFlash based on the query shape and available statistics:
  • Point lookups and small range scans are routed to TiKV, where row storage excels.
  • Full table scans, aggregations, and joins over large datasets are routed to TiFlash, where columnar storage is faster.
  • Mixed queries may read from both engines for different parts of the execution plan.
You can inspect the chosen plan with EXPLAIN:
EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;
Look for TableFullScan with store_type: tiflash to confirm TiFlash is being used.

Adding a TiFlash replica

TiFlash replicas are configured per table. You opt tables into columnar storage with a single SQL statement.
-- Add one TiFlash replica for the sales table
ALTER TABLE sales SET TIFLASH REPLICA 1;

-- Check replication progress (PROGRESS = 1.0 means fully replicated)
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'mydb';
Once AVAILABLE is 1 and PROGRESS is 1.0, TiFlash is ready to serve queries on that table.
Adding a TiFlash replica triggers a background data copy from TiKV to TiFlash. For large tables this can take minutes to hours. The table remains fully operational on TiKV throughout the process.
You can add TiFlash replicas to multiple tables and remove them at any time:
-- Add replicas to multiple tables
ALTER TABLE orders SET TIFLASH REPLICA 1;
ALTER TABLE products SET TIFLASH REPLICA 1;

-- Remove TiFlash replica (stops replication and frees columnar storage)
ALTER TABLE orders SET TIFLASH REPLICA 0;

Running analytical queries

Once a table has TiFlash replicas, the optimizer routes eligible queries automatically. You can also force TiFlash with an optimizer hint.

Automatic routing

-- The optimizer automatically uses TiFlash for this aggregation
SELECT
  region,
  product_category,
  SUM(revenue)   AS total_revenue,
  COUNT(*)        AS order_count
FROM sales
WHERE order_date >= '2024-01-01'
GROUP BY region, product_category
ORDER BY total_revenue DESC;

Forcing TiFlash with a hint

-- Force TiFlash even if the optimizer would choose TiKV
SELECT /*+ READ_FROM_STORAGE(TIFLASH[sales]) */
  region,
  SUM(revenue) AS total_revenue
FROM sales
GROUP BY region;

-- Force TiKV for a specific table in a join
SELECT /*+ READ_FROM_STORAGE(TIKV[orders], TIFLASH[order_items]) */
  o.id,
  SUM(oi.amount) AS total
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id;

Verifying TiFlash usage

EXPLAIN SELECT /*+ READ_FROM_STORAGE(TIFLASH[sales]) */
  SUM(revenue) FROM sales WHERE region = 'APAC';
The plan will show ExchangeSender and TableFullScan operators with store_type: tiflash, confirming the query runs on TiFlash.

Isolation between workloads

TiFlash nodes are separate processes from TiKV nodes. Analytical queries running on TiFlash do not consume TiKV resources, so a large analytical scan does not interfere with ongoing transactional workloads. This isolation is a key architectural benefit of the HTAP design.
For best results, deploy TiFlash on nodes with high-memory and fast local storage. Columnar scans are CPU and memory intensive. Keeping TiFlash on dedicated hardware prevents resource contention with TiKV.

Common HTAP patterns

Add TiFlash replicas to your core transactional tables. Dashboard queries read from TiFlash (fast columnar aggregations) while your application writes to TiKV without interference. Data on the dashboard lags by milliseconds, not hours.
ALTER TABLE transactions SET TIFLASH REPLICA 1;
ALTER TABLE accounts SET TIFLASH REPLICA 1;

-- Dashboard query — automatically uses TiFlash
SELECT account_type, SUM(amount), COUNT(*) 
FROM transactions 
WHERE created_at > NOW() - INTERVAL 1 HOUR
GROUP BY account_type;
Instead of replicating data from your transactional database into a warehouse, enable TiFlash on the tables your reports need. Reports run directly against the live database using columnar storage, with no ETL pipeline to maintain.
A single transaction can combine transactional writes with analytical subqueries. TiDB routes each part of the query to the appropriate engine.
BEGIN;
-- Write goes to TiKV
INSERT INTO orders (customer_id, amount, status) VALUES (42, 150.00, 'pending');
-- Analytical check — optimizer may use TiFlash
SELECT AVG(amount) FROM orders WHERE customer_id = 42 AND status = 'completed';
COMMIT;

Build docs developers (and LLMs) love