TiDB supports a native VECTOR data type designed for storing and querying high-dimensional vector embeddings produced by AI/ML models. Combined with vector indexes and distance functions, it lets you build semantic search, recommendation systems, and other embedding-based applications directly in TiDB without a separate vector database.
The VECTOR data type
The VECTOR(n) type stores a fixed-dimension array of 32-bit floats, where n is the number of dimensions (up to 16383).
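As a minimal sketch of a table definition with a vector column (the table and column names are hypothetical and chosen only for illustration):

```sql
-- Hypothetical table; the names documents, content, and embedding are
-- illustrative and do not come from the original text.
CREATE TABLE documents (
    id        BIGINT PRIMARY KEY AUTO_INCREMENT,
    content   TEXT,
    -- Three dimensions keep the example readable; real embeddings are
    -- usually hundreds or thousands of dimensions wide.
    embedding VECTOR(3)
);
```

Declaring the dimension up front means every value stored in the column must have exactly that many elements.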
TiDB also provides VECTOR64(n) for 64-bit float precision when higher accuracy is required.
Inserting vectors
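A minimal sketch of an insert, assuming a hypothetical documents table with a three-dimensional embedding column (neither name comes from the original text):

```sql
-- Assumed schema for this sketch (hypothetical names).
CREATE TABLE IF NOT EXISTS documents (
    id        BIGINT PRIMARY KEY AUTO_INCREMENT,
    content   TEXT,
    embedding VECTOR(3)
);

-- Each vector is written as a string literal holding a bracketed array.
-- The number of elements must match the declared dimension (3 here).
INSERT INTO documents (content, embedding) VALUES
    ('dog',  '[1, 2, 1]'),
    ('fish', '[1, 2, 4]'),
    ('tree', '[1, 0, 0]');
```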
Vector values are passed as a JSON-style array string, such as '[0.1, 0.2, 0.3]'.
Distance functions
TiDB provides built-in functions to compute similarity between vectors:

| Function | Description |
|---|---|
| VEC_COSINE_DISTANCE(a, b) | Cosine distance (0 = identical, 2 = opposite) |
| VEC_L2_DISTANCE(a, b) | Euclidean (L2) distance |
| VEC_INNER_PRODUCT(a, b) | Inner product (dot product) |
| VEC_L1_DISTANCE(a, b) | Manhattan (L1) distance |
Vector indexes support only VEC_COSINE_DISTANCE and VEC_L2_DISTANCE.
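To get a feel for what these functions return, you can call them on small literal vectors. The sketch below uses hand-picked inputs whose distances are easy to check by hand; the values in the comments are the mathematical results, and the exact output formatting may differ:

```sql
-- String literals are used directly as the vector arguments.

-- Euclidean distance: sqrt((0-4)^2 + (3-0)^2) = 5
SELECT VEC_L2_DISTANCE('[0, 3]', '[4, 0]');

-- Manhattan distance: |0-3| + |0-4| = 7
SELECT VEC_L1_DISTANCE('[0, 0]', '[3, 4]');

-- Cosine distance for vectors pointing in opposite directions: 1 - (-1) = 2
SELECT VEC_COSINE_DISTANCE('[1, 1]', '[-1, -1]');

-- Cosine distance ignores magnitude: same direction gives 0
SELECT VEC_COSINE_DISTANCE('[1, 0]', '[2, 0]');
```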
Similarity search (ANN)
Use ORDER BY with a distance function and LIMIT to retrieve the nearest neighbors:
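A minimal sketch of such a query, assuming a hypothetical documents table with content and embedding VECTOR(3) columns; the literal '[1, 2, 3]' stands in for the query embedding:

```sql
-- Assumes a hypothetical table:
--   documents(id BIGINT, content TEXT, embedding VECTOR(3))
SELECT id,
       content,
       VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]') AS distance
FROM documents
ORDER BY distance   -- ascending order: the closest rows come first
LIMIT 3;            -- keep only the 3 nearest neighbors
```

Without a vector index, a query like this computes the distance for every row, so it returns exact results but scales with table size.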
Creating a vector index
Vector indexes accelerate ANN queries. TiDB implements the HNSW (Hierarchical Navigable Small World) algorithm, which provides fast approximate search with configurable recall/speed trade-offs.
To add a vector index to an existing table, use ALTER TABLE.
Vector indexes are stored in TiFlash, TiDB’s columnar storage engine. Ensure TiFlash replicas are configured before creating a vector index.
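The steps above can be sketched as follows, assuming a hypothetical documents table with an embedding VECTOR(3) column and a cluster that has at least one TiFlash node. The index name is illustrative:

```sql
-- 1. Create a TiFlash replica of the table before adding the index.
ALTER TABLE documents SET TIFLASH REPLICA 1;

-- 2. Create the HNSW index. Run ONE of the two statements below, not both;
--    they are alternative spellings of the same operation.

-- Option A: standalone CREATE statement.
CREATE VECTOR INDEX idx_embedding
    ON documents ((VEC_COSINE_DISTANCE(embedding))) USING HNSW;

-- Option B: ALTER TABLE form.
ALTER TABLE documents
    ADD VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding))) USING HNSW;
```

The index is tied to one distance function, so pick the function your queries will use in ORDER BY.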
Using a specific index
You can hint the optimizer to use a vector index.
Example: storing and searching OpenAI embeddings
The following example shows an end-to-end workflow using OpenAI text-embedding-ada-002 (1536 dimensions):
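What follows is a sketch of the database side of that workflow only. The table, column, and index names are hypothetical, and the vectors are synthetic stand-ins built with CONCAT and REPEAT so the statements run without calling the embedding API. In a real application, the client would request an embedding for each text and pass the returned 1536 floats as a bracketed array string:

```sql
-- Hypothetical schema: one row per text chunk plus its 1536-dimension embedding.
CREATE TABLE doc_chunks (
    id        BIGINT PRIMARY KEY AUTO_INCREMENT,
    content   TEXT,
    embedding VECTOR(1536)
);

-- Vector indexes require a TiFlash replica.
ALTER TABLE doc_chunks SET TIFLASH REPLICA 1;
CREATE VECTOR INDEX idx_doc_chunks_embedding
    ON doc_chunks ((VEC_COSINE_DISTANCE(embedding))) USING HNSW;

-- Stand-in embeddings: each CONCAT builds a string with exactly 1536 elements.
-- A real client would pass the array returned by the embedding API instead.
INSERT INTO doc_chunks (content, embedding) VALUES
    ('chunk A', VEC_FROM_TEXT(CONCAT('[1,',   REPEAT('0,', 1534), '0]'))),
    ('chunk B', VEC_FROM_TEXT(CONCAT('[0,1,', REPEAT('0,', 1533), '0]')));

-- Search: the query text is embedded the same way, then rows are ranked by
-- cosine distance. The stand-in query vector here equals chunk A's vector.
SELECT id,
       content,
       VEC_COSINE_DISTANCE(
           embedding,
           VEC_FROM_TEXT(CONCAT('[1,', REPEAT('0,', 1534), '0]'))
       ) AS distance
FROM doc_chunks
ORDER BY distance
LIMIT 5;
```

Because the stand-in query vector matches chunk A and is orthogonal to chunk B, chunk A should rank first.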
Use cases
- Semantic search: find documents, products, or records by meaning rather than keywords.
- Recommendation systems: surface similar items based on user behavior embeddings.
- Image similarity: compare image feature vectors from vision models.
- Anomaly detection: identify outliers by measuring distance from cluster centers.
- RAG (Retrieval-Augmented Generation): retrieve relevant context chunks to augment LLM prompts.
Verifying index usage
Use EXPLAIN to confirm the query plan uses the vector index:
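A minimal sketch, assuming a hypothetical documents table whose embedding VECTOR(3) column has a vector index built with VEC_COSINE_DISTANCE:

```sql
-- The distance function in ORDER BY should match the function the index
-- was created with, and the query should use the ORDER BY ... LIMIT form.
EXPLAIN
SELECT id, content
FROM documents
ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
LIMIT 3;
```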
Look for the ANNIndexScan operator in the output, which indicates the HNSW index is being used.