Overview
Qwen provides both dense and sparse text embedding capabilities using Alibaba Cloud’s DashScope service. Both classes support configurable dimensions and automatic result caching.
Location: python/zvec/extension/qwen_embedding_function.py
Installation
Both classes require the dashscope package: pip install dashscope
QwenDenseEmbedding
Dense text embedding function using the Qwen (DashScope) API.
Constructor
Parameters
Desired output embedding dimension. Common values:
- 512: Balanced performance and accuracy
- 1024: Higher accuracy, larger storage
- 1536: Maximum accuracy for supported models
DashScope embedding model identifier. Options:
- text-embedding-v4 (recommended)
- text-embedding-v3
- text-embedding-v2
- text-embedding-v1
DashScope API authentication key. If None, reads from the DASHSCOPE_API_KEY environment variable. Obtain a key from: https://dashscope.console.aliyun.com/
Additional DashScope API parameters:
- text_type (str): Specifies the text role: "query" for search queries or "document" for indexed content. Optimizes embeddings for asymmetric search.
Methods
embed()
Parameters
- input (str): Input text string to embed. Maximum length depends on the model (typically 2048-8192 tokens).
Returns
- DenseVectorType: List of floats representing the embedding vector. Length equals self.dimension.
Raises
- TypeError: If input is not a string
- ValueError: If input is empty or the API returns an error
- RuntimeError: If network or DashScope service errors occur
Usage Examples
Basic Usage
Specific Model
Asymmetric Retrieval
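A minimal sketch of asymmetric retrieval, assuming two embedder objects that each expose the documented `embed(input)` method. In practice these might be two QwenDenseEmbedding instances, one configured with text_type="query" and one with text_type="document". `rank_documents`, `cosine`, and `ToyEmbedder` are hypothetical names, and `ToyEmbedder` is a stand-in that makes no API calls.

```python
import math
from typing import Protocol


class DenseEmbedder(Protocol):
    """Anything with the documented embed(input) -> list of floats method."""

    def embed(self, input: str) -> list[float]: ...


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_embedder: DenseEmbedder,
    doc_embedder: DenseEmbedder,
    query: str,
    documents: list[str],
) -> list[tuple[float, str]]:
    """Embed the query and documents with separate embedders, best match first."""
    q = query_embedder.embed(query)
    scored = [(cosine(q, doc_embedder.embed(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


class ToyEmbedder:
    """Stand-in for illustration only: counts the letters a, b, and c."""

    def embed(self, input: str) -> list[float]:
        return [float(input.count(ch)) for ch in "abc"]
```

Keeping the query and document embedders as separate arguments makes it hard to embed a query with the document-side configuration by accident.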
QwenSparseEmbedding
Sparse text embedding function using the Qwen (DashScope) API. Generates sparse keyword-weighted vectors suitable for lexical matching and BM25-style retrieval.
Constructor
Parameters
Desired output embedding dimension. Common values:
- 512: Balanced performance
- 1024: Higher accuracy
- 1536: Maximum accuracy
DashScope embedding model identifier.
DashScope API key or None to use environment variable.
Additional DashScope API parameters:
- encoding_type (Literal["query", "document"]): Encoding type
  - "query": Optimize for search queries (default)
  - "document": Optimize for indexed documents
Methods
embed()
Parameters
- input (str): Input text string to embed.
Returns
- SparseVectorType: Dictionary mapping dimension index to weight. Only non-zero dimensions are included. Entries are sorted by index for consistency.
Raises
- TypeError: If input is not a string
- ValueError: If input is empty or the API returns an error
- RuntimeError: If network or service errors occur
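To illustrate the shape of the returned value, here is a small sketch that turns an arbitrary index-to-weight mapping into the form described above: non-zero entries only, ordered by index. `to_sparse` is a hypothetical helper and does not reflect how the library builds its result.

```python
def to_sparse(weights: dict[int, float]) -> dict[int, float]:
    """Drop zero weights and order entries by ascending dimension index."""
    # Python dicts preserve insertion order, so inserting in sorted order
    # yields a dictionary that iterates by ascending index.
    return {i: w for i, w in sorted(weights.items()) if w != 0.0}
```

A stable ordering means two vectors with the same content also serialize identically, which simplifies comparisons and tests.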
Usage Examples
Basic Usage
Document Embedding
Asymmetric Retrieval
Inspecting Sparse Dimensions
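A minimal sketch for inspecting which dimensions carry the most weight, assuming the sparse vector is the index-to-weight dictionary returned by `embed()`. `top_dimensions` is a hypothetical helper name.

```python
def top_dimensions(sparse: dict[int, float], k: int = 5) -> list[tuple[int, float]]:
    """Return the k (index, weight) pairs with the largest weights."""
    return sorted(sparse.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, `top_dimensions(vector, k=10)` lists the ten heaviest dimensions, which gives a quick view of what dominates the vector.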
Hybrid Retrieval
Combine dense and sparse embeddings so that results reflect both semantic similarity and keyword overlap.
Best Practices
Asymmetric Search: Use text_type="query" for queries and text_type="document" for documents to optimize retrieval accuracy. For QwenSparseEmbedding, the corresponding parameter is encoding_type.
Comparison: Dense vs Sparse
| Feature | QwenDenseEmbedding | QwenSparseEmbedding |
|---|---|---|
| Output Format | List of floats | Dictionary (sparse) |
| Typical Size | 512-1536 dimensions (all) | ~150-200 non-zero dimensions |
| Best For | Semantic similarity | Keyword matching |
| Memory | Fixed size | Variable, efficient |
| Interpretability | Low | High (terms visible) |
| Use Case | General retrieval | Lexical search, hybrid |
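To make the comparison concrete, the sketch below combines the two output formats into a single hybrid score. It assumes dense vectors are lists of floats and sparse vectors are index-to-weight dictionaries, as in the table. The weighted sum and the `alpha` parameter are one common way to fuse scores, chosen here for illustration rather than taken from the library. `hybrid_score`, `cosine`, and `sparse_dot` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product over the dimensions the two sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dictionary
    return sum(w * b[i] for i, w in a.items() if i in b)


def hybrid_score(
    q_dense: list[float],
    d_dense: list[float],
    q_sparse: dict[int, float],
    d_sparse: dict[int, float],
    alpha: float = 0.5,
) -> float:
    """Weighted sum: alpha for the dense score, 1 - alpha for the sparse score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    dense = cosine(q_dense, d_dense)
    sparse = sparse_dot(q_sparse, d_sparse)
    return alpha * dense + (1.0 - alpha) * sparse
```

Cosine similarity is bounded while a sparse dot product is not, so in practice the sparse score may need normalizing before the two are mixed.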
Error Handling
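A minimal sketch of handling the three documented exception types, assuming an embedder object with the documented `embed(input)` method. `TypeError` and `ValueError` are re-raised immediately, on the assumption that they indicate bad input, while `RuntimeError` is retried because it covers network and service failures. `embed_with_retry` and `FlakyEmbedder` are hypothetical names, and `FlakyEmbedder` is a stand-in that only simulates failures.

```python
import time


def embed_with_retry(embedder, text, retries: int = 3, backoff: float = 1.0):
    """Call embedder.embed(text), retrying only on RuntimeError."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return embedder.embed(text)
        except (TypeError, ValueError):
            # Assumed to be caused by the input itself, so retrying will not help.
            raise
        except RuntimeError:
            if attempt == retries:
                raise
            time.sleep(backoff * attempt)  # linear backoff between attempts


class FlakyEmbedder:
    """Stand-in for illustration: fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def embed(self, input):
        self.calls += 1
        if not isinstance(input, str):
            raise TypeError("input must be a string")
        if not input:
            raise ValueError("input must not be empty")
        if self.calls <= self.failures:
            raise RuntimeError("simulated network error")
        return [0.0, 1.0]
```

Because `ValueError` also covers API-reported errors, some of those may be transient. This sketch does not try to tell them apart.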
Notes
- Requires Python 3.10, 3.11, or 3.12
- Requires the dashscope package: pip install dashscope
- Results are cached (LRU cache, maxsize=10)
- Network connectivity required
- API costs may apply
- Sparse vectors are sorted by indices for consistency
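The caching note above can be illustrated with Python's standard `functools.lru_cache`. This is a generic sketch of LRU behavior with `maxsize=10`, not the library's own caching code. `cached_embed` is a hypothetical function whose body stands in for a remote API call.

```python
from functools import lru_cache


@lru_cache(maxsize=10)
def cached_embed(text: str) -> tuple[float, ...]:
    """Stand-in for an API call; repeated inputs are served from the cache."""
    # A tuple is returned so that cached results cannot be mutated by callers.
    return tuple(float(ord(ch)) for ch in text[:4])
```

With `maxsize=10`, only the ten most recently used inputs are retained, so a workload that cycles through more than ten distinct texts will call the API again for the evicted ones. `cached_embed.cache_info()` reports hits and misses.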