Available Models

Qwen3-VL is available in multiple sizes and architectures to suit different deployment scenarios, from edge devices to cloud infrastructure.

Model Architectures

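  • Dense Models: Standard transformer architecture in which every parameter is active for each token
  • MoE (Mixture of Experts): Sparse architecture that activates only a subset of parameters per token, for efficient scaling
  • Instruct Editions: Fine-tuned for instruction following (available for both dense and MoE models)
  • Thinking Editions: Enhanced with reasoning capabilities (available for both dense and MoE models)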

Dense Models

Qwen3-VL-2B

  • Instruct Edition
  • Thinking Edition

Qwen3-VL-4B

  • Instruct Edition
  • Thinking Edition

Qwen3-VL-8B

  • Instruct Edition
  • Thinking Edition

Qwen3-VL-32B

  • Instruct Edition
  • Thinking Edition

MoE Models

Qwen3-VL-30B-A3B

Architecture: 30B total parameters, 3B active per token
  • Instruct Edition
  • Thinking Edition

Qwen3-VL-235B-A22B

Architecture: 235B total parameters, 22B active per token
  • Instruct Edition
  • Thinking Edition
  • FP8 Quantized Version
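The total and active parameter counts above imply that each MoE model uses roughly a tenth of its weights per token. A minimal sketch of that arithmetic follows; the dictionary and the `active_fraction` helper are illustrative names, not part of any Qwen tooling, and the figures are the rounded counts given in the model names.

```python
# Total and active parameter counts in billions, taken from the model names above.
MOE_MODELS = {
    "Qwen3-VL-30B-A3B": (30, 3),
    "Qwen3-VL-235B-A22B": (235, 22),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active for each token."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Note that the active count mainly affects per-token compute; all parameters still have to be stored, so memory requirements follow the total count.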

Collections

HuggingFace Collection

All Qwen3-VL models, including FP8 quantized versions.

ModelScope Collection

All Qwen3-VL models, for users in mainland China.

Quantized Models

FP8 Versions

FP8-quantized models are available for all major model sizes and are optimized for deployment on NVIDIA H100 or newer GPUs with CUDA 12 or later. All FP8 versions are listed in the collections above.

Legacy Models

Qwen2.5-VL Series

Qwen2.5-VL-32B-Instruct
  • Other Qwen2.5-VL models: 3B, 7B, 72B
  • Collection: Qwen2.5-VL
  • Released: January 28, 2025
  • AWQ quantized versions: available for the 3B, 7B, and 72B models

Qwen2-VL Series

Qwen2-VL-72B-Instruct
  • Other sizes: 2B, 7B
  • Released: August 30, 2024

QvQ-72B-Preview

Experimental research model focusing on visual reasoning.

Model Selection Guide

By Use Case

Edge Deployment: Qwen3-VL-2B (Instruct/Thinking)
  • Smallest footprint, suitable for mobile and edge devices
Balanced Performance: Qwen3-VL-4B or Qwen3-VL-8B
  • Good balance between performance and resource requirements
High Performance: Qwen3-VL-32B or Qwen3-VL-30B-A3B
  • Strong performance for demanding applications
Maximum Capability: Qwen3-VL-235B-A22B
  • State-of-the-art vision-language understanding
  • Best for research and high-end applications
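The use-case guidance above can be captured as a small lookup table. This is only a sketch: the use-case keys (`edge`, `balanced`, `high_performance`, `maximum`) and the `recommend` helper are hypothetical names chosen for illustration, while the model names come from this page.

```python
# Use-case keys are illustrative; model names are the ones listed in this guide.
RECOMMENDED = {
    "edge": ["Qwen3-VL-2B"],
    "balanced": ["Qwen3-VL-4B", "Qwen3-VL-8B"],
    "high_performance": ["Qwen3-VL-32B", "Qwen3-VL-30B-A3B"],
    "maximum": ["Qwen3-VL-235B-A22B"],
}


def recommend(use_case: str) -> list[str]:
    """Return the model sizes suggested for a use case."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        options = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {options}") from None


print(recommend("balanced"))
```

Keeping the mapping in data rather than in branching code makes it easy to adjust once you have measured latency and memory on your own hardware.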

Instruct vs Thinking

Instruct Editions:
  • Optimized for following user instructions
  • Better for general-purpose applications
  • More aligned with human preferences
Thinking Editions:
  • Enhanced reasoning capabilities
  • Better for complex problem-solving
  • Stronger on STEM and mathematical tasks
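As a rough illustration of the choice between editions, the sketch below picks an edition from a single flag and builds a model identifier from it. The `pick_edition` and `model_id` helpers are hypothetical, and the `Qwen/<model>-<edition>` identifier pattern is an assumption about how the repositories are named; check the collections for the exact names before use.

```python
EDITIONS = ("Instruct", "Thinking")


def pick_edition(needs_complex_reasoning: bool) -> str:
    """Choose Thinking for complex problem-solving, Instruct otherwise."""
    return "Thinking" if needs_complex_reasoning else "Instruct"


def model_id(base: str, edition: str, org: str = "Qwen") -> str:
    """Build an identifier such as 'Qwen/Qwen3-VL-8B-Instruct'.

    The '<org>/<base>-<edition>' pattern is assumed, not confirmed by this page.
    """
    if edition not in EDITIONS:
        raise ValueError(f"edition must be one of {EDITIONS}, got {edition!r}")
    return f"{org}/{base}-{edition}"


# General-purpose assistant: Instruct edition.
print(model_id("Qwen3-VL-8B", pick_edition(needs_complex_reasoning=False)))
# STEM or multi-step math workload: Thinking edition.
print(model_id("Qwen3-VL-8B", pick_edition(needs_complex_reasoning=True)))
```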
