2025
November 27, 2025 - Technical Paper Release
Qwen3-VL Technical Paper Published
We released the Qwen3-VL technical paper, providing comprehensive technical details about the model architecture, training methodology, and evaluation results.
Key Topics Covered:
- Interleaved-MRoPE architecture
- DeepStack multi-level feature fusion
- Text-Timestamp alignment for videos
- Training data and methodology
- Comprehensive benchmark evaluations
- Ablation studies
- Paper: arXiv:2511.21631
- Blog: Qwen AI Blog
October 21, 2025 - Qwen3-VL 2B & 32B Models
New Model Releases
Released four new models expanding the Qwen3-VL family:
- Qwen3-VL-2B-Instruct (HuggingFace)
- Qwen3-VL-2B-Thinking (HuggingFace)
- Qwen3-VL-32B-Instruct (HuggingFace)
- Qwen3-VL-32B-Thinking (HuggingFace)
- 2B models suitable for edge deployment and consumer GPUs
- 32B models provide high performance with reasonable resource requirements
- Both Instruct and Thinking editions available
- Available on both HuggingFace and ModelScope
October 15, 2025 - Qwen3-VL 4B & 8B Models
New Model Releases
Released four new models in the mid-size range:
- Qwen3-VL-4B-Instruct (HuggingFace)
- Qwen3-VL-4B-Thinking (HuggingFace)
- Qwen3-VL-8B-Instruct (HuggingFace)
- Qwen3-VL-8B-Thinking (HuggingFace)
- Balanced performance-to-resource ratio
- 4B suitable for RTX 3090/4090 GPUs
- 8B provides strong capabilities for professional use
- Both editions support full feature set
October 4, 2025 - MoE Models & FP8 Quantization
Qwen3-VL-30B-A3B Release
Released the Qwen3-VL-30B-A3B Mixture-of-Experts (MoE) models:
- Qwen3-VL-30B-A3B-Instruct (HuggingFace)
- Qwen3-VL-30B-A3B-Thinking (HuggingFace)
- 30B total parameters, 3B active per token
- Efficient inference with MoE architecture
- FP8 models available in collections
September 23, 2025 - Qwen3-VL 235B Launch
Qwen3-VL-235B-A22B Release
Released the flagship Qwen3-VL models:
- Qwen3-VL-235B-A22B-Instruct (HuggingFace)
- Qwen3-VL-235B-A22B-Thinking (HuggingFace)
- State-of-the-art vision-language performance
- 235B total parameters, 22B active per token (MoE)
- Native 256K context, expandable to 1M
- Advanced capabilities:
  - Visual agent (PC/mobile GUI interaction)
  - Visual coding (Draw.io, HTML/CSS/JS)
  - 3D grounding and spatial reasoning
  - Enhanced video understanding
  - 32-language OCR
  - Superior multimodal reasoning
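As a rough illustration of the total versus active parameter counts quoted for the two MoE releases (30B-A3B and 235B-A22B), the sketch below computes the fraction of parameters active per token and a weight-only memory estimate. The parameter counts are the nominal figures from the model names, not exact checkpoint sizes. The bytes-per-parameter values (2 for BF16, 1 for FP8) are assumptions, and the estimate ignores activations, KV cache, and any layers an FP8 checkpoint keeps in higher precision. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    """Nominal sizes taken from the model name (illustrative, not exact)."""
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters active per token, in billions

    def active_fraction(self) -> float:
        # Share of the weights that participate in computing each token.
        return self.active_params_b / self.total_params_b

    def weight_memory_gb(self, bytes_per_param: float) -> float:
        # Weight-only estimate: (billions of params) * (bytes per param) ~ GB.
        # All experts must be resident, so this uses the total count.
        return self.total_params_b * bytes_per_param


MODELS = [
    MoEModel("Qwen3-VL-30B-A3B", total_params_b=30, active_params_b=3),
    MoEModel("Qwen3-VL-235B-A22B", total_params_b=235, active_params_b=22),
]

for m in MODELS:
    print(
        f"{m.name}: {m.active_fraction():.1%} active per token, "
        f"~{m.weight_memory_gb(2):.0f} GB weights at BF16 (assumed 2 bytes/param), "
        f"~{m.weight_memory_gb(1):.0f} GB at FP8 (assumed 1 byte/param)"
    )
```

The point of the comparison is that per-token compute scales with the active count, while memory footprint scales with the total count, since every expert has to be loaded.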
April 8, 2025 - Fine-tuning Code Release
Fine-tuning Support
Released fine-tuning code for Qwen2-VL and Qwen2.5-VL, compatible with Qwen3-VL.
Resources:
- Code: qwen-vl-finetune
- Supports custom visual tasks
- LoRA and full fine-tuning
March 25, 2025 - Qwen2.5-VL-32B
Qwen2.5-VL-32B-Instruct Release
Released Qwen2.5-VL-32B-Instruct.
Improvements:
- Smarter responses
- Better human preference alignment
- Enhanced reasoning capabilities
February 20, 2025 - Qwen2.5-VL Technical Report
Technical Report & AWQ Models
Released the Qwen2.5-VL Technical Report along with AWQ-quantized models.
January 28, 2025 - Qwen2.5-VL Series
Qwen2.5-VL Family Release
Released the complete Qwen2.5-VL series on HuggingFace.
Models:
- Qwen2.5-VL-3B-Instruct
- Qwen2.5-VL-7B-Instruct
- Qwen2.5-VL-72B-Instruct
2024
December 25, 2024 - QvQ-72B-Preview
Visual Reasoning Research Model
Released QvQ-72B-Preview, an experimental model focusing on enhanced visual reasoning.
Features:
- 72B parameters
- Advanced visual reasoning capabilities
- Research preview for community feedback
September 19, 2024 - Qwen2-VL-72B
Large Model & Quantization
Released Qwen2-VL-72B-Instruct with multiple quantized versions:
- Qwen2-VL-72B-Instruct
- Qwen2-VL-72B-Instruct-AWQ
- Qwen2-VL-72B-Instruct-GPTQ-Int4
- Qwen2-VL-72B-Instruct-GPTQ-Int8
August 30, 2024 - Qwen2-VL Series Launch
Qwen2-VL Family Release
Released the Qwen2-VL series:
- Qwen2-VL-2B-Instruct
- Qwen2-VL-7B-Instruct
- (72B announced, released later)
- Multi-resolution image understanding
- Video support
- Enhanced OCR
- Improved multimodal reasoning
Release Timeline Summary
| Date | Release | Highlights |
|---|---|---|
| 2025-11-27 | Technical Paper | Architectural details and evaluations |
| 2025-10-21 | 2B & 32B Models | Edge to high-performance range |
| 2025-10-15 | 4B & 8B Models | Mid-range balanced performance |
| 2025-10-04 | 30B-A3B MoE & FP8 | Efficient MoE, quantized models |
| 2025-09-23 | 235B-A22B Flagship | State-of-the-art VLM |
| 2025-04-08 | Fine-tuning Code | Custom training support |
| 2025-03-25 | Qwen2.5-VL-32B | Improved alignment |
| 2025-02-20 | Qwen2.5-VL Report & AWQ | Technical documentation |
| 2025-01-28 | Qwen2.5-VL Series | Complete model family |
| 2024-12-25 | QvQ-72B Preview | Visual reasoning research |
| 2024-09-19 | Qwen2-VL-72B & Paper | Large model & documentation |
| 2024-08-30 | Qwen2-VL Launch | Series introduction |
Migration Guides
From Qwen2.5-VL to Qwen3-VL
Key Changes:
- Patch size: 14 → 16
- Video metadata: New return format
- Architecture improvements: Interleaved-MRoPE, DeepStack, Text-Timestamp alignment
- New capabilities:
  - 3D grounding
  - Enhanced visual coding
  - Better spatial reasoning
  - Extended OCR (32 languages)
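To see what the patch size change means in practice, the sketch below estimates how image dimensions and visual token counts shift when moving from a patch size of 14 to 16. It is a simplified model, not the official preprocessor. It assumes a 2x2 spatial merge of patches (so sides align to multiples of 28 and 32 respectively), rounds each side to the nearest multiple, and ignores any minimum or maximum pixel budget. The function names are hypothetical.

```python
def aligned_size(height: int, width: int, patch_size: int, merge_size: int = 2):
    """Round each side to the nearest multiple of patch_size * merge_size.

    merge_size=2 is an assumption (2x2 patch merging); pixel budgets are ignored.
    """
    factor = patch_size * merge_size
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    return h, w


def visual_tokens(height: int, width: int, patch_size: int, merge_size: int = 2) -> int:
    """Estimate tokens per image: one token per merged block of patches."""
    h, w = aligned_size(height, width, patch_size, merge_size)
    return (h // (patch_size * merge_size)) * (w // (patch_size * merge_size))


if __name__ == "__main__":
    height, width = 1080, 1920
    for label, patch in (("patch size 14", 14), ("patch size 16", 16)):
        size = aligned_size(height, width, patch)
        tokens = visual_tokens(height, width, patch)
        print(f"{label}: aligned to {size}, ~{tokens} visual tokens")
```

Under these assumptions, a 1080x1920 frame aligns to 1092x1932 (2691 tokens) with patch size 14 and to 1088x1920 (2040 tokens) with patch size 16. Code that hard-codes a multiple of 28 for resizing, or a fixed tokens-per-image budget, is therefore worth rechecking when migrating.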
From Qwen2-VL to Qwen3-VL
Major Updates:
- Significantly improved performance across all tasks
- Native long context (256K vs 32K)
- Video understanding enhancements
- New model variants (MoE architecture, Thinking editions)
- Broader model size range (2B-235B)
Future Roadmap
The following items are planned but subject to change:
- Additional model sizes and variants
- Enhanced fine-tuning tools and examples
- More cookbook examples and tutorials
- Extended language support for OCR
- Performance optimizations
- Community-contributed extensions
Resources
Documentation
Code & Models
Papers
- Qwen3-VL Technical Report (2025)
- Qwen2.5-VL Technical Report (2025)
- Qwen2-VL Paper (2024)