Documentation Index
Fetch the complete documentation index at: https://mintlify.com/QwenLM/Qwen3-VL/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Qwen3-VL uses thetransformers library for inference. This guide covers the basic setup and single-image inference.
Requirements
Basic Inference
Loading the Model
Single Image Inference
Image Input Formats
Qwen3-VL supports multiple image input formats:- URL:
"https://path/to/image.jpg" - Local file:
"file:///path/to/image.jpg" - Base64:
"data:image;base64,/9j/..."
Performance Optimization
We recommend enabling
flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.Installing Flash Attention 2
Flash-Attention 2 requires:
- Hardware compatible with Flash-Attention 2
- Model loaded in
torch.float16ortorch.bfloat16
Next Steps
Multi-Image Processing
Learn how to process multiple images in a single request
Video Processing
Process video inputs with frame sampling