Documentation Index
Fetch the complete documentation index at: https://mintlify.com/QwenLM/Qwen3-VL/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Theqwen-vl-utils package provides helper functions for processing and integrating visual language information with Qwen-VL Series Models. It handles image and video loading, resizing, and formatting for use with Qwen2VL, Qwen2.5VL, and Qwen3VL models.
Installation
When to Use
Useqwen-vl-utils when you need to:
- Process images from various sources (local files, URLs, base64, PIL.Image objects)
- Extract frames from videos for vision-language tasks
- Automatically resize images and videos to optimal dimensions
- Prepare vision inputs for Qwen-VL model processors
Key Functions
process_vision_info
Main function to extract and process all vision information from conversations
fetch_image
Load and resize images from files, URLs, or base64 strings
fetch_video
Extract and process video frames with configurable parameters
smart_resize
Intelligently resize images while maintaining aspect ratio
Quick Example
Supported Input Formats
Images
- Local file paths:
file:///path/to/image.jpg - HTTP/HTTPS URLs:
http://example.com/image.jpg - Base64 encoded:
data:image;base64,/9j/... - PIL.Image objects: Direct PIL.Image.Image instances
Videos
- Local video files:
file:///path/to/video.mp4 - HTTP/HTTPS URLs:
http://example.com/video.mp4 - Frame sequences: List of image paths representing video frames