Qwen3-VL is built for semantic comprehension of very long documents. It supports a native context length of 256K tokens, which can be extended to 1M tokens. This makes it possible to process entire books and other long documents in a single context with full recall.
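To get a feel for what a 256K-token budget means for a scanned document, the sketch below estimates how many visual tokens a set of page images might consume. It is a rough approximation, not the processor's actual accounting: the 32x32-pixel area per visual token, the reading of "256K" as 262,144, and the helper names `estimate_image_tokens` and `fits_in_context` are all assumptions made for illustration. The processor may also resize images, so the authoritative count is the length of `input_ids` after preprocessing.

```python
import math

NATIVE_CONTEXT = 256 * 1024   # "256K" read as 262,144 tokens (assumption)
PIXELS_PER_TOKEN_SIDE = 32    # assumed: one visual token per 32x32 pixel area


def estimate_image_tokens(width, height, side=PIXELS_PER_TOKEN_SIDE):
    """Rough number of visual tokens for one page image of the given pixel size."""
    return math.ceil(width / side) * math.ceil(height / side)


def fits_in_context(page_sizes, text_tokens=0, reserve_for_output=1024,
                    limit=NATIVE_CONTEXT):
    """Return (fits, prompt_tokens) for a list of (width, height) page sizes."""
    prompt_tokens = text_tokens + sum(
        estimate_image_tokens(w, h) for w, h in page_sizes
    )
    return prompt_tokens + reserve_for_output <= limit, prompt_tokens


# 100 pages rendered at 1024x1536 pixels: 32 * 48 = 1536 tokens per page.
fits, prompt_tokens = fits_in_context([(1024, 1536)] * 100)
print(fits, prompt_tokens)  # True 153600
```

If the estimate exceeds the native limit, the options are to lower the page resolution, split the document into several requests, or extend the context as described in the capability list.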
Capability Overview
The long document understanding feature enables you to:
- Process documents up to 256K tokens natively
- Extend context to 1M tokens with YaRN
- Handle entire books and lengthy reports
- Maintain full recall across long contexts
- Parse complex document structures
- Extract information from multi-page documents
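The YaRN extension mentioned in the list above is normally enabled through the model configuration rather than through code changes. The following is a minimal sketch of what such an override might look like, assuming the `rope_scaling` key names used by Hugging Face Transformers (`rope_type`, `factor`, `original_max_position_embeddings`). The helper `yarn_rope_scaling`, the factor value, and the 262,144 native length are illustrative assumptions; Qwen3-VL's own configuration may require additional keys or nest these values under a text sub-config, so check the model card before applying anything like this.

```python
def yarn_rope_scaling(factor, native_context=262_144):
    """Build a YaRN rope_scaling dict (hypothetical helper, generic key names)."""
    if factor <= 1.0:
        raise ValueError("YaRN factor must be greater than 1 to extend the context")
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }


# Illustrative overrides only: the factor here is a placeholder, not a
# recommended value for Qwen3-VL.
config_overrides = {
    "max_position_embeddings": 1_000_000,
    "rope_scaling": yarn_rope_scaling(factor=3.0),
}
print(config_overrides)
```

Keeping the overrides in a plain dictionary makes it easy to review them, or to write them into a copy of `config.json`, before loading the model.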
Example Usage
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load the model and its processor; device_map="auto" spreads the weights
# across the available devices.
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-235B-A22B-Instruct", dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-235B-A22B-Instruct")

# For multi-page documents, provide one image per page, in reading order,
# followed by the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "page1.jpg"},
            {"type": "image", "image": "page2.jpg"},
            {"type": "image", "image": "page3.jpg"},
            {"type": "text", "text": "Summarize the key points from this document."},
        ],
    }
]

# Apply the chat template and tokenize text and images in one step.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=1024)

# Drop the prompt tokens so only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
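For documents with many pages, writing each image entry by hand quickly becomes impractical. The sketch below builds the same `messages` structure from a list of page image paths. The helper names `page_number` and `build_document_messages` and the `pageN.jpg` naming scheme are assumptions for illustration and not part of the Qwen3-VL API.

```python
import re
from pathlib import Path


def page_number(path):
    """Extract the first integer in the file name so page10 sorts after page9."""
    match = re.search(r"(\d+)", Path(path).stem)
    return int(match.group(1)) if match else 0


def build_document_messages(page_paths, question):
    """Build a single-turn chat message with one image entry per page."""
    pages = [str(p) for p in page_paths]
    if not pages:
        raise ValueError("at least one page image is required")
    content = [{"type": "image", "image": p} for p in pages]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


# Hypothetical file names; in practice these might come from Path.glob().
paths = sorted(["page10.jpg", "page2.jpg", "page1.jpg"], key=page_number)
messages = build_document_messages(
    paths, "Summarize the key points from this document."
)
print([item.get("image") for item in messages[0]["content"][:-1]])
```

Sorting by the numeric part of the file name keeps pages in reading order, which a plain lexicographic sort would not (it places `page10.jpg` before `page2.jpg`).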
Try it Yourself
Explore the full long document understanding cookbook with interactive examples:
View on GitHub