Documentation Index
Fetch the complete documentation index at: https://mintlify.com/QwenLM/Qwen3-VL/llms.txt
Use this file to discover all available pages before exploring further.
OCR & Key Information Extraction
Qwen3-VL provides powerful optical character recognition (OCR) capabilities with expanded language support and robust performance across diverse conditions. The model excels at both general text recognition and targeted key information extraction.Capabilities
Expanded Language Support
Qwen3-VL supports OCR in 32 languages, significantly expanded from the previous 10:- Latin-script languages
- Chinese (Simplified and Traditional)
- Japanese, Korean
- Arabic, Hebrew, and other RTL scripts
- Cyrillic scripts
- Indic languages
- And many more
Robust Text Recognition
The model performs reliably even in challenging conditions:- Low Light: Extract text from dimly lit or dark images
- Blur: Handle motion blur and out-of-focus text
- Tilt & Rotation: Process text at various angles
- Rare Characters: Recognize ancient characters and specialized symbols
- Technical Jargon: Handle domain-specific terminology and notation
Key Information Extraction
Beyond general OCR, Qwen3-VL can extract specific information from documents:- Forms: Extract field values from structured forms
- Receipts & Invoices: Parse transaction details, dates, amounts
- Business Cards: Extract contact information
- IDs & Documents: Pull specific fields from identification documents
- Tables: Extract data from tabular formats
Use Cases
- Document Digitization: Convert physical documents to searchable text
- Data Entry Automation: Eliminate manual typing from forms and receipts
- Multilingual Content: Process documents in various languages
- Accessibility: Make visual content accessible to screen readers
- Search & Indexing: Enable full-text search on image-based documents
Try It Out
Explore OCR and key information extraction with our interactive cookbook:OCR Cookbook
Stronger text recognition capabilities in natural scenes and multiple languages, supporting diverse key information extraction needs.
Advanced Features
- Natural Scene Text: Extract text from photos, signs, and real-world images
- Handwriting Recognition: Process handwritten notes and annotations
- Mixed Content: Handle documents with multiple languages and scripts
- Layout Understanding: Maintain reading order in complex layouts
Performance Highlights
- 32 language support (up from 10)
- Robust in challenging conditions
- Accurate recognition of rare and ancient characters
- Improved long-document structure parsing
Related Capabilities
- Document Parsing - Full document structure extraction
- 2D Grounding - Locate text within images
- Video Understanding - OCR in video content