LangExtract can handle a wide variety of extraction tasks, from processing classic literature to extracting medical information and working with non-English languages. These examples demonstrate key features and best practices for different scenarios.

Available Examples

Romeo and Juliet Extraction

Process entire documents from URLs with parallel processing. Extract characters, emotions, and relationships from the complete text of Shakespeare’s Romeo and Juliet.

Medication Extraction

Extract structured medical information from clinical text. Demonstrates both basic named entity recognition (NER) and relationship extraction for healthcare applications.

Batch Processing

Save ~50% on costs for large-scale workloads using Vertex AI Batch API. Includes automatic routing, caching, and fault tolerance.

Japanese Extraction

Extract structured information from Japanese text using UnicodeTokenizer for correct character-based segmentation and alignment.

Key Concepts Demonstrated

Long Document Processing

The Romeo and Juliet example shows how to handle large texts (147,843 characters) with:
  • Sequential extraction passes for improved recall
  • Parallel processing for speed optimization
  • Smart chunking strategies for better accuracy
  • Interactive visualization of thousands of entities
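
The chunking, parallelism, and repeated passes listed above can be sketched in plain Python. This is a minimal illustration and not LangExtract's implementation: `chunk_text`, `find_capitalized`, and `extract_parallel` are hypothetical names, and `find_capitalized` is a toy stand-in for a model call that simply reports capitalized words. The point it shows is that each chunk keeps its starting offset, so hits found inside a chunk can be mapped back to positions in the full document.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_text(text, max_chars):
    """Split text into (start_offset, chunk) pairs of at most max_chars characters.

    Breaks after the last space inside the window when possible, so words
    are not cut in half; falls back to a hard cut otherwise.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            split = text.rfind(" ", start, end)
            if split > start:
                end = split + 1
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def find_capitalized(chunk):
    """Hypothetical stand-in for a model call: return (offset_in_chunk, word)."""
    results = []
    pos = 0
    for word in chunk.split(" "):
        if word[:1].isupper():
            results.append((pos, word))
        pos += len(word) + 1  # +1 for the space consumed by split(" ")
    return results


def extract_parallel(text, max_chars=40, max_workers=4, passes=2):
    """Run the stand-in extractor over all chunks in parallel, for several passes.

    With a real, non-deterministic model, later passes may surface entities
    that earlier passes missed; the set merges and de-duplicates the results.
    """
    chunks = chunk_text(text, max_chars)
    found = set()
    for _ in range(passes):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            per_chunk = list(pool.map(lambda c: find_capitalized(c[1]), chunks))
        for (start, _), hits in zip(chunks, per_chunk):
            for offset, word in hits:
                # Convert the chunk-relative offset to a document offset.
                found.add((start + offset, word))
    return sorted(found)
```

Smaller chunks give each call less context to search, which is one reason chunk size tends to trade accuracy against the number of calls; the values used here are arbitrary.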

Domain-Specific Extraction

The Medication Extraction examples demonstrate:
  • Named Entity Recognition (NER) for medical entities
  • Relationship Extraction (RE) using attribute-based grouping
  • Position tracking for entity verification
  • Structured output for healthcare applications
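
As a rough sketch of attribute-based grouping and position checks, the snippet below links entities that share an attribute value and drops any entity whose recorded span does not match the source text. The `Extraction` dataclass, the helper names, and the `medication_group` attribute key are assumptions made for illustration; they are not taken from the LangExtract API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Hypothetical record: an entity, its character span, and its attributes."""
    extraction_class: str
    extraction_text: str
    start: int
    end: int
    attributes: dict = field(default_factory=dict)


def locate(text, extraction_class, span, group):
    """Build an Extraction whose positions come from searching the source text."""
    start = text.index(span)
    return Extraction(extraction_class, span, start, start + len(span),
                      {"medication_group": group})


def verify_positions(text, extractions):
    """Keep only extractions whose span reproduces their text exactly."""
    return [e for e in extractions if text[e.start:e.end] == e.extraction_text]


def group_by_attribute(extractions, key="medication_group"):
    """Relate entities by a shared attribute value instead of explicit links."""
    groups = defaultdict(list)
    for e in extractions:
        value = e.attributes.get(key)
        if value is not None:
            groups[value].append(e)
    return dict(groups)


note = "Patient takes Lisinopril 10mg daily and Metformin 500mg twice daily."
entities = [
    locate(note, "medication", "Lisinopril", "Lisinopril"),
    locate(note, "dosage", "10mg", "Lisinopril"),
    locate(note, "medication", "Metformin", "Metformin"),
    locate(note, "dosage", "500mg", "Metformin"),
    # Deliberately wrong span, to show that verification filters it out.
    Extraction("dosage", "20mg", 0, 4, {"medication_group": "Lisinopril"}),
]
grouped = group_by_attribute(verify_positions(note, entities))
```

Grouping by a shared attribute keeps the output flat: each entity stays an independent record, and the relationship is recovered by a simple lookup.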

Cost Optimization

The Batch Processing guide covers:
  • Vertex AI Batch API integration for ~50% cost savings
  • Automatic routing between real-time and batch processing
  • GCS-based caching for instant result retrieval
  • Lifecycle management for storage optimization
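
The routing and caching ideas above might look roughly like the following. Everything here is hypothetical: the threshold of 50 prompts, the cache path layout, and the function names are invented for illustration and do not describe how LangExtract or Vertex AI decide anything. The only figure borrowed from the guide is the approximate 50% batch discount.

```python
import hashlib

# Assumption: batch requests cost about half as much, per the "~50%" figure.
BATCH_DISCOUNT = 0.5


def choose_route(num_prompts, batch_threshold=50):
    """Send small jobs to the real-time path and large jobs to the batch path."""
    return "batch" if num_prompts >= batch_threshold else "realtime"


def cache_key(prompt, model_id):
    """Derive a stable object path from the model and prompt.

    Identical requests map to the same key, so a stored result can be
    looked up instead of being recomputed.
    """
    digest = hashlib.sha256(f"{model_id}\n{prompt}".encode("utf-8")).hexdigest()
    return f"cache/{model_id}/{digest}.json"


def estimated_cost(num_prompts, cost_per_prompt, batch_threshold=50):
    """Estimate total cost, applying the discount only on the batch path."""
    route = choose_route(num_prompts, batch_threshold)
    multiplier = BATCH_DISCOUNT if route == "batch" else 1.0
    return num_prompts * cost_per_prompt * multiplier
```

A threshold like this reflects the usual trade-off: batch jobs are cheaper but slower to return, so they pay off mainly when latency matters less than volume.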

Multilingual Support

The Japanese Extraction example illustrates:
  • Using UnicodeTokenizer for non-spaced languages
  • Correct grapheme segmentation and alignment
  • Few-shot examples for multilingual tasks
  • Character position tracking in Unicode text
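
To see why whitespace tokenization falls short for Japanese, the sketch below compares it with a simple character-level segmentation that records positions. It uses only Python's standard `unicodedata` module and approximates grapheme clusters by attaching combining marks to the preceding character; this is a simplification of full Unicode grapheme segmentation and is not a description of how `UnicodeTokenizer` works internally. The function names are hypothetical.

```python
import unicodedata


def whitespace_tokens(text):
    """Naive tokenization: split on whitespace only."""
    return text.split()


def grapheme_spans(text):
    """Return (start, end, cluster) triples for each non-space character.

    A combining mark that directly follows a character is merged into that
    character's cluster, so the pair is treated as one unit.
    """
    spans = []
    for i, ch in enumerate(text):
        if ch.isspace():
            continue
        if spans and spans[-1][1] == i and unicodedata.combining(ch):
            start, _, cluster = spans[-1]
            spans[-1] = (start, i + 1, cluster + ch)
        else:
            spans.append((i, i + 1, ch))
    return spans


sentence = "東京は日本の首都です"

# The whole sentence is a single "token" when splitting on whitespace.
tokens = whitespace_tokens(sentence)

# Character-level spans allow an extracted string to be aligned to the source.
spans = grapheme_spans(sentence)
start = sentence.index("日本")
aligned = sentence[start:start + len("日本")]

# A decomposed kana (base character plus combining voiced sound mark)
# occupies two code points but forms one cluster.
decomposed = "か\u3099"
clusters = grapheme_spans(decomposed)
```

Python string indices count code points, so the spans here are code-point offsets; a pipeline that works in bytes or UTF-16 units would need to convert them.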

Getting Started

Each example includes:
  • Complete, runnable code
  • Sample output and visualizations
  • Best practices and optimization tips
  • Detailed explanations of key parameters

Select an example from the cards above to dive into specific use cases and learn how to apply LangExtract to your own projects.
