Overview

LangExtract is optimized for long documents, overcoming the “needle-in-a-haystack” challenge through text chunking, parallel processing, and multiple extraction passes for higher recall.

Quick Example

Process an entire document directly from a URL, with parameters tuned for higher recall on long texts:
import langextract as lx
import textwrap

# Define your prompt and examples
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
            lx.data.Extraction(
                extraction_class="relationship",
                extraction_text="Juliet is the sun",
                attributes={"type": "metaphor"}
            ),
        ]
    )
]

# Process Romeo & Juliet directly from Project Gutenberg
result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,    # Improves recall through multiple passes
    max_workers=20,         # Parallel processing for speed
    max_char_buffer=1000    # Smaller contexts for better accuracy
)
This approach can extract hundreds of entities from full novels (147,843+ characters) while maintaining high accuracy.

Key Parameters for Scaling

extraction_passes

Number of sequential extraction attempts to improve recall and find additional entities.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3  # Default: 1
)
How it works:
  • extraction_passes=1: Standard single extraction pass
  • extraction_passes > 1: Multiple independent extractions are performed and merged
  • Non-overlapping results are combined (first extraction wins for overlaps)
  • Improves recall by catching entities missed in earlier passes
Cost Impact: Each additional pass reprocesses tokens, potentially increasing API costs. For example, extraction_passes=3 reprocesses tokens 3x. Most APIs charge by token volume, so monitor usage with small test runs to estimate costs.

max_workers

Maximum parallel workers for concurrent processing.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    max_workers=20  # Default: 10
)
How it works:
  • Enables concurrent API calls for faster processing
  • Effective parallelization is limited by min(batch_length, max_workers)
  • Supported by Gemini models
  • Does NOT increase token costs—only improves processing speed
For large-scale or production use, a Tier 2 Gemini quota is suggested to increase throughput and avoid rate limits. See the rate-limit documentation for details.

max_char_buffer

Maximum number of characters for each inference chunk.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    max_char_buffer=1000  # Default: 1000
)
How it works:
  • Controls the size of text chunks sent to the model
  • Smaller values (e.g., 1000) provide more focused context and better accuracy
  • Larger values (e.g., 5000) reduce the number of API calls but may miss entities
  • Trade-off between accuracy and API costs
Cost Consideration: Smaller max_char_buffer values increase the number of API calls, as the document is split into more chunks. Balance accuracy needs with API costs.
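The call-count trade-off is easy to estimate up front (a rough upper bound under the assumption of one call per chunk per pass):

```python
import math

def estimate_api_calls(doc_chars: int, max_char_buffer: int,
                       extraction_passes: int = 1) -> int:
    # One inference call per chunk, repeated for each extraction pass.
    chunks = math.ceil(doc_chars / max_char_buffer)
    return chunks * extraction_passes

# Full Romeo & Juliet is roughly 148,000 characters
calls_small = estimate_api_calls(148_000, max_char_buffer=1000, extraction_passes=3)
calls_large = estimate_api_calls(148_000, max_char_buffer=5000, extraction_passes=3)
```

Shrinking the buffer from 5000 to 1000 characters multiplies the number of calls by about five, so tune both parameters together.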

batch_length

Number of text chunks processed per batch.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    batch_length=10,  # Default: 10
    max_workers=20
)
How it works:
  • Chunks are processed in batches; each batch is distributed across the worker pool
  • Effective parallelism is min(batch_length, max_workers)
  • Set batch_length >= max_workers to fully utilize the workers
If batch_length < max_workers, you’ll see a warning and only batch_length workers will be utilized.
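The interaction reduces to simple arithmetic, mirroring the rule above:

```python
def effective_workers(batch_length: int, max_workers: int) -> int:
    # Parallelism is capped by the smaller of the two settings.
    return min(batch_length, max_workers)

# With the default batch_length=10, raising max_workers beyond 10 has no effect
capped = effective_workers(10, 20)
# Matching the two settings uses the full worker pool
full = effective_workers(20, 20)
```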

Context Window for Cross-Chunk Entities

The context_window_chars parameter helps with coreference resolution across chunk boundaries:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    context_window_chars=200  # Default: None (disabled)
)
How it works:
  • Includes characters from the previous chunk as context for the current chunk
  • Helps resolve references like “She” to a person mentioned in the previous chunk
  • Disabled by default (None)
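The mechanism can be sketched as prepending the tail of the previous chunk (an illustration of the idea; the prompt assembly inside LangExtract may differ):

```python
def with_context(chunks: list[str], context_window_chars: int = 200) -> list[str]:
    # Prepend the last context_window_chars of the previous chunk so
    # pronouns that cross a chunk boundary can be resolved.
    augmented = [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        augmented.append(prev[-context_window_chars:] + cur)
    return augmented

chunks = ["Juliet stood at the window.", " She spoke softly."]
augmented = with_context(chunks, context_window_chars=200)
# The second chunk now carries the sentence that "She" refers back to
```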

Combining Parameters for Optimal Performance

result = lx.extract(
    text_or_documents="https://example.com/long-document.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,      # Multiple passes for better recall
    max_workers=20,           # Fast parallel processing
    max_char_buffer=800,      # Smaller chunks for accuracy
    batch_length=20,          # Match max_workers
    context_window_chars=200  # Cross-chunk entity resolution
)

Vertex AI Batch Processing

For large-scale tasks, enable the Vertex AI Batch API to reduce costs:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    language_model_params={
        "vertexai": True,
        "batch": {"enabled": True}
    }
)

Progress Tracking

LangExtract shows a progress bar by default. Disable it if needed:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    show_progress=False  # Default: True
)

Best Practices

  1. Start small: Test with a subset of your document to estimate costs
  2. Monitor API usage: Track token consumption and API calls
  3. Balance accuracy vs. cost: Adjust extraction_passes and max_char_buffer based on your needs
  4. Use appropriate quotas: Upgrade to Tier 2 for production workloads
  5. Leverage parallelization: max_workers improves speed without increasing costs
