Overview

LangExtract is optimized for long documents, overcoming the “needle-in-a-haystack” challenge through text chunking, parallel processing, and multiple extraction passes for higher recall.

Quick Example

Process an entire document directly from a URL, with parameters tuned for higher recall on long texts:
import langextract as lx
import textwrap

# Define your prompt and examples
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
            lx.data.Extraction(
                extraction_class="relationship",
                extraction_text="Juliet is the sun",
                attributes={"type": "metaphor"}
            ),
        ]
    )
]

# Process Romeo & Juliet directly from Project Gutenberg
result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,    # Improves recall through multiple passes
    max_workers=20,         # Parallel processing for speed
    max_char_buffer=1000    # Smaller contexts for better accuracy
)
This approach can extract hundreds of entities from full novels (147,843+ characters) while maintaining high accuracy.

Key Parameters for Scaling

extraction_passes

Number of sequential extraction attempts to improve recall and find additional entities.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3  # Default: 1
)
How it works:
  • extraction_passes=1: Standard single extraction pass
  • extraction_passes > 1: Multiple independent extractions are performed and merged
  • Non-overlapping results are combined (first extraction wins for overlaps)
  • Improves recall by catching entities missed in earlier passes
Cost Impact: Each additional pass reprocesses tokens, potentially increasing API costs. For example, extraction_passes=3 reprocesses tokens 3x. Most APIs charge by token volume, so monitor usage with small test runs to estimate costs.

max_workers

Maximum parallel workers for concurrent processing.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    max_workers=20  # Default: 10
)
How it works:
  • Enables concurrent API calls for faster processing
  • Effective parallelization is limited by min(batch_length, max_workers)
  • Supported by Gemini models
  • Does NOT increase token costs—only improves processing speed
For large-scale or production use, a Tier 2 Gemini quota is suggested to increase throughput and avoid rate limits. See the rate-limit documentation for details.

max_char_buffer

Maximum number of characters for each inference chunk.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    max_char_buffer=1000  # Default: 1000
)
How it works:
  • Controls the size of text chunks sent to the model
  • Smaller values (e.g., 1000) provide more focused context and better accuracy
  • Larger values (e.g., 5000) reduce the number of API calls but may miss entities
  • Trade-off between accuracy and API costs
Cost Consideration: Smaller max_char_buffer values increase the number of API calls, as the document is split into more chunks. Balance accuracy needs with API costs.
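The call-count trade-off is easy to estimate up front (a rough upper bound under the assumption of one call per chunk per pass):

```python
import math

def estimate_api_calls(doc_chars: int, max_char_buffer: int,
                       extraction_passes: int = 1) -> int:
    # One inference call per chunk, repeated for each extraction pass.
    chunks = math.ceil(doc_chars / max_char_buffer)
    return chunks * extraction_passes

# Full Romeo & Juliet is roughly 148,000 characters
calls_small = estimate_api_calls(148_000, max_char_buffer=1000, extraction_passes=3)
calls_large = estimate_api_calls(148_000, max_char_buffer=5000, extraction_passes=3)
```

Shrinking the buffer from 5000 to 1000 characters multiplies the number of calls by about five, so tune both parameters together.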

batch_length

Number of text chunks processed per batch.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    batch_length=10,  # Default: 10
    max_workers=20
)
How it works:
  • Chunks are processed in batches; each batch is distributed across the worker pool
  • Effective parallelism is min(batch_length, max_workers)
  • Set batch_length >= max_workers to fully utilize the workers
If batch_length < max_workers, you’ll see a warning and only batch_length workers will be utilized.
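The interaction reduces to simple arithmetic, mirroring the rule above:

```python
def effective_workers(batch_length: int, max_workers: int) -> int:
    # Parallelism is capped by the smaller of the two settings.
    return min(batch_length, max_workers)

# With the default batch_length=10, raising max_workers beyond 10 has no effect
capped = effective_workers(10, 20)
# Matching the two settings uses the full worker pool
full = effective_workers(20, 20)
```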

Context Window for Cross-Chunk Entities

The context_window_chars parameter helps with coreference resolution across chunk boundaries:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    context_window_chars=200  # Default: None (disabled)
)
How it works:
  • Includes characters from the previous chunk as context for the current chunk
  • Helps resolve references like “She” to a person mentioned in the previous chunk
  • Disabled by default (None)
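The mechanism can be sketched as prepending the tail of the previous chunk (an illustration of the idea; the prompt assembly inside LangExtract may differ):

```python
def with_context(chunks: list[str], context_window_chars: int = 200) -> list[str]:
    # Prepend the last context_window_chars of the previous chunk so
    # pronouns that cross a chunk boundary can be resolved.
    augmented = [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        augmented.append(prev[-context_window_chars:] + cur)
    return augmented

chunks = ["Juliet stood at the window.", " She spoke softly."]
augmented = with_context(chunks, context_window_chars=200)
# The second chunk now carries the sentence that "She" refers back to
```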

Combining Parameters for Optimal Performance

result = lx.extract(
    text_or_documents="https://example.com/long-document.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,      # Multiple passes for better recall
    max_workers=20,           # Fast parallel processing
    max_char_buffer=800,      # Smaller chunks for accuracy
    batch_length=20,          # Match max_workers
    context_window_chars=200  # Cross-chunk entity resolution
)

Vertex AI Batch Processing

For large-scale tasks, enable the Vertex AI Batch API to reduce costs:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    language_model_params={
        "vertexai": True,
        "batch": {"enabled": True}
    }
)

Progress Tracking

LangExtract shows a progress bar by default. Disable it if needed:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    show_progress=False  # Default: True
)

Best Practices

  1. Start small: Test with a subset of your document to estimate costs
  2. Monitor API usage: Track token consumption and API calls
  3. Balance accuracy vs. cost: Adjust extraction_passes and max_char_buffer based on your needs
  4. Use appropriate quotas: Upgrade to Tier 2 for production workloads
  5. Leverage parallelization: max_workers improves speed without increasing costs
