Overview
LangExtract is optimized for long documents, overcoming the “needle-in-a-haystack” challenge through text chunking, parallel processing, and multiple extraction passes for higher recall.

Quick Example
Process entire documents directly from URLs with settings tuned for higher recall:

Key Parameters for Scaling
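A call using these parameters might look like the following sketch, based on LangExtract's documented API (the URL, prompt, and example data are illustrative; a configured Gemini API key and network access are assumed):

```python
import langextract as lx

# Illustrative prompt and guiding example; adapt both to your extraction task.
prompt = "Extract characters and their emotions, using exact text for extractions."
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO"),
        ],
    )
]

result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",  # full document via URL
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,    # multiple passes for higher recall
    max_workers=20,         # parallel chunk processing
    max_char_buffer=1000,   # smaller chunks for more focused context
)
```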
extraction_passes
Number of sequential extraction attempts used to improve recall and find additional entities.

- `extraction_passes=1`: Standard single extraction pass
- `extraction_passes > 1`: Multiple independent extractions are performed and merged
- Non-overlapping results are combined (the first extraction wins for overlaps)
- Improves recall by catching entities missed in earlier passes
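The merge behavior can be illustrated with a small sketch, with extractions as `(start, end, text)` character spans (this mirrors the first-wins rule described above, not LangExtract's internal code):

```python
def merge_passes(passes):
    """Merge extraction passes: earlier passes win on overlapping character spans."""
    accepted = []
    for extractions in passes:  # passes in order; the first pass has priority
        for start, end, text in extractions:
            overlaps = any(start < a_end and a_start < end
                           for a_start, a_end, _ in accepted)
            if not overlaps:
                accepted.append((start, end, text))
    return sorted(accepted)

pass1 = [(0, 5, "Alice"), (20, 25, "Paris")]
pass2 = [(0, 7, "Alice B"), (40, 45, "Bob")]  # (0, 7) overlaps pass 1's (0, 5)
merged = merge_passes([pass1, pass2])
# The overlapping span from pass 2 is dropped; its new entity "Bob" is kept.
```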
max_workers
Maximum number of parallel workers for concurrent processing.

- Enables concurrent API calls for faster processing
- Effective parallelization is limited by `min(batch_length, max_workers)`
- Supported by Gemini models
- Does NOT increase token costs; it only improves processing speed
For large-scale or production use, a Tier 2 Gemini quota is recommended to increase throughput and avoid rate limits; see the rate-limit documentation for details.
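A toy sketch shows why parallelism is cost-neutral: the number of model calls is fixed by the number of chunks, and `max_workers` only controls how many are in flight at once (`process_chunk` is a stand-in for one model call):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Placeholder for one model call on one chunk of text.
    return chunk.upper()

chunks = ["chunk one", "chunk two", "chunk three", "chunk four"]
max_workers = 2  # at most two chunks processed concurrently

with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = list(pool.map(process_chunk, chunks))
# Four calls are made regardless of max_workers; token cost is unchanged.
```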
max_char_buffer
Maximum number of characters in each inference chunk.

- Controls the size of text chunks sent to the model
- Smaller values (e.g., 1000) provide more focused context and better accuracy
- Larger values (e.g., 5000) reduce the number of API calls but may miss entities
- Trade-off between accuracy and API costs
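The cost side of the trade-off is easy to estimate with a naive fixed-size split (an illustration only; LangExtract's actual chunking logic may differ, e.g., by respecting sentence boundaries):

```python
import math

def chunk_count(doc_len, max_char_buffer):
    # Naive fixed-size split: each chunk triggers one API call.
    return math.ceil(doc_len / max_char_buffer)

doc_len = 100_000  # a ~100k-character document
calls_small = chunk_count(doc_len, 1000)  # 100 chunks -> 100 API calls
calls_large = chunk_count(doc_len, 5000)  # 20 chunks  -> 20 API calls
```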
Cost Consideration: Smaller `max_char_buffer` values increase the number of API calls, as the document is split into more chunks. Balance accuracy needs against API costs.

batch_length
Number of text chunks processed per batch.

- Higher values enable greater parallelization when `batch_length >= max_workers`
- Only `batch_length` workers will be used if `batch_length < max_workers`
- Set `batch_length >= max_workers` for optimal parallelization
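These rules can be sketched as follows (an illustration of the arithmetic, not LangExtract's scheduler):

```python
def effective_workers(batch_length, max_workers):
    # Within one batch there are only batch_length chunks to hand out,
    # so at most min(batch_length, max_workers) workers are ever busy.
    return min(batch_length, max_workers)

def batches(chunks, batch_length):
    # Chunks are dispatched to the worker pool batch_length at a time.
    return [chunks[i:i + batch_length] for i in range(0, len(chunks), batch_length)]

chunks = [f"chunk-{i}" for i in range(10)]
under = effective_workers(batch_length=4, max_workers=8)   # 4: half the workers sit idle
full = effective_workers(batch_length=16, max_workers=8)   # 8: pool fully utilized
groups = batches(chunks, batch_length=4)                   # groups of 4, 4, and 2 chunks
```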
Context Window for Cross-Chunk Entities
The `context_window_chars` parameter helps with coreference resolution across chunk boundaries:

- Includes characters from the previous chunk as context for the current chunk
- Helps resolve references like “She” to a person mentioned in the previous chunk
- Disabled by default (`None`)
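A minimal sketch of the idea, assuming a simple character-based carry-over (not LangExtract's implementation):

```python
def with_context(chunks, context_window_chars=None):
    """Prepend the tail of the previous chunk to each model input."""
    if context_window_chars is None:  # default: feature disabled
        return list(chunks)
    inputs = [chunks[0]]
    for prev, cur in zip(chunks, chunks[1:]):
        inputs.append(prev[-context_window_chars:] + cur)
    return inputs

chunks = ["Marie Curie won two Nobel Prizes. ", "She was born in Warsaw."]
inputs = with_context(chunks, context_window_chars=20)
# The second model input now carries enough of the first chunk to resolve "She".
```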
Combining Parameters for Optimal Performance
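There is no single best setting, but a throughput-oriented configuration combining the parameters above might look like this sketch (values are illustrative, and passing `batch_length` and `context_window_chars` directly to `lx.extract` is an assumption based on the parameter descriptions above):

```python
import langextract as lx

# Minimal guiding example; adapt to your task.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft!",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO"),
        ],
    )
]

result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description="Extract characters and their emotions.",
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,        # recover entities missed in a single pass
    max_workers=20,             # concurrent API calls
    batch_length=20,            # batch_length >= max_workers for full parallelism
    max_char_buffer=1000,       # focused chunks for accuracy
    context_window_chars=500,   # carry context across chunk boundaries
)
```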
Vertex AI Batch Processing
For large-scale tasks, enable the Vertex AI Batch API to save costs.

Progress Tracking
LangExtract shows a progress bar by default; disable it if needed.

Best Practices
- Start small: Test with a subset of your document to estimate costs
- Monitor API usage: Track token consumption and API calls
- Balance accuracy vs. cost: Adjust `extraction_passes` and `max_char_buffer` based on your needs
- Use appropriate quotas: Upgrade to Tier 2 for production workloads
- Leverage parallelization: `max_workers` improves speed without increasing costs
Next Steps
- Learn about visualization options for large result sets
- Configure different model providers for your needs
- Set up API keys for production use