Save ~50% on costs for large-scale workloads using the Vertex AI Batch API with automatic routing, caching, and fault tolerance.
The Vertex AI Batch API offers significant cost savings (~50%) for large, non-time-critical workloads. LangExtract integrates it seamlessly, with automatic routing, caching, and fault tolerance.
This example demonstrates how to process a large text (the first ~20 pages of Romeo and Juliet) using the Batch API. We use a small chunk size (max_char_buffer=500) to generate enough chunks to trigger batch processing.
1. Configure logging and download text

```python
import logging
import textwrap

import requests

import langextract as lx

# Configure logging to see progress (both in console and file)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler("batch_process.log"),
        logging.StreamHandler(),
    ],
)

# 1. Download Text (Shakespeare's Romeo and Juliet)
url = "https://www.gutenberg.org/files/1513/1513-0.txt"
print(f"Downloading {url}...")
text = requests.get(url).text

# Process first ~20 pages (approx. 60k characters).
text_subset = text[:60000]
print(f"Processing first {len(text_subset)} characters...")
```
2. Define prompt and examples

```python
# 2. Define Prompt & Examples
prompt = textwrap.dedent("""\
    Extract characters and emotions from the text.
    Use exact text from the input for extraction_text.""")

examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO"),
            lx.data.Extraction(extraction_class="emotion", extraction_text="But soft!"),
        ],
    )
]
```
3. Configure batch settings

```python
# 3. Configure Batch Settings
batch_config = {
    "enabled": True,
    "threshold": 10,
    "poll_interval": 30,
    "timeout": 3600,
    # Set to True to cache results in GCS. Add timestamp to prompt to force re-run.
    "enable_caching": True,
    # Retention policy for GCS bucket (days). None for permanent.
    "retention_days": 30,
}
```
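The `poll_interval` and `timeout` settings govern how long the library waits on a batch job before giving up. A minimal sketch of such a polling loop (plain Python; the real implementation lives inside LangExtract, and the status names here are illustrative):

```python
import time


def poll_until_done(check_status, poll_interval=30, timeout=3600):
    """Poll a job status callable until it reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = check_status()
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_interval)
    raise TimeoutError(f"Batch job still running after {timeout}s")


# Simulated job that finishes on the third status check.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(poll_until_done(lambda: next(states), poll_interval=0, timeout=5))
```

With the values above, the library checks the job roughly every 30 seconds and raises an error if it has not finished within an hour.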
4. Run extraction with batch API

```python
# 4. Run Extraction
# langextract will automatically chunk the text and submit a batch job.
results = lx.extract(
    text_or_documents=text_subset,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    max_char_buffer=500,
    batch_length=1000,
    language_model_params={
        "vertexai": True,
        "project": "your-gcp-project",  # TODO: Replace with your Project ID.
        "location": "us-central1",
        "batch": batch_config,
    },
)
```
Note on `batch_length`: this parameter controls how many chunks are submitted in a single batch job. For optimal performance with the Batch API, set it to a high value (e.g., 1000) so all chunks are processed in a single job rather than in multiple sequential jobs.
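A back-of-the-envelope calculation shows how the numbers above interact (the library's chunker decides actual boundaries, so the chunk count is approximate):

```python
import math

# Sizes from the example above.
text_chars = 60_000      # length of text_subset
max_char_buffer = 500    # chunk size passed to lx.extract
batch_length = 1000      # chunks per batch job
threshold = 10           # batch_config["threshold"]

num_chunks = math.ceil(text_chars / max_char_buffer)  # ~120 chunks
num_jobs = math.ceil(num_chunks / batch_length)       # 1 job
uses_batch = num_chunks >= threshold                  # batch routing kicks in

print(num_chunks, num_jobs, uses_batch)
```

With roughly 120 chunks, the workload clears the batch threshold of 10, and a `batch_length` of 1000 keeps everything in one job.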
The library automatically creates and manages a GCS bucket for you, named `langextract-{project}-{location}-batch`. Inside this bucket, LangExtract stores batch inputs and outputs along with cached results.
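Given the project and location from the example above, the bucket name resolves as follows (a sketch of the naming template only; the library handles bucket creation itself):

```python
project = "your-gcp-project"   # placeholder, as in the example above
location = "us-central1"

bucket = f"langextract-{project}-{location}-batch"
print(f"gs://{bucket}")
```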
LangExtract’s batch processing is designed to minimize costs:
- **Cost Efficiency**: Vertex AI Batch predictions are typically ~50% cheaper than online predictions, making them ideal for large-scale, non-time-critical workloads.
- **Smart Caching**: Results are cached in your GCS bucket (`cache/` directory). Re-running identical prompts fetches results directly from storage, bypassing model inference, so you avoid paying for redundant model calls on previously processed data.
- **Lifecycle Management**: Use `retention_days` (e.g., 30) to automatically clean up old data and manage storage usage. Set it to `None` for permanent storage of results.
LangExtract handles all GCS operations automatically using a dedicated bucket (`gs://langextract-{project}-{location}-batch`). Note that input/output files are retained for debugging.
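As the batch-config comment above notes, appending a timestamp to the prompt changes its text, so cached results no longer match and the job re-runs. A minimal sketch of that cache-busting trick:

```python
from datetime import datetime, timezone

base_prompt = "Extract characters and emotions from the text."

# Any change to the prompt text invalidates the cached results in GCS,
# forcing a fresh batch run instead of an instant cache hit.
force_rerun = True
prompt = base_prompt
if force_rerun:
    prompt += f"\n# run: {datetime.now(timezone.utc).isoformat()}"

print(prompt)
```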