Overview

LangExtract supports multiple LLM providers, from cloud-based models like Google Gemini and OpenAI to local models via Ollama. The library uses a lightweight plugin system for easy provider switching.

Default Provider: Google Gemini

Gemini is the default and recommended provider, offering excellent performance and structured output support.
import langextract as lx
import os

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",  # Default provider
    api_key=os.environ.get('LANGEXTRACT_API_KEY')
)
  • gemini-2.5-flash: Recommended default for most use cases
    • Excellent balance of speed, cost, and quality
    • Best for production workloads
  • gemini-2.5-pro: For complex tasks requiring deeper reasoning
    • Superior results on highly complex extractions
    • Higher cost per token
For large-scale or production use, a Tier 2 Gemini quota is recommended to increase throughput and avoid rate limits. See the rate-limit documentation for details.
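LangExtract raises throughput by splitting long documents into chunks and processing them in parallel (the `max_workers` parameter on `lx.extract` controls this). Conceptually, the mechanism looks like this toy sketch, where `fake_extract` is a stand-in for a model call, not LangExtract code:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_extract(chunk: str) -> list[str]:
    # Stand-in for one model call over one chunk of the document.
    return [w for w in chunk.split() if w.istitle()]

text = "Alice met Bob in Paris. Later Carol joined."
# Split the document into fixed-size chunks (mirrors a character buffer limit).
chunks = [text[i:i + 24] for i in range(0, len(text), 24)]

# Run the per-chunk calls concurrently; this is what a max_workers knob bounds.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [hit for batch in pool.map(fake_extract, chunks) for hit in batch]
print(results)
```

Because `pool.map` preserves chunk order, results come back in document order even though the calls overlap in time.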
Model Lifecycle: Gemini models have a lifecycle with defined retirement dates. Consult the official model version documentation to stay informed about the latest stable and legacy versions.

OpenAI Models

LangExtract supports OpenAI models through an optional dependency.

Installation

pip install langextract[openai]

Usage

import langextract as lx
import os

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gpt-4o",  # Automatically selects OpenAI provider
    api_key=os.environ.get('OPENAI_API_KEY'),
    fence_output=True,
    use_schema_constraints=False
)
OpenAI models require fence_output=True and use_schema_constraints=False because LangExtract doesn’t implement schema constraints for OpenAI yet.
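To see why fencing matters, here is a toy illustration of pulling a JSON payload out of a fenced model response. This is a simplified sketch, not LangExtract's actual parser:

```python
import json

FENCE = "`" * 3  # a triple-backtick fence marker, built to keep this snippet readable

def parse_fenced(text: str) -> dict:
    # Toy illustration: take the content between the first two fences,
    # drop the optional "json" language tag, and parse it.
    body = text.split(FENCE)[1]
    body = body.removeprefix("json")
    return json.loads(body)

raw = FENCE + 'json\n{"extractions": [{"class": "dosage", "text": "20mg"}]}\n' + FENCE
parsed = parse_fenced(raw)
print(parsed["extractions"][0]["text"])
```

With `fence_output=True`, LangExtract expects (and strips) this kind of wrapper before parsing the model's structured output.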

Available OpenAI Models

  • gpt-4o: Latest GPT-4 model with vision capabilities
  • gpt-4-turbo: Fast GPT-4 variant
  • gpt-3.5-turbo: Economical option for simpler tasks

Local Models with Ollama

Run models locally without API keys using Ollama.

Setup

1. Install Ollama

Download and install Ollama from ollama.com

2. Pull a Model

ollama pull gemma2:2b

3. Start the Ollama Server

ollama serve

Usage

import langextract as lx

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemma2:2b",  # Automatically selects Ollama provider
    model_url="http://localhost:11434",
    fence_output=False,
    use_schema_constraints=False
)
Ollama models don’t require an API key and run entirely on your local machine.
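Under the hood, the provider talks to Ollama's local REST API. As a sketch, this is roughly the request shape for Ollama's documented /api/generate endpoint; the request is only constructed here (not sent), so no running server is needed:

```python
import json
import urllib.request

# Request shape for Ollama's /api/generate endpoint; actually sending it
# requires `ollama serve` to be running on localhost:11434.
payload = {"model": "gemma2:2b", "prompt": "Extract entities from ...", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(req.get_method(), req.full_url)
```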

Vertex AI

Use Vertex AI for enterprise deployments with service account authentication.
import langextract as lx

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    language_model_params={
        "vertexai": True,
        "project": "your-project-id",
        "location": "global"  # or regional endpoint like "us-central1"
    }
)
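Vertex AI authentication typically goes through Application Default Credentials rather than an API key. Pointing `GOOGLE_APPLICATION_CREDENTIALS` at a service-account key file is the standard Google Cloud mechanism (the path below is a placeholder):

```python
import os

# Standard Google Cloud ADC setup: client libraries pick up the
# service-account key from this environment variable automatically.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```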

Vertex AI Batch Processing

Save costs on large-scale tasks:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    language_model_params={
        "vertexai": True,
        "project": "your-project-id",
        "location": "us-central1",
        "batch": {"enabled": True}
    }
)

Custom Model Providers

LangExtract supports custom LLM providers via a lightweight plugin system.

Key Features

  • Add new model support independently of the core library
  • Distribute your provider as a separate Python package
  • Keep custom dependencies isolated
  • Override or extend built-in providers via priority-based resolution
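The last bullet hinges on pattern matching plus priority. As a toy illustration (not the actual registry code), resolution can be sketched as picking the highest-priority provider whose pattern matches the model id:

```python
from fnmatch import fnmatch

# Toy registry: (pattern, priority, provider name); higher priority wins.
REGISTRY = [
    ("gemini-*", 10, "GeminiProvider"),
    ("gpt-*", 10, "OpenAIProvider"),
    ("my-model-*", 100, "MyCustomProvider"),
]

def resolve(model_id: str) -> str:
    # Collect every matching provider, then take the highest priority.
    candidates = [(prio, name) for pat, prio, name in REGISTRY if fnmatch(model_id, pat)]
    if not candidates:
        raise ValueError(f"No provider registered for {model_id!r}")
    return max(candidates)[1]

print(resolve("gemini-2.5-flash"))  # GeminiProvider
print(resolve("my-model-v1"))       # MyCustomProvider
```

Registering a custom provider with a higher priority than a built-in pattern is what lets a plugin override the default handling for a model id.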

Creating a Custom Provider

from langextract.providers import registry
from langextract.core import base_model

@registry.register(
    provider_name="my-provider",
    model_patterns=["my-model-*"],
    priority=100
)
class MyCustomProvider(base_model.BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        super().__init__(model_id=model_id, **kwargs)
        # Initialize your provider
    
    def generate(self, prompt: str) -> str:
        # Implement generation logic
        pass
See the Provider System Documentation for detailed instructions.

Advanced Configuration

For more control, use the config parameter:
from langextract import factory
import langextract as lx

config = factory.ModelConfig(
    model_id="gemini-2.5-flash",
    provider_kwargs={
        "api_key": "your-api-key",
        "temperature": 0.0,  # Deterministic output
        "max_workers": 20
    }
)

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    config=config
)

Comparing Providers

Switching providers requires only a different model_id (and the matching API key); the rest of the call stays the same:

import langextract as lx
import os

result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    api_key=os.environ.get('LANGEXTRACT_API_KEY')
)

Model Parameters

Customize model behavior with these parameters:

temperature

Controls randomness in generation:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    temperature=0.0  # Default: None (uses model default)
)
  • 0.0: Deterministic output
  • Higher values: More variation

use_schema_constraints

Enable structured outputs for supported models:
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    use_schema_constraints=True  # Default: True
)
Schema constraints are supported by Gemini models but not yet implemented for OpenAI.
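Conceptually, schema constraints restrict the model's decoder to a JSON shape derived from your extraction classes. A hand-written sketch of such a response schema (an illustration only; the exact schema LangExtract generates from your examples may differ) might look like:

```python
# Hand-written sketch of a structured-output schema for two extraction
# classes; the class names here are hypothetical.
classes = ["medication", "dosage"]
schema = {
    "type": "object",
    "properties": {
        "extractions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "extraction_class": {"type": "string", "enum": classes},
                    "extraction_text": {"type": "string"},
                },
                "required": ["extraction_class", "extraction_text"],
            },
        }
    },
    "required": ["extractions"],
}
print(sorted(schema["properties"]["extractions"]["items"]["required"]))
```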
