LangExtract uses a provider system to support different Large Language Model (LLM) backends. This architecture lets you use cloud models such as Gemini and OpenAI, run local models through Ollama, or create custom providers for any LLM API.

Architecture

The provider system uses three core components:
  1. Registry - Maps model ID patterns to provider classes
  2. Factory - Creates provider instances based on model IDs
  3. Providers - Implement the BaseLanguageModel interface (see the sketch below)
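To make the relationship concrete, here is a minimal sketch of the provider shape. This is illustrative only: the real BaseLanguageModel lives inside LangExtract and its exact signature may differ.

from abc import ABC, abstractmethod

# Conceptual sketch of the interface every provider implements
# (not the library's actual base class definition).
class BaseLanguageModel(ABC):
    @abstractmethod
    def infer(self, batch_prompts, **kwargs):
        """Run inference on a batch of prompts and return the outputs."""

class MyLanguageModel(BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        # Call your LLM backend once per prompt, mirroring the
        # direct-usage example later on this page.
        return [f"response to {p}" for p in batch_prompts]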

How Provider Selection Works

When you call lx.extract(model_id="gemini-2.5-flash", ...), here’s what happens:
  1. Factory receives model_id: “gemini-2.5-flash”
  2. Registry searches patterns: Each provider registers regex patterns
  3. First match wins: Returns the matching provider class
  4. Provider instantiated: With model_id and any kwargs
  5. Inference runs: Using the selected provider
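The matching in steps 2-3 can be pictured with a small standalone sketch. The regexes below are examples, not LangExtract's actual registered patterns:

import re

# Toy registry: (pattern, provider name) pairs, checked in order.
_PATTERNS = [
    (re.compile(r"^gemini"), "GeminiLanguageModel"),
    (re.compile(r"^gpt-"), "OpenAILanguageModel"),
]

def resolve(model_id: str) -> str:
    for pattern, provider in _PATTERNS:
        if pattern.match(model_id):
            return provider  # first match wins
    raise ValueError(f"No provider registered for model_id={model_id!r}")

print(resolve("gemini-2.5-flash"))  # -> GeminiLanguageModel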

Provider Types

Core Providers (Always Available)

Shipped with LangExtract, dependencies included:
  • Gemini - Google’s Gemini models (API key or Vertex AI)
  • Ollama - Local models via Ollama (no API key required)

Built-in with Optional Dependencies

Shipped with LangExtract, but requires installing optional dependencies:
  • OpenAI - OpenAI’s GPT models
    • Code included in package
    • Requires: pip install langextract[openai]

External Plugins (Third-party)

Separate packages that extend LangExtract:
  • Installed separately: pip install langextract-yourprovider
  • Auto-discovered: Uses Python entry points for automatic registration
  • Zero configuration: Import langextract and the provider is available
  • Independent updates: Update providers without touching core

Usage Examples

Automatic Provider Selection

The simplest approach is to let LangExtract choose the provider:
import langextract as lx

# Automatically selects Gemini provider
result = lx.extract(
    text="Your document text",
    model_id="gemini-2.5-flash",
    prompt_description="Extract key facts",
    examples=[...]
)

Explicit Provider Selection

When multiple providers might support the same model ID:
import langextract as lx

# Method 1: Using factory with provider parameter
config = lx.factory.ModelConfig(
    model_id="gpt-4",
    provider="OpenAILanguageModel",  # Explicit provider
    provider_kwargs={"api_key": "..."}
)
model = lx.factory.create_model(config)

# Method 2: Using provider without model_id (uses provider's default)
config = lx.factory.ModelConfig(
    provider="GeminiLanguageModel",  # Will use default gemini-2.5-flash
    provider_kwargs={"api_key": "..."}
)
model = lx.factory.create_model(config)
Provider names can be:
  • Full class name: "GeminiLanguageModel", "OpenAILanguageModel"
  • Partial match: "gemini", "openai", "ollama" (case-insensitive)
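For example, the explicit configuration above can use a partial, case-insensitive name:

# Same effect as provider="OpenAILanguageModel"
config = lx.factory.ModelConfig(model_id="gpt-4", provider="openai")
model = lx.factory.create_model(config)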

Passing Parameters to Providers

Parameters flow from lx.extract() to providers:
# Common parameters handled by lx.extract:
result = lx.extract(
    text="Your document",
    model_id="gemini-2.5-flash",
    prompt_description="Extract entities",
    examples=[...],
    num_workers=4,            # Parallel processing
    max_chunk_size=3000,      # Document chunking
    extraction_passes=3,      # Multiple passes for recall
)

# Provider-specific parameters via **kwargs:
result = lx.extract(
    text="Your document",
    model_id="gemini-2.5-flash",
    prompt_description="Extract entities",
    examples=[...],
    # These go directly to the provider:
    temperature=0.7,          # Sampling temperature
    api_key="your-key",      # Override environment variable
    max_output_tokens=1000,  # Token limit
)

Direct Provider Usage

For advanced use, you can bypass the factory and instantiate a provider class directly:
from langextract.providers.gemini import GeminiLanguageModel

model = GeminiLanguageModel(
    model_id="gemini-2.5-flash",
    api_key="your-key"
)
# infer() takes a batch of prompts
outputs = model.infer(["prompt1", "prompt2"])

Plugin Discovery

External plugins are automatically discovered via Python entry points:
1. pip install langextract-yourprovider
   └── Installs package containing:
       • Provider class with @lx.providers.registry.register decorator
       • Python entry point pointing to this class

2. import langextract
   └── Loads providers/__init__.py
       └── Plugin loading is lazy (on-demand)

3. lx.extract(model_id="yourmodel-latest")
   └── Triggers plugin discovery via entry points
       └── @lx.providers.registry.register decorator fires
           └── Provider patterns added to registry
               └── Registry matches pattern and uses your provider
Plugin loading is lazy: plugins are discovered the first time they are needed. To trigger loading manually, call lx.providers.load_plugins_once().
Set LANGEXTRACT_DISABLE_PLUGINS=1 to disable plugin loading entirely.
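Putting the pieces together, a plugin package might look like the following sketch. The decorator is named in the discovery steps above, but its exact signature and the entry-point group name are assumptions here; see the Custom Providers guide for the definitive layout.

# Hypothetical plugin package langextract-yourprovider,
# module langextract_yourprovider/provider.py:
import langextract as lx

# The pattern argument is an assumption about the decorator's signature.
@lx.providers.registry.register(r"^yourmodel")
class YourProviderLanguageModel:  # would subclass BaseLanguageModel
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        ...

# And in the plugin's pyproject.toml (entry-point group name assumed):
# [project.entry-points."langextract.providers"]
# yourprovider = "langextract_yourprovider.provider:YourProviderLanguageModel"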

Environment Variables

The factory automatically resolves API keys from the environment:
Provider | Environment Variables (in priority order)
---------|------------------------------------------
Gemini   | GEMINI_API_KEY, LANGEXTRACT_API_KEY
OpenAI   | OPENAI_API_KEY, LANGEXTRACT_API_KEY
Ollama   | OLLAMA_BASE_URL (default: http://localhost:11434)
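Because resolution is automatic, exporting the variable is typically all you need:

import os
import langextract as lx

os.environ["GEMINI_API_KEY"] = "your-key"  # or export it in your shell

result = lx.extract(
    text="Your document text",
    model_id="gemini-2.5-flash",  # factory reads GEMINI_API_KEY
    prompt_description="Extract key facts",
    examples=[...],
)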

Common Issues

Provider Not Found

ValueError: No provider registered for model_id='unknown-model'
Solution: Check available patterns with registry.list_entries()
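Both helpers are referenced in this guide; the exact return format of list_entries() may vary by version:

import langextract as lx

# Load third-party plugins first so their patterns are registered too.
lx.providers.load_plugins_once()
print(lx.providers.registry.list_entries())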

Missing Dependencies

InferenceConfigError: OpenAI provider requires openai package
Solution: Install optional dependencies:
pip install langextract[openai]

Plugin Not Loading

Solutions:
  1. Manually trigger loading: lx.providers.load_plugins_once()
  2. Check entry points are installed: pip show -f your-package
  3. Verify no typos in pyproject.toml entry point
  4. Ensure package is installed: pip list | grep your-package

Next Steps

Gemini Provider

Use Google’s Gemini models with API key or Vertex AI

OpenAI Provider

Use OpenAI’s GPT models

Ollama Provider

Run local models without API keys

Custom Providers

Create your own provider plugins
