LangExtract uses a provider system to support different Large Language Model (LLM) backends. This architecture lets you use cloud models such as Gemini and OpenAI, run local models through Ollama, or create custom providers for any LLM API.

Architecture

The provider system uses three core components:
  1. Registry - Maps model ID patterns to provider classes
  2. Factory - Creates provider instances based on model IDs
  3. Providers - Implement the BaseLanguageModel interface (see the sketch below)
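To make the relationship concrete, here is a minimal sketch of the provider shape. This is illustrative only: the real BaseLanguageModel lives inside LangExtract and its exact signature may differ.

from abc import ABC, abstractmethod

# Conceptual sketch of the interface every provider implements
# (not the library's actual base class definition).
class BaseLanguageModel(ABC):
    @abstractmethod
    def infer(self, batch_prompts, **kwargs):
        """Run inference on a batch of prompts and return the outputs."""

class MyLanguageModel(BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        # Call your LLM backend once per prompt, mirroring the
        # direct-usage example later on this page.
        return [f"response to {p}" for p in batch_prompts]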

How Provider Selection Works

When you call lx.extract(model_id="gemini-2.5-flash", ...), here’s what happens:
  1. Factory receives model_id: “gemini-2.5-flash”
  2. Registry searches patterns: Each provider registers regex patterns
  3. First match wins: Returns the matching provider class
  4. Provider instantiated: With model_id and any kwargs
  5. Inference runs: Using the selected provider
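The matching in steps 2-3 can be pictured with a small standalone sketch. The regexes below are examples, not LangExtract's actual registered patterns:

import re

# Toy registry: (pattern, provider name) pairs, checked in order.
_PATTERNS = [
    (re.compile(r"^gemini"), "GeminiLanguageModel"),
    (re.compile(r"^gpt-"), "OpenAILanguageModel"),
]

def resolve(model_id: str) -> str:
    for pattern, provider in _PATTERNS:
        if pattern.match(model_id):
            return provider  # first match wins
    raise ValueError(f"No provider registered for model_id={model_id!r}")

print(resolve("gemini-2.5-flash"))  # -> GeminiLanguageModel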

Provider Types

Core Providers (Always Available)

Shipped with LangExtract, dependencies included:
  • Gemini - Google’s Gemini models (API key or Vertex AI)
  • Ollama - Local models via Ollama (no API key required)

Built-in with Optional Dependencies

Shipped with LangExtract, but requires installing optional dependencies:
  • OpenAI - OpenAI’s GPT models
    • Code included in package
    • Requires: pip install langextract[openai]

External Plugins (Third-party)

Separate packages that extend LangExtract:
  • Installed separately: pip install langextract-yourprovider
  • Auto-discovered: Uses Python entry points for automatic registration
  • Zero configuration: Import langextract and the provider is available
  • Independent updates: Update providers without touching core

Usage Examples

Automatic Provider Selection

The simplest approach is to let LangExtract choose the provider:
import langextract as lx

# Automatically selects Gemini provider
result = lx.extract(
    text="Your document text",
    model_id="gemini-2.5-flash",
    prompt_description="Extract key facts",
    examples=[...]
)

Explicit Provider Selection

When multiple providers might support the same model ID:
import langextract as lx

# Method 1: Using factory with provider parameter
config = lx.factory.ModelConfig(
    model_id="gpt-4",
    provider="OpenAILanguageModel",  # Explicit provider
    provider_kwargs={"api_key": "..."}
)
model = lx.factory.create_model(config)

# Method 2: Using provider without model_id (uses provider's default)
config = lx.factory.ModelConfig(
    provider="GeminiLanguageModel",  # Will use default gemini-2.5-flash
    provider_kwargs={"api_key": "..."}
)
model = lx.factory.create_model(config)
Provider names can be:
  • Full class name: "GeminiLanguageModel", "OpenAILanguageModel"
  • Partial match: "gemini", "openai", "ollama" (case-insensitive)
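For example, the explicit configuration above can use a partial, case-insensitive name:

# Same effect as provider="OpenAILanguageModel"
config = lx.factory.ModelConfig(model_id="gpt-4", provider="openai")
model = lx.factory.create_model(config)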

Passing Parameters to Providers

Parameters flow from lx.extract() to providers:
# Common parameters handled by lx.extract:
result = lx.extract(
    text="Your document",
    model_id="gemini-2.5-flash",
    prompt_description="Extract entities",
    examples=[...],
    num_workers=4,            # Parallel processing
    max_chunk_size=3000,      # Document chunking
    extraction_passes=3,      # Multiple passes for recall
)

# Provider-specific parameters via **kwargs:
result = lx.extract(
    text="Your document",
    model_id="gemini-2.5-flash",
    prompt_description="Extract entities",
    examples=[...],
    # These go directly to the provider:
    temperature=0.7,          # Sampling temperature
    api_key="your-key",      # Override environment variable
    max_output_tokens=1000,  # Token limit
)

Direct Provider Usage

For advanced use, you can bypass the factory and instantiate a provider class directly:
from langextract.providers.gemini import GeminiLanguageModel

model = GeminiLanguageModel(
    model_id="gemini-2.5-flash",
    api_key="your-key"
)
# infer() takes a batch of prompts
outputs = model.infer(["prompt1", "prompt2"])

Plugin Discovery

External plugins are automatically discovered via Python entry points:
1. pip install langextract-yourprovider
   └── Installs package containing:
       • Provider class with @lx.providers.registry.register decorator
       • Python entry point pointing to this class

2. import langextract
   └── Loads providers/__init__.py
       └── Plugin loading is lazy (on-demand)

3. lx.extract(model_id="yourmodel-latest")
   └── Triggers plugin discovery via entry points
       └── @lx.providers.registry.register decorator fires
           └── Provider patterns added to registry
               └── Registry matches pattern and uses your provider
Plugin loading is lazy: plugins are discovered the first time they are needed. To trigger loading manually, call lx.providers.load_plugins_once().
Set LANGEXTRACT_DISABLE_PLUGINS=1 to disable plugin loading entirely.
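Putting the pieces together, a plugin package might look like the following sketch. The decorator is named in the discovery steps above, but its exact signature and the entry-point group name are assumptions here; see the Custom Providers guide for the definitive layout.

# Hypothetical plugin package langextract-yourprovider,
# module langextract_yourprovider/provider.py:
import langextract as lx

# The pattern argument is an assumption about the decorator's signature.
@lx.providers.registry.register(r"^yourmodel")
class YourProviderLanguageModel:  # would subclass BaseLanguageModel
    def __init__(self, model_id: str, **kwargs):
        self.model_id = model_id

    def infer(self, batch_prompts, **kwargs):
        ...

# And in the plugin's pyproject.toml (entry-point group name assumed):
# [project.entry-points."langextract.providers"]
# yourprovider = "langextract_yourprovider.provider:YourProviderLanguageModel"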

Environment Variables

The factory automatically resolves API keys from the environment:
Provider | Environment Variables (in priority order)
---------|------------------------------------------
Gemini   | GEMINI_API_KEY, LANGEXTRACT_API_KEY
OpenAI   | OPENAI_API_KEY, LANGEXTRACT_API_KEY
Ollama   | OLLAMA_BASE_URL (default: http://localhost:11434)
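Because resolution is automatic, exporting the variable is typically all you need:

import os
import langextract as lx

os.environ["GEMINI_API_KEY"] = "your-key"  # or export it in your shell

result = lx.extract(
    text="Your document text",
    model_id="gemini-2.5-flash",  # factory reads GEMINI_API_KEY
    prompt_description="Extract key facts",
    examples=[...],
)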

Common Issues

Provider Not Found

ValueError: No provider registered for model_id='unknown-model'
Solution: Check available patterns with registry.list_entries()
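Both helpers are referenced in this guide; the exact return format of list_entries() may vary by version:

import langextract as lx

# Load third-party plugins first so their patterns are registered too.
lx.providers.load_plugins_once()
print(lx.providers.registry.list_entries())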

Missing Dependencies

InferenceConfigError: OpenAI provider requires openai package
Solution: Install optional dependencies:
pip install langextract[openai]

Plugin Not Loading

Solutions:
  1. Manually trigger loading: lx.providers.load_plugins_once()
  2. Check entry points are installed: pip show -f your-package
  3. Verify no typos in pyproject.toml entry point
  4. Ensure package is installed: pip list | grep your-package

Next Steps

Gemini Provider

Use Google’s Gemini models with API key or Vertex AI

OpenAI Provider

Use OpenAI’s GPT models

Ollama Provider

Run local models without API keys

Custom Providers

Create your own provider plugins
