
Overview

The provider registry (router) system allows LangExtract to automatically resolve model identifiers to the appropriate provider class. It supports lazy registration to avoid importing provider dependencies until they’re actually needed.

Module

from langextract.providers import router

Registration Functions

register_lazy()

Registers a provider lazily using a string import path.
def register_lazy(
    *patterns: str | re.Pattern[str],
    target: str,
    priority: int = 0
) -> None
*patterns
str | re.Pattern[str]
required
One or more regex patterns to match model IDs. Each pattern can be a string (compiled to regex) or a compiled regex pattern.
target
str
required
Import path in format "module.path:ClassName". The module is imported only when the provider is needed.
priority
int
default:"0"
Priority for resolution. Higher values win on pattern conflicts.
Example:
from langextract.providers import router

router.register_lazy(
    r"^gemini-.*",
    r"^models/gemini-.*",
    target="langextract.providers.gemini:GeminiLanguageModel",
    priority=10
)
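To make the "module.path:ClassName" target format concrete, here is a minimal stdlib-only sketch of how such a string could be turned into a class on demand. The helper name load_target is hypothetical and is not part of LangExtract; the router's actual loader may differ.

```python
import importlib


def load_target(target: str):
    """Import a 'module.path:ClassName' target and return the named attribute.

    Hypothetical helper for illustration only; not LangExtract's own loader.
    """
    module_path, _, class_name = target.partition(":")
    if not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {target!r}")
    # The import happens here, at call time, not when the string is stored.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A stdlib class stands in for a provider class.
ordered_dict_cls = load_target("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```

Because only the string is stored at registration time, a provider's heavy dependencies would not be imported unless a model ID actually resolves to it.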

register()

Decorator to register a provider class directly.
def register(
    *patterns: str | re.Pattern[str],
    priority: int = 0
) -> Callable[[type[TLanguageModel]], type[TLanguageModel]]
*patterns
str | re.Pattern[str]
required
One or more regex patterns to match model IDs.
priority
int
default:"0"
Priority for resolution. Higher values win on pattern conflicts.
return
Callable
Decorator function that registers the class and returns it unchanged.
Example:
from langextract.providers import router
from langextract.core.base_model import BaseLanguageModel

@router.register(r"^custom-model-.*", priority=20)
class CustomLanguageModel(BaseLanguageModel):
    def infer(self, batch_prompts, **kwargs):
        # Implementation
        pass

Resolution Functions

resolve()

Resolves a model ID to its provider class.
@functools.lru_cache(maxsize=128)
def resolve(model_id: str) -> type[base_model.BaseLanguageModel]
model_id
str
required
The model identifier to resolve (e.g., "gemini-pro", "gpt-4").
return
type[BaseLanguageModel]
The provider class that handles this model ID.
Raises: InferenceConfigError if no provider is registered for the model ID.
Example:
from langextract.providers import router

provider_class = router.resolve("gemini-1.5-flash")
model = provider_class(model_id="gemini-1.5-flash", api_key="...")

resolve_provider()

Resolves a provider name to its provider class.
@functools.lru_cache(maxsize=128)
def resolve_provider(provider_name: str) -> type[base_model.BaseLanguageModel]
provider_name
str
required
The provider name (e.g., "gemini", "openai") or class name (e.g., "GeminiLanguageModel").
return
type[BaseLanguageModel]
The provider class.
Raises: InferenceConfigError if no provider matches the name.
Example:
from langextract.providers import router

provider_class = router.resolve_provider("gemini")
model = provider_class(model_id="gemini-1.5-flash", api_key="...")

Utility Functions

list_providers()

Lists all registered providers with their patterns and priorities.
def list_providers() -> list[tuple[tuple[str, ...], int]]
return
list[tuple[tuple[str, ...], int]]
List of (patterns, priority) tuples for debugging.
Example:
from langextract.providers import router

for patterns, priority in router.list_providers():
    print(f"Priority {priority}: {patterns}")

clear()

Clears all registered providers. Mainly for testing.
def clear() -> None
This function clears the registry and cache. Use it in tests to ensure clean state between test runs.

Plugin System

LangExtract automatically discovers and loads provider plugins via entry points.

load_plugins_once()

Loads provider plugins from installed packages.
def load_plugins_once() -> None
This function:
  • Discovers plugins using the langextract.providers entry point group
  • Loads each plugin’s provider class
  • Registers patterns from the provider’s get_model_patterns() method
  • Is idempotent (calls after the first have no effect)
  • Can be disabled by setting LANGEXTRACT_DISABLE_PLUGINS=1

load_builtins_once()

Loads built-in providers (Gemini, OpenAI, Ollama).
def load_builtins_once() -> None
This function:
  • Registers built-in provider patterns
  • Is idempotent (calls after the first have no effect)
  • Uses lazy registration to avoid importing dependencies

Pattern Matching

The registry uses regex patterns to match model IDs:
  • Patterns are checked in priority order (highest first)
  • First matching pattern wins
  • Patterns can be strings (auto-compiled) or compiled regex objects
  • Use ^ and $ anchors for exact matches
Pattern Examples:
# Match any Gemini model
r"^gemini-.*"

# Match GPT-3 and GPT-4 family models (e.g., gpt-3.5-turbo, gpt-4o)
r"^gpt-[34].*"

# Match exact model ID
r"^claude-3-opus-20240229$"

# Match Ollama format
r"^ollama:.*"
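The effect of the ^ and $ anchors can be checked directly with Python's re module. The matches helper below is hypothetical, and it assumes search-style matching (a pattern may match anywhere in the ID unless anchored); the page does not state which matching call the router uses.

```python
import re


def matches(pattern, model_id: str) -> bool:
    """Return True if the pattern is found in model_id (search semantics)."""
    compiled = re.compile(pattern) if isinstance(pattern, str) else pattern
    return compiled.search(model_id) is not None


# Prefix match: anchored at the start only.
print(matches(r"^gemini-.*", "gemini-1.5-flash"))                 # True
print(matches(r"^gpt-[34].*", "gpt-4o"))                          # True
print(matches(r"^gpt-[34].*", "gpt-5"))                           # False

# Exact match: anchored at both ends, so a suffixed ID is rejected.
print(matches(r"^claude-3-opus-20240229$", "claude-3-opus-20240229-beta"))  # False

# Unanchored pattern: matches anywhere, which is usually too broad.
print(matches(r"gemini", "my-gemini-clone"))                      # True

# A precompiled pattern works the same way.
print(matches(re.compile(r"^ollama:.*"), "ollama:llama3"))        # True
```

Under these assumptions, leaving off ^ lets a pattern claim model IDs that merely contain the provider name, so anchoring is the safer default.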

Priority System

When multiple patterns match the same model ID, priority determines which provider is used:
  • Higher priority wins (e.g., 20 beats 10)
  • Built-in providers use priority 10
  • Plugins default to priority 20
  • Custom registrations can use any priority
  • Same-priority conflicts are resolved by registration order
Example:
# Custom provider overrides built-in
router.register_lazy(
    r"^gemini-.*",
    target="my_package.providers:CustomGeminiModel",
    priority=30  # Higher than built-in priority of 10
)
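The priority rules can be modeled with a stable sort over a toy list of entries. The entry layout and the pick function are assumptions for illustration, and the sketch further assumes that among equal priorities the earliest registration wins; the page says only that registration order decides.

```python
import re

# (patterns, priority, provider name), in registration order.
entries = [
    ((r"^gemini-.*",), 10, "BuiltinGemini"),
    ((r"^gemini-.*",), 30, "CustomGemini"),
    ((r"^gemini-.*",), 30, "LaterCustomGemini"),
]


def pick(model_id: str):
    # sorted() is stable, even with reverse=True, so entries with equal
    # priority keep their registration order.
    ordered = sorted(entries, key=lambda entry: entry[1], reverse=True)
    for patterns, _priority, name in ordered:
        if any(re.search(pattern, model_id) for pattern in patterns):
            return name  # First match in priority order wins.
    return None


print(pick("gemini-1.5-flash"))  # CustomGemini
print(pick("gpt-4"))             # None
```

Choosing a priority above the built-in value of 10 is therefore enough to override a built-in provider for the same pattern.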

Usage Example

from langextract.providers import router
from langextract.core.base_model import BaseLanguageModel

# Register a custom provider
@router.register(r"^mymodel-.*", priority=20)
class MyCustomModel(BaseLanguageModel):
    def __init__(self, model_id: str, **kwargs):
        super().__init__(**kwargs)
        self.model_id = model_id
    
    def infer(self, batch_prompts, **kwargs):
        # Implementation
        pass

# Resolve and instantiate
provider_class = router.resolve("mymodel-v1")
model = provider_class(model_id="mymodel-v1", api_key="...")

# List all registered providers
for patterns, priority in router.list_providers():
    print(f"Priority {priority}: {', '.join(patterns)}")

Notes

  • Resolution results are cached (LRU cache, maxsize=128)
  • Use clear() to reset the cache in tests
  • Lazy registration avoids importing dependencies until needed
  • Plugins are auto-discovered via entry points
  • Set LANGEXTRACT_DISABLE_PLUGINS=1 to disable plugin loading
