Overview

BaseLanguageModel is an abstract base class that defines the interface for all language model providers in LangExtract. All provider implementations (Gemini, OpenAI, Ollama, etc.) must inherit from this class and implement its abstract methods.

Class Definition

from langextract.core.base_model import BaseLanguageModel

Constructor

constraint
types.Constraint | None
default:"None"
Applies constraints when decoding the output. Defaults to no constraint.
**kwargs
Any
Additional keyword arguments passed to the model at initialization. These are stored and merged with runtime kwargs during inference.

Methods

infer()

Performs language model inference on a batch of prompts.
def infer(
    self,
    batch_prompts: Sequence[str],
    **kwargs
) -> Iterator[Sequence[types.ScoredOutput]]
batch_prompts
Sequence[str]
required
Batch of input prompts for inference. Pass a single-element sequence for a single input.
**kwargs
Any
Additional arguments for inference, like temperature and max_decode_steps.
return
Iterator[Sequence[types.ScoredOutput]]
Iterator yielding batches of probable output texts, sorted by descending score.
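The iterator contract above can be sketched with a toy implementation. This is illustration only: `ScoredOutput` below is a stand-in dataclass, not the real `types.ScoredOutput`, and `toy_infer` is a hypothetical function mirroring the documented shape (one sequence of scored outputs per prompt, sorted by descending score).

```python
from dataclasses import dataclass


@dataclass
class ScoredOutput:
    """Stand-in for types.ScoredOutput, defined here for illustration only."""
    output: str
    score: float


def toy_infer(batch_prompts):
    # Mirrors the documented contract: yield one sequence of ScoredOutput
    # per input prompt, sorted by descending score.
    for prompt in batch_prompts:
        yield [
            ScoredOutput(output=f"result for: {prompt}", score=0.9),
            ScoredOutput(output=f"alternative for: {prompt}", score=0.4),
        ]


# Consumers typically take the first element of each batch as the best output.
best = [outputs[0].output for outputs in toy_infer(["p1", "p2"])]
print(best)
```

Because `infer()` returns an iterator, providers can stream results as batches complete rather than materializing everything up front.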

infer_batch()

Convenience method for batch inference with configurable batch size.
def infer_batch(
    self,
    prompts: Sequence[str],
    batch_size: int = 32
) -> list[list[types.ScoredOutput]]
prompts
Sequence[str]
required
List of prompts to process.
batch_size
int
default:"32"
Batch size for processing (currently unused, reserved for future optimization).
return
list[list[types.ScoredOutput]]
List of lists of ScoredOutput objects, one per input prompt.

parse_output()

Parses raw model output as JSON or YAML.
def parse_output(self, output: str) -> Any
output
str
required
Raw output string from the model (without code fences).
return
Any
Parsed Python object (dict or list).
Raises: ValueError if output cannot be parsed as JSON or YAML.
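A minimal sketch of the documented behavior, not the library's source: try JSON first and raise `ValueError` on failure. The real method also falls back to YAML, which is omitted here since it would require a third-party parser.

```python
import json


def parse_output_sketch(output: str):
    # Illustrative only: JSON path of the documented parse behavior.
    # The actual parse_output() additionally attempts YAML before raising.
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc


print(parse_output_sketch('{"entities": [{"name": "Alice"}]}'))
```

Note the input must already be fence-free; per the notes below, fence extraction happens before this method is called.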

apply_schema()

Applies a schema instance to the provider for structured output.
def apply_schema(self, schema_instance: schema.BaseSchema | None) -> None
schema_instance
schema.BaseSchema | None
required
The schema instance to apply, or None to clear.

set_fence_output()

Sets explicit fence output preference for code block formatting.
def set_fence_output(self, fence_output: bool | None) -> None
fence_output
bool | None
required
  • True: Force code fences (json or yaml)
  • False: Disable fences (raw JSON/YAML)
  • None: Auto-detect based on schema
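To make the fenced-versus-raw distinction concrete, here is a hypothetical helper (not part of LangExtract; the library's resolver handles fence extraction) showing that both formats carry the same payload:

```python
import re

FENCED = '```json\n{"name": "Alice"}\n```'  # fence_output=True style
RAW = '{"name": "Alice"}'                   # fence_output=False style


def strip_fences(text: str) -> str:
    # Hypothetical helper for illustration: pull the body out of a
    # ```json or ```yaml code fence, or return the text unchanged.
    match = re.search(r"```(?:json|yaml)?\s*\n(.*?)\n?```", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()


print(strip_fences(FENCED) == strip_fences(RAW))  # same payload either way
```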

merge_kwargs()

Merges stored initialization kwargs with runtime kwargs.
def merge_kwargs(
    self,
    runtime_kwargs: Mapping[str, Any] | None = None
) -> dict[str, Any]
runtime_kwargs
Mapping[str, Any] | None
default:"None"
Kwargs provided at inference time. These take precedence over stored kwargs.
return
dict[str, Any]
Merged kwargs dictionary.
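The precedence rule above amounts to a dictionary merge where runtime values win. A sketch of that semantics (assumed from the documented behavior, not copied from the library source):

```python
# Kwargs stored at model construction time, e.g. CustomModel(temperature=0.7, ...)
init_kwargs = {"temperature": 0.7, "max_decode_steps": 512}

# Kwargs passed at inference time take precedence over stored ones.
runtime_kwargs = {"temperature": 0.2}

merged = {**init_kwargs, **(runtime_kwargs or {})}
print(merged)  # {'temperature': 0.2, 'max_decode_steps': 512}
```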

Class Methods

get_schema_class()

Returns the schema class this provider supports (e.g., Pydantic schemas).
@classmethod
def get_schema_class(cls) -> type[Any] | None
return
type[Any] | None
The schema class, or None if provider doesn’t support schemas.

Properties

schema

@property
def schema(self) -> schema.BaseSchema | None
The current schema instance if one is configured, otherwise None.

requires_fence_output

@property
def requires_fence_output(self) -> bool
Whether this model requires fence output for parsing. Uses explicit override if set via set_fence_output(), otherwise computes from schema.

Usage Example

from collections.abc import Iterator, Sequence

from langextract.core.base_model import BaseLanguageModel
from langextract.core import types

class CustomLanguageModel(BaseLanguageModel):
    def __init__(self, api_key: str, **kwargs):
        super().__init__(**kwargs)
        self.api_key = api_key
    
    def infer(
        self,
        batch_prompts: Sequence[str],
        **kwargs
    ) -> Iterator[Sequence[types.ScoredOutput]]:
        # Implementation for calling your model API
        for prompt in batch_prompts:
            response = self._call_api(prompt, **kwargs)
            yield [types.ScoredOutput(
                output=response.text,
                score=response.confidence
            )]
    
    def _call_api(self, prompt: str, **kwargs):
        # Your API call implementation
        pass

# Using the custom model
model = CustomLanguageModel(
    api_key="your-key",
    temperature=0.7
)

results = model.infer_batch(["Extract entities from this text"])

Notes

  • All provider implementations must implement the abstract infer() method
  • The parse_output() method expects raw JSON/YAML without code fences; fence extraction is handled by the resolver
  • Use merge_kwargs() to combine initialization and runtime parameters
  • Schema support is optional but recommended for structured output
