Overview

The resolver module provides functionality for parsing language model outputs into structured extractions and aligning them with the source text. It handles JSON/YAML parsing, fuzzy matching, and extraction alignment with token and character positions.

Module

from langextract import resolver

Classes

AbstractResolver

Abstract base class for resolvers.
class AbstractResolver(abc.ABC):
    def __init__(
        self,
        fence_output: bool = True,
        constraint: schema.Constraint = schema.Constraint(),
        format_type: data.FormatType = data.FormatType.JSON
    )
fence_output
bool
default:"True"
Whether to expect fenced output (json or yaml). When True, the resolver expects code fences. When False, raw JSON/YAML is expected.
constraint
Constraint
default:"Constraint()"
Applies constraint when decoding the output.
format_type
FormatType
default:"FormatType.JSON"
The format type for the output (JSON or YAML).
Abstract Methods:
  • resolve(): Parse input text into extractions
  • align(): Align extractions with source text positions
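The contract can be pictured with a small self-contained sketch. This is an illustrative stand-in with simplified types, not the real `AbstractResolver` signatures: a subclass implements `resolve()` to parse model output and `align()` to attach source positions.

```python
import abc

class MiniResolver(abc.ABC):
    """Toy mirror of the resolver contract (simplified, illustrative only)."""

    @abc.abstractmethod
    def resolve(self, input_text: str) -> list:
        """Parse model output into extraction records."""

    @abc.abstractmethod
    def align(self, extractions: list, source_text: str) -> list:
        """Attach source positions to each extraction."""

class EchoResolver(MiniResolver):
    def resolve(self, input_text):
        # Treat each non-empty line of model output as one extraction.
        return [line.strip() for line in input_text.splitlines() if line.strip()]

    def align(self, extractions, source_text):
        # Pair each extraction with its character offset in the source.
        return [(e, source_text.find(e)) for e in extractions]

r = EchoResolver()
print(r.align(r.resolve("Acme Corp\n"), "John founded Acme Corp."))
# → [('Acme Corp', 13)]
```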

Resolver

Concrete resolver implementation for YAML/JSON-based extraction.
class Resolver(AbstractResolver):
    def __init__(
        self,
        format_handler: FormatHandler | None = None,
        extraction_index_suffix: str | None = None,
        **kwargs
    )
format_handler
FormatHandler | None
default:"None"
The format handler that knows how to parse output. If None, a default handler is created from kwargs.
extraction_index_suffix
str | None
default:"None"
Suffix identifying index keys that determine the ordering of extractions. For example, "_index" will sort by fields like "entity_index". If None, extractions are returned in appearance order.
**kwargs
Any
Legacy parameters (fence_output, format_type, etc.) for backward compatibility. These create a FormatHandler if one is not provided.

Methods

resolve()

Parses LLM output text into structured extractions.
def resolve(
    self,
    input_text: str,
    suppress_parse_errors: bool = False,
    **kwargs
) -> Sequence[data.Extraction]
input_text
str
required
The input text to be processed (LLM output).
suppress_parse_errors
bool
default:"False"
If True, log errors and return empty list instead of raising exceptions.
**kwargs
Any
Additional keyword arguments.
return
Sequence[Extraction]
Sequence of Extraction objects parsed from the input.
Raises: ResolverParsingError if the content cannot be parsed (unless suppress_parse_errors=True).

align()

Aligns extractions with source text, setting token/char intervals and alignment status.
def align(
    self,
    extractions: Sequence[data.Extraction],
    source_text: str,
    token_offset: int,
    char_offset: int | None = None,
    enable_fuzzy_alignment: bool = True,
    fuzzy_alignment_threshold: float = 0.75,
    accept_match_lesser: bool = True,
    tokenizer_inst: Tokenizer | None = None,
    **kwargs
) -> Iterator[data.Extraction]
extractions
Sequence[Extraction]
required
Annotated extractions to align with the source text.
source_text
str
required
The text in which to align the extractions.
token_offset
int
required
The token offset corresponding to the starting token index of the chunk.
char_offset
int | None
default:"None"
The char offset corresponding to the starting character index of the chunk.
enable_fuzzy_alignment
bool
default:"True"
Whether to use fuzzy alignment when exact matching fails.
fuzzy_alignment_threshold
float
default:"0.75"
Minimum token overlap ratio for fuzzy alignment (0-1).
accept_match_lesser
bool
default:"True"
Whether to accept partial exact matches (MATCH_LESSER status).
tokenizer_inst
Tokenizer | None
default:"None"
Optional tokenizer instance.
**kwargs
Any
Additional parameters.
return
Iterator[Extraction]
Iterator yielding aligned extractions with updated intervals and alignment status.
Alignment Status Values:
  • MATCH_EXACT: Perfect token-level match
  • MATCH_LESSER: Partial exact match (extraction longer than matched text)
  • MATCH_FUZZY: Best overlap window meets threshold (≥ fuzzy_alignment_threshold)
  • None: No alignment found
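The fuzzy path can be sketched in standalone Python (an illustration of the idea, not the library's implementation): slide a window of the extraction's length over the source tokens, score each window by token-overlap ratio, and accept the best window only if it meets the threshold.

```python
from difflib import SequenceMatcher

def fuzzy_window_ratio(extraction_tokens, source_tokens, threshold=0.75):
    """Return (start, end, ratio) of the best candidate window, or None.

    Illustrative sketch only; the library's WordAligner has its own logic.
    """
    n = len(extraction_tokens)
    best = None
    for start in range(len(source_tokens) - n + 1):
        window = source_tokens[start:start + n]
        matcher = SequenceMatcher(a=extraction_tokens, b=window, autojunk=False)
        # Fraction of extraction tokens matched within this window.
        matched = sum(block.size for block in matcher.get_matching_blocks())
        ratio = matched / n
        if best is None or ratio > best[2]:
            best = (start, start + n, ratio)
    if best and best[2] >= threshold:
        return best
    return None

print(fuzzy_window_ratio(
    ["sarah", "johnson"],
    ["dr", "sarah", "johnson", "is", "the", "lead", "researcher"],
))
# → (1, 3, 1.0)
```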

extract_ordered_extractions()

Extracts and orders extraction data based on associated indexes.
def extract_ordered_extractions(
    self,
    extraction_data: Sequence[Mapping[str, ExtractionValueType]]
) -> Sequence[data.Extraction]
extraction_data
Sequence[Mapping[str, ExtractionValueType]]
required
A list of dictionaries containing extraction class keys and their values, along with optional index keys.
return
Sequence[Extraction]
Extractions sorted by the index attribute or by order of appearance.
Raises: ValueError if extraction text is not a string/integer/float, or if index is not an integer.
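One plausible reading of the ordering rule can be sketched in plain Python (a simplified stand-in, not the library's code): rows carrying a key ending in the index suffix are sorted by that integer index, and rows without one keep their order of appearance.

```python
def order_by_index_suffix(rows, suffix="_index"):
    """Sort extraction dicts by their '<class><suffix>' key when present.

    Simplified illustration of the extract_ordered_extractions ordering.
    """
    def sort_key(pair):
        position, row = pair
        for key, value in row.items():
            if key.endswith(suffix):
                if not isinstance(value, int):
                    raise ValueError(f"Index {key!r} must be an integer")
                return (0, value)
        # No index key: preserve appearance order.
        return (1, position)

    return [row for _, row in sorted(enumerate(rows), key=sort_key)]

rows = [
    {"entity": "second", "entity_index": 2},
    {"entity": "first", "entity_index": 1},
]
print(order_by_index_suffix(rows))
# → [{'entity': 'first', 'entity_index': 1}, {'entity': 'second', 'entity_index': 2}]
```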

Helper Classes

WordAligner

Aligns words between two sequences of tokens using Python’s difflib.
class WordAligner:
    def align_extractions(
        self,
        extraction_groups: Sequence[Sequence[data.Extraction]],
        source_text: str,
        token_offset: int = 0,
        char_offset: int = 0,
        delim: str = "\u241F",
        enable_fuzzy_alignment: bool = True,
        fuzzy_alignment_threshold: float = 0.75,
        accept_match_lesser: bool = True,
        tokenizer_impl: Tokenizer | None = None
    ) -> Sequence[Sequence[data.Extraction]]
extraction_groups
Sequence[Sequence[Extraction]]
required
A sequence of sequences, where each inner sequence contains Extraction objects.
source_text
str
required
The source text against which extractions are aligned.
token_offset
int
default:"0"
Offset to add to token interval indices.
char_offset
int
default:"0"
Offset to add to character interval positions.
delim
str
default:"\\u241F"
Token used to separate multi-token extractions (Unicode unit separator).
enable_fuzzy_alignment
bool
default:"True"
Whether to use fuzzy alignment when exact matching fails.
fuzzy_alignment_threshold
float
default:"0.75"
Minimum token overlap ratio for fuzzy alignment.
accept_match_lesser
bool
default:"True"
Whether to accept partial exact matches.
tokenizer_impl
Tokenizer | None
default:"None"
Optional tokenizer instance.
return
Sequence[Sequence[Extraction]]
Sequence of extractions aligned with the source text, including token intervals.
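The Notes below mention that exact alignment is built on difflib's SequenceMatcher. The core step can be sketched standalone (a simplified illustration; the real WordAligner additionally joins multiple extractions with a delimiter and handles partial matches):

```python
from difflib import SequenceMatcher

def exact_token_span(extraction_tokens, source_tokens):
    """Locate extraction_tokens in source_tokens via SequenceMatcher.

    Returns a (start, end) token interval, or None if no full match exists.
    """
    matcher = SequenceMatcher(
        a=extraction_tokens, b=source_tokens, autojunk=False
    )
    match = matcher.find_longest_match(
        0, len(extraction_tokens), 0, len(source_tokens)
    )
    # Only accept a match covering the whole extraction.
    if match.size == len(extraction_tokens):
        return (match.b, match.b + match.size)
    return None

tokens = "john smith founded acme corp in 2020".split()
print(exact_token_span(["acme", "corp"], tokens))  # → (3, 5)
```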

Usage Examples

Basic Resolve and Align

from langextract.resolver import Resolver
from langextract.core.data import FormatType

# Create resolver
resolver = Resolver(format_type=FormatType.YAML)

# Parse LLM output
llm_output = """
extractions:
  - person: John Smith
    person_index: 1
  - organization: Acme Corp
    organization_index: 2
"""

extractions = resolver.resolve(llm_output)

for extraction in extractions:
    print(f"{extraction.extraction_class}: {extraction.extraction_text}")

# Align with source text
source_text = "John Smith founded Acme Corp in 2020."
aligned = resolver.align(
    extractions,
    source_text,
    token_offset=0,
    char_offset=0
)

for extraction in aligned:
    if extraction.char_interval:
        start = extraction.char_interval.start_pos
        end = extraction.char_interval.end_pos
        print(f"{extraction.extraction_class}: '{source_text[start:end]}'")
        print(f"  Position: {start}-{end}")
        print(f"  Alignment: {extraction.alignment_status}")

Handling Parse Errors

from langextract import resolver
from langextract.core.data import FormatType

json_resolver = resolver.Resolver(format_type=FormatType.JSON)

# Invalid JSON
invalid_output = "{'invalid': json}"

# Suppress errors and continue with an empty result
extractions = json_resolver.resolve(
    invalid_output,
    suppress_parse_errors=True
)
print(f"Extracted {len(extractions)} items")  # Returns empty list

# Or catch the exception
try:
    extractions = json_resolver.resolve(invalid_output)
except resolver.ResolverParsingError as e:
    print(f"Parse error: {e}")

Fuzzy Alignment

from langextract.resolver import Resolver
from langextract.core.data import Extraction, FormatType

resolver = Resolver(format_type=FormatType.YAML)

# Extraction text doesn't exactly match source
extractions = [
    Extraction(
        extraction_class="person",
        extraction_text="Sarah Johnson",  # Missing "Dr."
        extraction_index=1
    )
]

source_text = "Dr. Sarah Johnson is the lead researcher."

# Fuzzy alignment will find best match
aligned = list(resolver.align(
    extractions,
    source_text,
    token_offset=0,
    char_offset=0,
    enable_fuzzy_alignment=True,
    fuzzy_alignment_threshold=0.8
))

for extraction in aligned:
    print(f"Status: {extraction.alignment_status}")
    if extraction.char_interval:
        start = extraction.char_interval.start_pos
        end = extraction.char_interval.end_pos
        print(f"Matched: '{source_text[start:end]}'")

Custom Index Suffix

from langextract.resolver import Resolver
from langextract.core.format_handler import FormatHandler
from langextract.core.data import FormatType

# Use custom index suffix for ordering
format_handler = FormatHandler(
    format_type=FormatType.JSON,
    use_wrapper=True,
    wrapper_key="extractions"
)

resolver = Resolver(
    format_handler=format_handler,
    extraction_index_suffix="_order"  # Use _order instead of _index
)

llm_output = '''
{
  "extractions": [
    {"entity": "second", "entity_order": 2},
    {"entity": "first", "entity_order": 1}
  ]
}
'''

extractions = resolver.resolve(llm_output)

for extraction in extractions:
    print(f"{extraction.extraction_index}: {extraction.extraction_text}")
# Output:
# 1: first
# 2: second

Disable Fuzzy Alignment

from langextract.resolver import Resolver
from langextract.core.data import Extraction, FormatType

resolver = Resolver(format_type=FormatType.YAML)

extractions = [
    Extraction(
        extraction_class="entity",
        extraction_text="inexact match",
        extraction_index=1
    )
]

source_text = "This is an exact match."

# Only exact matches will be aligned
aligned = list(resolver.align(
    extractions,
    source_text,
    token_offset=0,
    enable_fuzzy_alignment=False
))

for extraction in aligned:
    if extraction.alignment_status is None:
        print("No alignment found (fuzzy disabled)")

Working with Attributes

from langextract.resolver import Resolver
from langextract.core.format_handler import FormatHandler
from langextract.core.data import FormatType

format_handler = FormatHandler(
    format_type=FormatType.YAML,
    attribute_suffix="_attrs"
)

resolver = Resolver(format_handler=format_handler)

llm_output = """
extractions:
  - person: Dr. Smith
    person_index: 1
    person_attrs:
      title: Doctor
      specialty: Cardiology
"""

extractions = resolver.resolve(llm_output)

for extraction in extractions:
    print(f"{extraction.extraction_class}: {extraction.extraction_text}")
    if extraction.attributes:
        print(f"  Attributes: {extraction.attributes}")

Token Offset Example

from langextract.resolver import Resolver
from langextract.core.data import Extraction, FormatType

resolver = Resolver(format_type=FormatType.YAML)

# This is chunk 2 of a larger document
chunk_text = "is the lead researcher."
token_offset = 5  # Chunk starts at token 5 in full document
char_offset = 15  # Chunk starts at char 15 in full document

extractions = [
    Extraction(
        extraction_class="role",
        extraction_text="lead researcher",
        extraction_index=1
    )
]

aligned = list(resolver.align(
    extractions,
    chunk_text,
    token_offset=token_offset,
    char_offset=char_offset
))

for extraction in aligned:
    # Token and char intervals are relative to full document
    print(f"Token interval: {extraction.token_interval}")
    print(f"Char interval: {extraction.char_interval}")

Notes

  • The resolver uses difflib’s SequenceMatcher for exact alignment
  • Fuzzy alignment scans all candidate windows for best overlap ratio
  • Token normalization applies light stemming (removes trailing ‘s’) to improve matching
  • Use extraction_index_suffix to control extraction ordering
  • Set suppress_parse_errors=True to continue processing despite parse failures
  • Alignment status helps identify extraction quality for filtering or validation
  • WordAligner uses a delimiter (Unicode unit separator) to separate extractions
  • Character and token offsets allow mapping chunks back to original document positions
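The light-stemming note above can be illustrated with a hypothetical helper (the library's tokenizer may normalize differently; this only mirrors the "lowercase and strip a trailing 's'" description):

```python
def normalize_token(token):
    """Lowercase a token and strip one trailing 's' (illustrative only)."""
    token = token.lower()
    if len(token) > 1 and token.endswith("s"):
        token = token[:-1]
    return token

print([normalize_token(t) for t in ["Researchers", "Corp"]])
# → ['researcher', 'corp']
```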
