Extraction

The Extraction class represents a single piece of information extracted from text: its category, the extracted text itself, and its position within the source text. It is general enough to cover a range of NLP information extraction tasks.

Constructor

Extraction(
    extraction_class: str,
    extraction_text: str,
    *,
    token_interval: TokenInterval | None = None,
    char_interval: CharInterval | None = None,
    alignment_status: AlignmentStatus | None = None,
    extraction_index: int | None = None,
    group_index: int | None = None,
    description: str | None = None,
    attributes: dict[str, str | list[str]] | None = None
)

extraction_class

str

required

The class or category of the extraction (e.g., “PERSON”, “DATE”, “LOCATION”)

extraction_text

str

required

The actual text content of the extraction

token_interval

TokenInterval | None

The token-level position of the extraction in the tokenized text

char_interval

CharInterval | None

The character-level position of the extraction in the original text

alignment_status

AlignmentStatus | None

The alignment status indicating how well the extraction matches the source text

extraction_index

int | None

The index of this extraction in the list of all extractions

group_index

int | None

The index of the group this extraction belongs to

description

str | None

An optional description providing additional context about the extraction

attributes

dict[str, str | list[str]] | None

Additional attributes associated with the extraction as key-value pairs
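The bare `*` in the signature makes every parameter after extraction_text keyword-only. The sketch below illustrates that calling convention with a hypothetical stand-in class, `ExtractionSketch`, which copies only part of the documented signature so that it runs without the library installed. It is not the library's implementation, and the attribute keys are made up for illustration.

```python
from typing import Dict, List, Optional, Union


class ExtractionSketch:
    """Hypothetical stand-in that copies part of the documented signature."""

    def __init__(
        self,
        extraction_class: str,
        extraction_text: str,
        *,  # everything after this marker must be passed by keyword
        description: Optional[str] = None,
        attributes: Optional[Dict[str, Union[str, List[str]]]] = None,
    ):
        self.extraction_class = extraction_class
        self.extraction_text = extraction_text
        self.description = description
        self.attributes = attributes


# The two required parameters may be positional; the rest must be named.
# An attribute value may be a single string or a list of strings.
ok = ExtractionSketch(
    "DATE",
    "12 March 2024",
    attributes={"format": "day-month-year", "aliases": ["2024-03-12", "03/12/2024"]},
)

# Passing an optional parameter positionally is rejected by Python itself.
try:
    ExtractionSketch("DATE", "12 March 2024", "a date")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the real class the same rule should apply, since it follows from the `*` in the documented signature rather than from anything specific to this stand-in.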

Attributes

extraction_class

str

The class or category of the extraction

extraction_text

str

The actual text content of the extraction

char_interval

CharInterval | None

The character-level position of the extraction in the original text

alignment_status

AlignmentStatus | None

The alignment status indicating how well the extraction matches the source text

extraction_index

int | None

The index of this extraction in the list of all extractions

group_index

int | None

The index of the group this extraction belongs to

description

str | None

An optional description providing additional context about the extraction

attributes

dict[str, str | list[str]] | None

Additional attributes associated with the extraction as key-value pairs

token_interval

TokenInterval | None

The token-level position of the extraction in the tokenized text. This attribute is exposed as a property.
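One way to use extraction_index and group_index together is to bucket extractions by group and order each bucket by index. The sketch below assumes that extractions sharing a group_index belong together and that a value of None means "not set". `ExtractionSketch` and `group_extractions` are hypothetical names introduced here so the example runs on its own; they are not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ExtractionSketch:
    """Hypothetical stand-in carrying only the fields used below."""
    extraction_class: str
    extraction_text: str
    extraction_index: Optional[int] = None
    group_index: Optional[int] = None


def group_extractions(
    extractions: List[ExtractionSketch],
) -> Dict[Optional[int], List[ExtractionSketch]]:
    """Bucket extractions by group_index, ordering each bucket by extraction_index."""
    groups: Dict[Optional[int], List[ExtractionSketch]] = defaultdict(list)
    for item in extractions:
        groups[item.group_index].append(item)
    for members in groups.values():
        # Items without an extraction_index sort after those that have one.
        members.sort(key=lambda e: (e.extraction_index is None, e.extraction_index or 0))
    return dict(groups)


items = [
    ExtractionSketch("MEDICATION", "ibuprofen", extraction_index=2, group_index=0),
    ExtractionSketch("DOSAGE", "200 mg", extraction_index=1, group_index=0),
    ExtractionSketch("MEDICATION", "aspirin", extraction_index=3, group_index=1),
]
grouped = group_extractions(items)
```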

Example

from langextract.core.data import Extraction, CharInterval

# Create an extraction for a person's name
extraction = Extraction(
    extraction_class="PERSON",
    extraction_text="John Smith",
    char_interval=CharInterval(start_pos=0, end_pos=10),
    description="A person mentioned in the document",
    attributes={
        "role": "author",
        "confidence": "high"
    }
)

print(extraction.extraction_class)  # "PERSON"
print(extraction.extraction_text)   # "John Smith"
print(extraction.char_interval.start_pos)  # 0

Related Classes

CharInterval - Represents character positions (nested class with start_pos and end_pos)
AlignmentStatus - Enum for alignment quality (values: MATCH_EXACT, MATCH_FUZZY, MATCH_LESSER, MATCH_GREATER)
AnnotatedDocument - Contains multiple extractions
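As a rough illustration of how alignment_status might be used, the sketch below keeps only extractions marked MATCH_EXACT. The enum member names come from the list above, but `AlignmentStatusSketch`, its placeholder values, `ExtractionSketch`, and `exact_matches` are hypothetical stand-ins written so the example runs without the library.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class AlignmentStatusSketch(Enum):
    """Hypothetical stand-in; member names follow the docs, values are placeholders."""
    MATCH_EXACT = auto()
    MATCH_FUZZY = auto()
    MATCH_LESSER = auto()
    MATCH_GREATER = auto()


@dataclass
class ExtractionSketch:
    """Hypothetical stand-in carrying only the fields used below."""
    extraction_class: str
    extraction_text: str
    alignment_status: Optional[AlignmentStatusSketch] = None


def exact_matches(extractions: List[ExtractionSketch]) -> List[ExtractionSketch]:
    """Keep extractions whose status is MATCH_EXACT; unset statuses are dropped."""
    return [
        e for e in extractions
        if e.alignment_status is AlignmentStatusSketch.MATCH_EXACT
    ]


candidates = [
    ExtractionSketch("PERSON", "John Smith", AlignmentStatusSketch.MATCH_EXACT),
    ExtractionSketch("PERSON", "Jon Smith", AlignmentStatusSketch.MATCH_FUZZY),
    ExtractionSketch("DATE", "last spring"),  # status left unset
]
kept = exact_matches(candidates)
```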
