The Extraction class represents a single piece of information extracted from text: its class, the extracted text itself, optional descriptive attributes, and its position within the source text. It is general enough to cover most NLP information extraction tasks, such as entities, dates, or relations.

Constructor

Extraction(
    extraction_class: str,
    extraction_text: str,
    *,
    token_interval: TokenInterval | None = None,
    char_interval: CharInterval | None = None,
    alignment_status: AlignmentStatus | None = None,
    extraction_index: int | None = None,
    group_index: int | None = None,
    description: str | None = None,
    attributes: dict[str, str | list[str]] | None = None
)
  • extraction_class (str, required) - The class or category of the extraction (e.g., “PERSON”, “DATE”, “LOCATION”)
  • extraction_text (str, required) - The actual text content of the extraction
  • token_interval (TokenInterval | None) - The token-level position of the extraction in the tokenized text
  • char_interval (CharInterval | None) - The character-level position of the extraction in the original text
  • alignment_status (AlignmentStatus | None) - The alignment status indicating how well the extraction matches the source text
  • extraction_index (int | None) - The index of this extraction in the list of all extractions
  • group_index (int | None) - The index of the group this extraction belongs to
  • description (str | None) - An optional description providing additional context about the extraction
  • attributes (dict[str, str | list[str]] | None) - Additional attributes associated with the extraction as key-value pairs
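
The bare * in the signature means that every parameter after extraction_text must be passed by keyword. The following is a minimal sketch of that calling convention. It uses a hypothetical stand-in function, make_extraction, that mirrors only part of the signature, so it runs without langextract installed and says nothing about the real class beyond what the signature above shows.

```python
from __future__ import annotations

from typing import Any


def make_extraction(
    extraction_class: str,
    extraction_text: str,
    *,
    extraction_index: int | None = None,
    group_index: int | None = None,
    description: str | None = None,
    attributes: dict[str, str | list[str]] | None = None,
) -> dict[str, Any]:
    """Hypothetical stand-in mirroring part of the Extraction signature."""
    return {
        "extraction_class": extraction_class,
        "extraction_text": extraction_text,
        "extraction_index": extraction_index,
        "group_index": group_index,
        "description": description,
        "attributes": attributes,
    }


# The two required arguments may be positional; the rest must be named.
ok = make_extraction("PERSON", "John Smith", description="A person")

# Passing an optional argument positionally is rejected by Python itself.
try:
    make_extraction("PERSON", "John Smith", "A person")
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```

Keyword-only optional parameters keep call sites readable when several of the fields are None-able and share the same type, as extraction_index and group_index do.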

Attributes

  • extraction_class (str) - The class or category of the extraction
  • extraction_text (str) - The actual text content of the extraction
  • char_interval (CharInterval | None) - The character-level position of the extraction in the original text
  • alignment_status (AlignmentStatus | None) - The alignment status indicating how well the extraction matches the source text
  • extraction_index (int | None) - The index of this extraction in the list of all extractions
  • group_index (int | None) - The index of the group this extraction belongs to
  • description (str | None) - An optional description providing additional context about the extraction
  • attributes (dict[str, str | list[str]] | None) - Additional attributes associated with the extraction as key-value pairs
  • token_interval (TokenInterval | None) - The token-level position of the extraction in the tokenized text (property)
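
Two of these attributes usually need a small amount of handling in calling code: char_interval may be None, and each value in attributes may be either a string or a list of strings. The sketch below illustrates both cases with stand-in dataclasses that copy only a subset of the documented fields; they are not the langextract classes. The helper names source_span and attribute_values are hypothetical, and treating end_pos as exclusive is an assumption (it is consistent with the 0 to 10 interval for "John Smith" in the example on this page).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CharInterval:  # stand-in, not the langextract class
    start_pos: int | None = None
    end_pos: int | None = None


@dataclass
class Extraction:  # stand-in with a subset of the documented fields
    extraction_class: str
    extraction_text: str
    char_interval: CharInterval | None = None
    attributes: dict[str, str | list[str]] | None = None


def source_span(source: str, ex: Extraction) -> str | None:
    """Return the slice of source covered by char_interval, if it is set."""
    iv = ex.char_interval
    if iv is None or iv.start_pos is None or iv.end_pos is None:
        return None
    return source[iv.start_pos:iv.end_pos]  # assumes end_pos is exclusive


def attribute_values(ex: Extraction, key: str) -> list[str]:
    """Return an attribute as a list, whether stored as str or list[str]."""
    value = (ex.attributes or {}).get(key)
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


source = "John Smith wrote the report."
ex = Extraction(
    extraction_class="PERSON",
    extraction_text="John Smith",
    char_interval=CharInterval(start_pos=0, end_pos=10),
    attributes={"role": "author", "aliases": ["J. Smith", "Smith"]},
)

span = source_span(source, ex)
roles = attribute_values(ex, "role")
aliases = attribute_values(ex, "aliases")
```

Comparing span with ex.extraction_text is a cheap sanity check that an interval still points at the text it was extracted from.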

Example

from langextract.core.data import Extraction, CharInterval

# Create an extraction for a person's name
extraction = Extraction(
    extraction_class="PERSON",
    extraction_text="John Smith",
    char_interval=CharInterval(start_pos=0, end_pos=10),
    description="A person mentioned in the document",
    attributes={
        "role": "author",
        "confidence": "high"
    }
)

print(extraction.extraction_class)  # "PERSON"
print(extraction.extraction_text)   # "John Smith"
print(extraction.char_interval.start_pos)  # 0

Related

  • CharInterval - Represents character positions (nested class with start_pos and end_pos)
  • AlignmentStatus - Enum for alignment quality (values: MATCH_EXACT, MATCH_FUZZY, MATCH_LESSER, MATCH_GREATER)
  • AnnotatedDocument - Contains multiple extractions
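
Because alignment_status records how well an extraction matches the source text, a common use is to filter extractions by it. The following sketch shows one way to do that, using a stand-in enum and dataclass rather than the real langextract types. The member names come from the list above; the string values assigned to them and the helper name exact_matches are placeholders invented for this illustration.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass


class AlignmentStatus(enum.Enum):  # stand-in, not the langextract enum
    # Member names follow the list above; the values are placeholders.
    MATCH_EXACT = "match_exact"
    MATCH_FUZZY = "match_fuzzy"
    MATCH_LESSER = "match_lesser"
    MATCH_GREATER = "match_greater"


@dataclass
class Extraction:  # stand-in with a subset of the documented fields
    extraction_class: str
    extraction_text: str
    alignment_status: AlignmentStatus | None = None


def exact_matches(extractions: list[Extraction]) -> list[Extraction]:
    """Keep only extractions whose status is MATCH_EXACT."""
    return [
        ex for ex in extractions
        if ex.alignment_status is AlignmentStatus.MATCH_EXACT
    ]


extractions = [
    Extraction("PERSON", "John Smith", AlignmentStatus.MATCH_EXACT),
    Extraction("DATE", "last spring", AlignmentStatus.MATCH_FUZZY),
    Extraction("LOCATION", "Paris"),  # no alignment information
]

exact = exact_matches(extractions)
```

Extractions with no alignment_status are dropped here along with the non-exact ones; whether that is the right policy depends on how the extractions were produced.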
