An extraction task in LangExtract defines what you want to extract from unstructured text and how the extraction should be performed. Tasks are specified through two key components:
Prompt Description: Natural language instructions explaining the extraction goals
Few-Shot Examples: High-quality demonstrations showing the expected extraction format and behavior
Extraction tasks are domain-agnostic. You can define tasks for any domain, from literary analysis to medical records, without fine-tuning the model.
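The sketch below models a task as plain Python data, so the two components are concrete without depending on the library. The `Extraction`, `Example`, and `ExtractionTask` classes are hypothetical stand-ins, not LangExtract's own types. The medication sentence is made-up sample data. Switching domains only means swapping the prompt and the examples.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only; these are NOT langextract's classes.
@dataclass
class Extraction:
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Example:
    text: str
    extractions: list

@dataclass
class ExtractionTask:
    prompt_description: str  # component 1: natural-language instructions
    examples: list           # component 2: few-shot demonstrations

    def extraction_classes(self) -> list:
        """Return the sorted set of classes that the examples demonstrate."""
        return sorted({e.extraction_class
                       for ex in self.examples
                       for e in ex.extractions})

# A medical-records task: only the prompt and examples differ from a literary one.
medication_task = ExtractionTask(
    prompt_description="Extract medication names, dosages, and frequencies. Use exact text.",
    examples=[
        Example(
            text="Patient was given 250 mg amoxicillin twice daily.",
            extractions=[
                Extraction("dosage", "250 mg"),
                Extraction("medication", "amoxicillin"),
                Extraction("frequency", "twice daily"),
            ],
        )
    ],
)

print(medication_task.extraction_classes())  # ['dosage', 'frequency', 'medication']
```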
Here’s a complete extraction task for identifying characters and emotions in literary text:
import langextract as lx
import textwrap

# 1. Write clear instructions
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

# 2. Provide high-quality examples
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
        ]
    )
]
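The prompt asks for exact text, order of appearance, and no overlap. It can help to confirm that your own examples follow those rules before you use them. The helper below is hypothetical and not part of LangExtract. Its `Extraction` and `ExampleData` dataclasses are local stand-ins that only mirror the field names used above.

```python
from dataclasses import dataclass, field

# Local stand-ins that mirror the fields used above; not langextract's classes.
@dataclass
class Extraction:
    extraction_class: str
    extraction_text: str
    attributes: dict = field(default_factory=dict)

@dataclass
class ExampleData:
    text: str
    extractions: list

def check_example(example: ExampleData) -> list:
    """Return a list of problems; an empty list means every extraction is
    verbatim, in order of appearance, and non-overlapping (hypothetical helper)."""
    problems = []
    cursor = 0  # end offset of the previous matched extraction
    for ext in example.extractions:
        start = example.text.find(ext.extraction_text, cursor)
        if start == -1:
            if ext.extraction_text in example.text:
                # Present, but only before or across the previous match.
                problems.append(f"{ext.extraction_text!r}: out of order or overlapping")
            else:
                problems.append(f"{ext.extraction_text!r}: not verbatim in text")
            continue
        cursor = start + len(ext.extraction_text)
    return problems

romeo = ExampleData(
    text="ROMEO. But soft! What light through yonder window breaks?",
    extractions=[
        Extraction("character", "ROMEO", {"emotional_state": "wonder"}),
        Extraction("emotion", "But soft!", {"feeling": "gentle awe"}),
    ],
)
paraphrased = ExampleData(romeo.text, [Extraction("emotion", "Soft!")])
swapped = ExampleData(romeo.text, list(reversed(romeo.extractions)))

print(check_example(romeo))        # []
print(check_example(paraphrased))  # flags "Soft!" as not verbatim
print(check_example(swapped))      # flags "ROMEO" as out of order
```

Matching always takes the first occurrence after the previous extraction. Repeated phrases are therefore assumed to be annotated left to right.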
Few-shot learning allows you to guide the LLM’s extraction behavior using examples rather than extensive training data. The examples you provide serve as a template for the model to follow.
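This sketch shows why examples act as a template. It assembles the instructions, worked input/output pairs, and a new input into one prompt string, and the model is left to continue the pattern after the final `Output:`. LangExtract builds its prompts internally, and its actual format may differ. `render_few_shot_prompt` and the JSON layout are assumptions made for illustration.

```python
import json

def render_few_shot_prompt(description: str, examples: list, query: str) -> str:
    """Join instructions, worked examples, and a new input (hypothetical format)."""
    parts = [description.strip(), ""]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i} input: {ex['text']}")
        # Serialize the expected output so the model can copy its structure.
        parts.append(f"Example {i} output: {json.dumps(ex['extractions'])}")
        parts.append("")
    # The new input goes last; the model is expected to continue after "Output:".
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

examples = [
    {
        "text": "ROMEO. But soft! What light through yonder window breaks?",
        "extractions": [
            {"extraction_class": "character", "extraction_text": "ROMEO",
             "attributes": {"emotional_state": "wonder"}},
            {"extraction_class": "emotion", "extraction_text": "But soft!",
             "attributes": {"feeling": "gentle awe"}},
        ],
    }
]

prompt = render_few_shot_prompt(
    "Extract characters and emotions in order of appearance. Use exact text.",
    examples,
    "JULIET. O Romeo, Romeo! wherefore art thou Romeo?",
)
print(prompt)
```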
For richer context, you can let the LLM fill in attributes from its own world knowledge rather than only from the source text:
lx.data.Extraction(
    extraction_class="character",
    extraction_text="Lady Juliet",
    attributes={
        "identity": "Capulet family daughter",  # From LLM knowledge
        "literary_context": "tragic heroine"    # From LLM knowledge
    }
)
The balance between text-evidence and knowledge-inference is controlled by your prompt instructions and example attributes. Be explicit about what level of inference you expect.
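You can audit that balance by checking which attribute values appear literally in the source text and which must have come from the model's own knowledge. The heuristic below is a hypothetical `split_attributes` helper that uses case-insensitive substring matching. It is deliberately crude: it misses paraphrased evidence and can be fooled by coincidental matches. The sample stage direction is invented for the example.

```python
def split_attributes(source_text: str, attributes: dict) -> tuple[dict, dict]:
    """Partition attributes into values found verbatim in the source text
    (text evidence) and values that are not (presumably model inference)."""
    lowered = source_text.lower()
    grounded, inferred = {}, {}
    for key, value in attributes.items():
        # Crude substring test; paraphrased evidence counts as "inferred".
        target = grounded if str(value).lower() in lowered else inferred
        target[key] = value
    return grounded, inferred

source = "Enter JULIET, weeping."  # invented sample text
grounded, inferred = split_attributes(
    source,
    {"emotional_state": "weeping", "identity": "Capulet family daughter"},
)
print(grounded)  # {'emotional_state': 'weeping'}
print(inferred)  # {'identity': 'Capulet family daughter'}
```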