Introduction

What is LangExtract?

LangExtract is a Python library that uses LLMs to extract structured information from unstructured text documents based on user-defined instructions. It processes materials such as clinical notes or reports, identifying and organizing key details while ensuring the extracted data corresponds to the source text.

Why LangExtract?

Precise Source Grounding

Maps every extraction to its exact location in the source text, enabling visual highlighting for easy traceability and verification.
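
To make the idea of source grounding concrete, here is a minimal sketch that maps an extracted string back to character offsets in the source. It is an illustration only, not LangExtract's implementation: `GroundedSpan` and `ground` are hypothetical names, and exact substring matching stands in for whatever alignment the library actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroundedSpan:
    """An extracted string plus its location in the source (hypothetical type)."""
    text: str
    start: Optional[int]  # inclusive character offset, None if not found
    end: Optional[int]    # exclusive character offset, None if not found


def ground(source: str, extraction_text: str) -> GroundedSpan:
    """Locate the first exact occurrence of an extraction in the source text."""
    idx = source.find(extraction_text)
    if idx == -1:
        # The extraction cannot be traced back to the source verbatim.
        return GroundedSpan(extraction_text, None, None)
    return GroundedSpan(extraction_text, idx, idx + len(extraction_text))


note = "Patient was given 250 mg amoxicillin twice daily."
span = ground(note, "amoxicillin")
# Slicing the source with the offsets recovers the extraction exactly,
# which is what makes highlighting and verification possible.
print(note[span.start:span.end])
```

An extraction that does not appear verbatim comes back with `start` and `end` set to `None`, which a reviewer could treat as a signal that the value was inferred rather than quoted.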

Reliable Structured Outputs

Enforces a consistent output schema based on your few-shot examples, leveraging controlled generation in supported models like Gemini.

Optimized for Long Documents

Overcomes the “needle-in-a-haystack” challenge through text chunking, parallel processing, and multiple passes for higher recall.
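
The combination of chunking, parallel processing, and multiple passes can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's algorithm: `chunk_text`, `extract_long`, and the stand-in extractor `find_drugs` are hypothetical, and the stand-in deliberately misses one term on the first pass to mimic a model that overlooks something in a single run.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set, Tuple

Extraction = Tuple[int, int, str]  # (start, end, text) character offsets


def chunk_text(text: str, size: int, overlap: int = 0) -> List[Tuple[int, str]]:
    """Split text into (offset, chunk) pieces of at most `size` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [(i, text[i:i + size]) for i in range(0, len(text), step)]


def extract_long(
    text: str,
    extract_chunk: Callable[[str, int], List[Extraction]],
    size: int = 40,
    overlap: int = 10,
    passes: int = 2,
    workers: int = 4,
) -> List[Extraction]:
    """Run the extractor over every chunk in parallel, once per pass, and merge."""
    chunks = chunk_text(text, size, overlap)
    found: Set[Extraction] = set()  # a set de-duplicates overlaps and repeat passes
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for pass_index in range(passes):
            results = pool.map(
                lambda c, p=pass_index: extract_chunk(c[1], p), chunks
            )
            for (offset, _), spans in zip(chunks, results):
                for start, end, value in spans:
                    # Shift chunk-local offsets into document coordinates.
                    found.add((offset + start, offset + end, value))
    return sorted(found)


def find_drugs(chunk: str, pass_index: int) -> List[Extraction]:
    """Stand-in for an LLM call: misses 'ibuprofen' on the first pass."""
    vocabulary = ["aspirin"] if pass_index == 0 else ["aspirin", "ibuprofen"]
    return [
        (m.start(), m.end(), term)
        for term in vocabulary
        for m in re.finditer(re.escape(term), chunk)
    ]


doc = ("Take aspirin in the morning. Later, take ibuprofen with food. "
       "Avoid aspirin at night.")
for start, end, value in extract_long(doc, find_drugs):
    print(start, end, value)
```

In this sketch the overlap is chosen to be at least as long as the longest term, so a mention that straddles a chunk boundary still appears whole in one chunk. A second pass costs another round of model calls, so the number of passes trades recall against cost.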

Interactive Visualization

Instantly generates a self-contained, interactive HTML file to visualize and review thousands of extracted entities in context.

Flexible LLM Support

Supports cloud-based LLMs such as the Google Gemini family, as well as local open-source models via the built-in Ollama interface.

Adaptable to Any Domain

Define extraction tasks for any domain using just a few examples. No model fine-tuning required.
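
As a rough sketch of what defining a task from a few examples can look like, the snippet below pairs a prompt with one worked example and passes both to the library. The call names (`lx.extract`, `lx.data.ExampleData`, `lx.data.Extraction`) follow the project's published usage as best recalled here; the model ID, the prompt, and the `run_extraction` wrapper are assumptions for illustration, so check the API Reference before relying on them.

```python
import textwrap

# Task description: plain-language instructions for the model.
PROMPT = textwrap.dedent("""\
    Extract medications with their dosage and frequency.
    Use the exact text from the note; do not paraphrase.""")

# One few-shot example, kept as plain data so it can be inspected without
# the library installed. It is converted to library objects below.
FEW_SHOT = [
    {
        "text": "Patient was given 250 mg amoxicillin twice daily.",
        "extractions": [
            {
                "extraction_class": "medication",
                "extraction_text": "amoxicillin",
                "attributes": {"dosage": "250 mg", "frequency": "twice daily"},
            },
        ],
    }
]


def run_extraction(text: str, model_id: str = "gemini-2.5-flash"):
    """Hypothetical wrapper: needs langextract installed and model credentials."""
    import langextract as lx  # imported lazily so the data above works without it

    examples = [
        lx.data.ExampleData(
            text=ex["text"],
            extractions=[lx.data.Extraction(**e) for e in ex["extractions"]],
        )
        for ex in FEW_SHOT
    ]
    return lx.extract(
        text_or_documents=text,
        prompt_description=PROMPT,
        examples=examples,
        model_id=model_id,  # assumed model ID; substitute one you have access to
    )
```

Switching to another domain would mean changing only `PROMPT` and `FEW_SHOT`; the calling code stays the same.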

Getting Started

Installation

Install LangExtract via pip, from source, or with Docker

Quick Start

Extract your first structured data in minutes

API Reference

Explore the complete API documentation

Examples

View real-world extraction examples

Key Capabilities

Leverages LLM World Knowledge

Use precise prompt wording and few-shot examples to control how much the extraction task draws on the LLM's world knowledge. The accuracy of any inferred information, and how closely it follows the task specification, depends on the selected LLM, the complexity of the task, the clarity of the prompt instructions, and the nature of the prompt examples.

Custom Model Providers

LangExtract supports custom LLM providers via a lightweight plugin system:

Add new model support independently of the core library
Distribute your provider as a separate Python package
Keep custom dependencies isolated
Override or extend built-in providers via priority-based resolution
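
Priority-based resolution can be pictured with a small standalone registry. This is a sketch of the general pattern only; `register`, `resolve`, and the provider classes are hypothetical names and do not reproduce LangExtract's actual plugin API.

```python
import re
from typing import List, Tuple

# Each entry: (priority, compiled model-ID pattern, provider class).
_REGISTRY: List[Tuple[int, re.Pattern, type]] = []


def register(pattern: str, priority: int = 0):
    """Class decorator that registers a provider for matching model IDs."""
    def decorator(cls: type) -> type:
        _REGISTRY.append((priority, re.compile(pattern), cls))
        return cls
    return decorator


def resolve(model_id: str) -> type:
    """Return the highest-priority provider whose pattern matches model_id."""
    matches = [(prio, cls) for prio, rx, cls in _REGISTRY if rx.search(model_id)]
    if not matches:
        raise ValueError(f"no provider registered for {model_id!r}")
    # max() keeps the first of equal priorities, so ties go to the
    # provider that was registered earliest.
    return max(matches, key=lambda m: m[0])[1]


@register(r"^gemini", priority=0)
class BuiltinGeminiProvider:
    """Stands in for a provider shipped with the core library."""


@register(r"^llama", priority=0)
class BuiltinLocalProvider:
    """Stands in for a built-in local-model provider."""


@register(r"^gemini", priority=10)
class CustomGeminiProvider:
    """Stands in for a provider installed from a separate package."""


print(resolve("gemini-2.5-flash").__name__)
print(resolve("llama3").__name__)
```

Because the custom provider registers the same pattern with a higher priority, it overrides the built-in one without any change to the core registry, which is the property that lets providers ship as separate packages.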

Example Use Cases

Healthcare: Extract medications, diagnoses, and treatment plans from clinical notes
Legal: Identify key clauses, entities, and obligations from contracts
Research: Structure information from academic papers and reports
Literature: Analyze characters, emotions, and relationships in texts
Business: Extract structured data from unstructured documents and emails

Next Steps

Ready to start extracting? Head to the Installation guide to set up LangExtract, or jump straight to the Quick Start to see it in action.

This is not an officially supported Google product. LangExtract is licensed under Apache 2.0.
