Visualization

Overview

LangExtract provides built-in visualization to generate interactive HTML files that display extracted entities highlighted in their original context. This feature makes it easy to review and verify thousands of extractions.

Basic Usage

From an AnnotatedDocument

Visualize directly from an extraction result:

import langextract as lx

# Perform extraction
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash"
)

# Generate visualization
html_content = lx.visualize(result)

# Save to file (explicit UTF-8 so non-ASCII source text is written correctly on every platform)
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)  # For Jupyter/Colab
    else:
        f.write(html_content)

From a JSONL File

Visualize from previously saved results:

import langextract as lx

# Save extraction results (`result` is the AnnotatedDocument returned by lx.extract)
lx.io.save_annotated_documents([result], output_name="extraction_results.jsonl", output_dir=".")

# Generate visualization from the file
html_content = lx.visualize("extraction_results.jsonl")
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)  # For Jupyter/Colab
    else:
        f.write(html_content)

The visualization function handles both environments automatically: in Jupyter/Colab it returns an IPython.display.HTML object, and in standard Python scripts it returns an HTML string.
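The save logic repeated in the examples above can be factored into a small helper. The following is a minimal sketch, not part of LangExtract: the names html_to_str and save_visualization are hypothetical, and the only assumption about the Jupyter/Colab return value is that it exposes its markup through a .data attribute, as the examples above already rely on.

```python
from pathlib import Path


def html_to_str(html_content) -> str:
    """Return raw HTML from either a plain str or an object with a .data attribute."""
    # Hypothetical helper: IPython.display.HTML keeps its markup in `.data`;
    # plain strings are passed through unchanged.
    return html_content.data if hasattr(html_content, "data") else html_content


def save_visualization(html_content, path: str) -> Path:
    """Write the visualization to `path` as UTF-8 and return the output path."""
    out = Path(path)
    out.write_text(html_to_str(html_content), encoding="utf-8")
    return out
```

With this helper, each example's final step would reduce to a single call such as save_visualization(html_content, "visualization.html").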

Visualization Parameters

Customize the visualization with these parameters:

animation_speed

Control the speed of the animation when playing through extractions:

html_content = lx.visualize(
    "extraction_results.jsonl",
    animation_speed=1.5  # Default: 1.0 (seconds between extractions)
)

show_legend

Toggle the display of the color legend:

html_content = lx.visualize(
    "extraction_results.jsonl",
    show_legend=True  # Default: True
)

The legend displays extraction classes with their assigned colors for easy reference.

gif_optimized

Enable optimizations for video capture and GIF creation:

html_content = lx.visualize(
    "extraction_results.jsonl",
    gif_optimized=True  # Default: True
)

Optimizations include:

Larger fonts for better readability (16px text, 15px attributes)
Better contrast and improved dimensions
Thicker highlighting for visibility
Enhanced line spacing (1.8)

Features

Interactive Controls

The generated HTML includes interactive controls:

Play/Pause: Auto-advance through extractions
Previous/Next: Navigate between entities
Progress Slider: Jump to specific extractions
Status Display: Shows current entity number and position

Entity Highlighting

Each extraction class is assigned a unique color
Current entity is highlighted with a bold red underline
Hovering over entities shows tooltips (when nested)
Smooth scrolling keeps entities centered

Attributes Panel

For each extraction, the panel displays:

Extraction class: The category of the entity
Attributes: All key-value pairs in a readable format
Context window: Shows text before and after the extraction
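To illustrate what a context window means here, the sketch below slices the text around an extraction's character offsets. It is an assumption-laden illustration rather than the library's implementation: the function name context_window and the width parameter are hypothetical.

```python
def context_window(text: str, start: int, end: int, width: int = 40):
    """Split text into (before, match, after) around the span [start, end).

    `width` (hypothetical parameter) caps how many characters of context
    are kept on each side; slices are clamped to the text boundaries.
    """
    left = max(0, start - width)
    right = min(len(text), end + width)
    return text[left:start], text[start:end], text[end:right]


before, match, after = context_window(
    "Lady Juliet gazed longingly at the stars", start=5, end=11, width=5
)
print(repr(before), repr(match), repr(after))
```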

Complete Example

import langextract as lx
import textwrap

# Define extraction task
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
        ]
    )
]

# Perform extraction
result = lx.extract(
    text_or_documents="Lady Juliet gazed longingly at the stars, her heart aching for Romeo",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash"
)

# Save results
lx.io.save_annotated_documents(
    [result],
    output_name="extraction_results.jsonl",
    output_dir="."
)

# Generate visualization
html_content = lx.visualize(
    "extraction_results.jsonl",
    animation_speed=1.0,
    show_legend=True,
    gif_optimized=True
)

# Save to HTML file
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)
    else:
        f.write(html_content)

Handling Large Result Sets

The visualization is built to handle large result sets:

Efficiently renders hundreds or thousands of entities
Smooth scrolling and navigation
Optimized for performance with large documents
Progress slider for quick navigation

# Visualize a large extraction from a full novel (reuses `prompt` and `examples` from the complete example)
result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,
    max_workers=20,
    max_char_buffer=1000
)

lx.io.save_annotated_documents([result], output_name="romeo_juliet.jsonl", output_dir=".")
html_content = lx.visualize("romeo_juliet.jsonl")

# Save the visualization
with open("romeo_juliet_viz.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)
    else:
        f.write(html_content)

Color Palette

The visualization uses a carefully selected color palette optimized for readability:

Light Blue, Light Green, Light Yellow
Light Red, Light Orange, Light Purple
Light Teal, Light Pink, Very Light Grey
Pale Cyan

Colors are automatically assigned to extraction classes in a consistent manner.
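One way to get consistent assignment is to sort the class names and cycle through the palette, so the same set of classes always maps to the same colors. The sketch below illustrates that idea under stated assumptions: the function name assign_colors is hypothetical, the hex values are illustrative placeholders for the colors named above, and the library's actual assignment logic may differ.

```python
import itertools

# Illustrative placeholder hex values for the ten named colors (assumed, not confirmed).
PALETTE = [
    "#D2E3FC",  # Light Blue
    "#C8E6C9",  # Light Green
    "#FEF0C3",  # Light Yellow
    "#F9DEDC",  # Light Red
    "#FFDDBE",  # Light Orange
    "#EADDFF",  # Light Purple
    "#C4E9E4",  # Light Teal
    "#FCE4EC",  # Light Pink
    "#E8EAED",  # Very Light Grey
    "#DDE8E8",  # Pale Cyan
]


def assign_colors(extraction_classes):
    """Map each distinct class name to a palette color, deterministically."""
    palette = itertools.cycle(PALETTE)  # wraps around if classes outnumber colors
    # Sorting makes the mapping independent of the order extractions arrive in.
    return {cls: next(palette) for cls in sorted(set(extraction_classes))}


print(assign_colors(["emotion", "character", "emotion", "relationship"]))
```

Because the mapping depends only on the sorted set of class names, re-running the visualization on the same results yields the same legend. With more classes than palette entries, colors repeat.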

Nested Extractions

The visualization properly handles overlapping and nested extractions:

Shorter spans close before longer spans open
Proper HTML nesting ensures correct display
Tooltips show attributes when hovering over nested entities
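The nesting rules above can be illustrated by sorting open and close tags before emitting them. This is a minimal sketch, not LangExtract's code: the function name highlight and the (start, end, label) tuple format are hypothetical, and it assumes spans are either disjoint or fully nested, with no partial overlaps.

```python
import html


def highlight(text, spans):
    """Wrap (start, end, label) character spans in <span> tags with valid nesting."""
    events = []
    for start, end, label in spans:
        length = end - start
        # Sort key is (position, kind, tie-breaker):
        #   kind 0 = close, 1 = open, so closes come first at a shared position;
        #   closes: shorter span first (inner closes before outer);
        #   opens: longer span first (outer opens before inner).
        events.append((end, 0, length, "</span>"))
        events.append((start, 1, -length, f'<span class="{html.escape(label)}">'))
    events.sort(key=lambda e: e[:3])

    out, cursor = [], 0
    for pos, _, _, tag in events:
        out.append(html.escape(text[cursor:pos]))  # escape the text between tags
        out.append(tag)
        cursor = pos
    out.append(html.escape(text[cursor:]))
    return "".join(out)


print(highlight("Lady Juliet gazed", [(0, 11, "character"), (5, 11, "name")]))
```

In this example the inner span for "Juliet" opens after and closes before the outer "Lady Juliet" span, so the resulting tags are balanced.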

Next Steps

Learn how to process long documents and visualize large result sets
Explore basic extraction to understand the extraction process
Configure different model providers for your needs
