Overview

LangExtract provides a built-in visualization that generates an interactive HTML file in which each extracted entity is highlighted in its original context. This makes it practical to review and verify extractions at any scale, from a handful to thousands.

Basic Usage

From an AnnotatedDocument

Visualize directly from an extraction result:
import langextract as lx

# Perform extraction (input_text, prompt, and examples are defined by your task;
# see the Complete Example section)
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash"
)

# Generate visualization
html_content = lx.visualize(result)

# Save to file; UTF-8 avoids encoding errors on platforms with a different default
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)  # For Jupyter/Colab
    else:
        f.write(html_content)

From a JSONL File

Visualize from previously saved results:
import langextract as lx

# Save extraction results (`result` is the value returned by lx.extract)
lx.io.save_annotated_documents([result], output_name="extraction_results.jsonl", output_dir=".")

# Generate visualization from the file
html_content = lx.visualize("extraction_results.jsonl")

# Save to file; UTF-8 avoids encoding errors on platforms with a different default
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)  # For Jupyter/Colab
    else:
        f.write(html_content)

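lx.visualize adapts its return value to the environment: in Jupyter/Colab it returns an IPython.display.HTML object, and in a standard Python script it returns a plain HTML string. This is why the examples check for a `data` attribute before writing the file.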
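
The `hasattr` check repeated in the examples can be wrapped in a small helper. The sketch below is one way to do that; `save_visualization` is a hypothetical name, not a LangExtract function, and it assumes only what is stated above: the result is either a string or an object whose markup lives in `.data`.

```python
from pathlib import Path


def save_visualization(html_content, path):
    """Write visualization output to `path` and return the path.

    Hypothetical helper, not part of LangExtract.
    """
    # In Jupyter/Colab the markup is stored on the object's `.data`
    # attribute; in a plain script the value is already a string.
    html = getattr(html_content, "data", html_content)
    out = Path(path)
    # Write as UTF-8 so non-ASCII source text survives on every platform.
    out.write_text(html, encoding="utf-8")
    return out
```

With this in place, each save step becomes a single call such as `save_visualization(lx.visualize(result), "visualization.html")`.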

Visualization Parameters

Customize the visualization with these parameters:

animation_speed

Control the speed of the animation when playing through extractions:
html_content = lx.visualize(
    "extraction_results.jsonl",
    animation_speed=1.5  # Default: 1.0 (seconds between extractions)
)
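
Because `animation_speed` is measured in seconds between extractions, total playback time grows linearly with the number of extractions. The following is a rough sketch for picking a speed that fits a target duration, for example when recording a short clip. Both function names are hypothetical, and the estimate assumes the player spends exactly `animation_speed` seconds on each extraction with no extra transition time.

```python
def playback_seconds(num_extractions, animation_speed=1.0):
    """Estimate how long auto-play takes, in seconds (assumption: linear)."""
    return num_extractions * animation_speed


def speed_for_duration(num_extractions, target_seconds):
    """Return the animation_speed that fits playback into target_seconds."""
    if num_extractions <= 0:
        raise ValueError("num_extractions must be positive")
    return target_seconds / num_extractions
```

For instance, 120 extractions played back in about one minute would need `animation_speed=0.5` under this estimate.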

show_legend

Toggle the display of the color legend:
html_content = lx.visualize(
    "extraction_results.jsonl",
    show_legend=True  # Default: True
)
The legend displays extraction classes with their assigned colors for easy reference.

gif_optimized

Enable optimizations for video capture and GIF creation:
html_content = lx.visualize(
    "extraction_results.jsonl",
    gif_optimized=True  # Default: True
)
Optimizations include:
  • Larger fonts for better readability (16px text, 15px attributes)
  • Better contrast and improved dimensions
  • Thicker highlighting for visibility
  • Enhanced line spacing (1.8)

Features

Interactive Controls

The generated HTML includes interactive controls:
  • Play/Pause: Auto-advance through extractions
  • Previous/Next: Navigate between entities
  • Progress Slider: Jump to specific extractions
  • Status Display: Shows current entity number and position

Entity Highlighting

  • Each extraction class is assigned a unique color
  • Current entity is highlighted with a bold red underline
  • Hovering over entities shows tooltips (when nested)
  • Smooth scrolling keeps entities centered

Attributes Panel

For each extraction, the panel displays:
  • Extraction class: The category of the entity
  • Attributes: All key-value pairs in a readable format
  • Context window: Shows text before and after the extraction
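
To make the context window concrete, here is a minimal sketch of how text before and after an extraction can be sliced out of the source document by character offsets. The function name and the `width` parameter are illustrative assumptions; the source does not state how many characters the real panel shows.

```python
def context_window(text, start, end, width=40):
    """Return (before, match, after) around text[start:end].

    `start` is inclusive and `end` is exclusive; `width` is a
    hypothetical setting for how many characters to show per side.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("span is outside the text")
    before = text[max(0, start - width):start]  # clamp at document start
    after = text[end:end + width]               # slicing clamps at the end
    return before, text[start:end], after
```

Calling it with the offsets of "Juliet" in the sample sentence and `width=5` returns the five characters on either side of the name.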

Complete Example

import langextract as lx
import textwrap

# Define extraction task
prompt = textwrap.dedent("""\
    Extract characters, emotions, and relationships in order of appearance.
    Use exact text for extractions. Do not paraphrase or overlap entities.
    Provide meaningful attributes for each entity to add context.""")

examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks?",
        extractions=[
            lx.data.Extraction(
                extraction_class="character",
                extraction_text="ROMEO",
                attributes={"emotional_state": "wonder"}
            ),
            lx.data.Extraction(
                extraction_class="emotion",
                extraction_text="But soft!",
                attributes={"feeling": "gentle awe"}
            ),
        ]
    )
]

# Perform extraction
result = lx.extract(
    text_or_documents="Lady Juliet gazed longingly at the stars, her heart aching for Romeo",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash"
)

# Save results
lx.io.save_annotated_documents(
    [result],
    output_name="extraction_results.jsonl",
    output_dir="."
)

# Generate visualization
html_content = lx.visualize(
    "extraction_results.jsonl",
    animation_speed=1.0,
    show_legend=True,
    gif_optimized=True
)

# Save to HTML file; UTF-8 avoids encoding errors on platforms with a different default
with open("visualization.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)
    else:
        f.write(html_content)

Handling Large Result Sets

The visualization is built to handle large result sets:
  • Efficiently renders hundreds or thousands of entities
  • Smooth scrolling and navigation
  • Optimized for performance with large documents
  • Progress slider for quick navigation
# Visualize a large extraction from the full text of Romeo and Juliet
# (reuses `prompt` and `examples` from the Complete Example)
result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,
    max_workers=20,
    max_char_buffer=1000
)

lx.io.save_annotated_documents([result], output_name="romeo_juliet.jsonl", output_dir=".")
html_content = lx.visualize("romeo_juliet.jsonl")

# Save the visualization; UTF-8 avoids encoding errors on platforms with a different default
with open("romeo_juliet_viz.html", "w", encoding="utf-8") as f:
    if hasattr(html_content, 'data'):
        f.write(html_content.data)
    else:
        f.write(html_content)

Color Palette

The visualization uses a palette of ten light colors chosen for readability:
  • Light Blue, Light Green, Light Yellow
  • Light Red, Light Orange, Light Purple
  • Light Teal, Light Pink, Very Light Grey
  • Pale Cyan
Colors are assigned to extraction classes automatically and consistently, so a given class keeps the same color throughout a visualization.
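
One way to get consistent assignment is to sort the class names and walk the palette in order. The sketch below illustrates that idea; it is an assumption about what "consistent" means here, not a copy of the library's code, and the color names stand in for the actual color values, which this page does not list.

```python
import itertools

# Placeholder names taken from the list above; the real values may differ.
PALETTE = [
    "light blue", "light green", "light yellow", "light red",
    "light orange", "light purple", "light teal", "light pink",
    "very light grey", "pale cyan",
]


def assign_colors(extraction_classes):
    """Map each distinct class name to a palette color (hypothetical helper)."""
    color_map = {}
    palette = itertools.cycle(PALETTE)
    # Sorting makes the mapping independent of the order of appearance.
    for cls in sorted(set(extraction_classes)):
        color_map[cls] = next(palette)
    return color_map
```

Under this scheme the same set of classes always yields the same mapping, and colors would start to repeat once there are more classes than palette entries.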

Nested Extractions

The visualization properly handles overlapping and nested extractions:
  • Shorter spans close before longer spans open
  • Proper HTML nesting ensures correct display
  • Tooltips show attributes when hovering over nested entities
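
The nesting rules above can be illustrated by sorting span boundaries before emitting tags. This is a simplified sketch under stated assumptions, not the library's implementation: the `highlight` function and the `(start, end, css_class)` tuple layout are hypothetical, and it covers nested and adjacent spans only, not spans that partially overlap.

```python
import html


def highlight(text, spans):
    """Wrap each (start, end, css_class) span of `text` in a <span> tag.

    `end` is exclusive. Boundaries are ordered so tags nest correctly.
    """
    points = []
    for idx, (start, end, cls) in enumerate(spans):
        if not 0 <= start < end <= len(text):
            continue  # skip empty or out-of-range spans
        length = end - start
        # Sort key: position first, then closing tags (0) before opening
        # tags (1). Among openers, longer spans come first (-length);
        # among closers, shorter spans come first (length).
        points.append((start, 1, -length, idx, cls))
        points.append((end, 0, length, idx, cls))
    points.sort()

    pieces = []
    cursor = 0
    for pos, is_start, _, _, cls in points:
        pieces.append(html.escape(text[cursor:pos]))  # text since last tag
        cursor = pos
        pieces.append(f'<span class="{cls}">' if is_start else "</span>")
    pieces.append(html.escape(text[cursor:]))  # trailing text
    return "".join(pieces)
```

When an inner span shares its end position with an outer span, the inner one is closed first, so the resulting markup nests cleanly.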
