Overview
The visualize() function creates interactive HTML visualizations of extraction results. It displays the original text with highlighted extractions, allowing you to step through each entity and view its attributes.
Function Signature
Parameters
The source of extraction data to visualize. Can be:
- An AnnotatedDocument object (returned from lx.extract())
- A string path to a JSONL file containing saved extractions
- A pathlib.Path object pointing to a JSONL file
Animation speed in seconds between extractions when playing in auto-play mode.
- Lower values (e.g., 0.5) create faster animations
- Higher values (e.g., 2.0) slow down the animation
- Default is 1.0 second per extraction
This is a keyword-only parameter.
If True, displays a color legend mapping extraction classes to their highlight colors at the top of the visualization.
This is a keyword-only parameter.
If True, applies GIF-optimized styling with:
- Larger fonts for better readability
- Better contrast and improved dimensions
- Enhanced styling for video capture
This is a keyword-only parameter.
Returns
Returns an IPython.display.HTML object if IPython is available (Jupyter notebook environment), otherwise returns the generated HTML string.
The HTML includes:
- Syntax-highlighted text with colored spans for each extraction
- Interactive controls (Play/Pause, Previous, Next)
- Progress slider to jump to any extraction
- Attributes panel showing details of the current extraction
- Status text showing current position and extraction count
Exceptions
Raised when data_source is a file path that does not exist.
Raised when:
- The JSONL file contains no documents
- The AnnotatedDocument contains no text
- The AnnotatedDocument contains no extractions
Visualization Features
Interactive Controls
The visualization includes several interactive controls:
- Play/Pause Button: Automatically cycle through extractions
- Previous Button: Jump to the previous extraction
- Next Button: Jump to the next extraction
- Progress Slider: Manually navigate to any extraction
- Auto-scroll: Automatically scrolls the current extraction into view
Color Coding
Each extraction class is automatically assigned a unique color from a Material Design-inspired palette:
- Light Blue (#D2E3FC)
- Light Green (#C8E6C9)
- Light Yellow (#FEF0C3)
- Light Red (#F9DEDC)
- Light Orange (#FFDDBE)
- Light Purple (#EADDFF)
- Light Teal (#C4E9E4)
- Light Pink (#FCE4EC)
- Very Light Grey (#E8EAED)
- Pale Cyan (#DDE8E8)
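As a rough illustration of how classes could be mapped onto this palette, here is a minimal sketch. The helper name assign_colors is hypothetical, and both the sorted ordering and the wrap-around when there are more classes than colors are assumptions, not confirmed behavior of visualize().

```python
# Palette copied from the list above, in the same order.
PALETTE = [
    "#D2E3FC", "#C8E6C9", "#FEF0C3", "#F9DEDC", "#FFDDBE",
    "#EADDFF", "#C4E9E4", "#FCE4EC", "#E8EAED", "#DDE8E8",
]


def assign_colors(extraction_classes):
    """Map each distinct class name to a palette color (hypothetical helper)."""
    # Assumption: classes are taken in sorted order and the palette wraps
    # around once every color has been used.
    unique = sorted(set(extraction_classes))
    return {name: PALETTE[i % len(PALETTE)] for i, name in enumerate(unique)}


colors = assign_colors(["medication", "dosage", "medication", "condition"])
```

Under these assumptions, colors stay unique only while there are at most ten distinct classes.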
Attributes Display
For each extraction, the attributes panel shows:
- class: The extraction class/entity type
- attributes: All extracted attributes as key-value pairs
- Empty or null attributes are filtered out for cleaner display
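The filtering of empty or null attributes can be pictured with a small sketch. The helper visible_attributes is hypothetical, and the exact set of values treated as "empty" (None, empty string, empty list, empty dict) is an assumption.

```python
def visible_attributes(attributes):
    """Return only the attributes worth showing (hypothetical helper)."""
    if not attributes:
        return {}
    # Assumption: None and empty containers/strings count as "empty";
    # falsy but meaningful values such as 0 or False are kept.
    empty_values = (None, "", [], {})
    return {
        key: value
        for key, value in attributes.items()
        if not any(value is e or (type(value) is type(e) and value == e)
                   for e in empty_values)
    }


shown = visible_attributes({"dose": "250 mg", "route": None, "note": "", "count": 0})
```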
Examples
Basic Visualization
Load from JSONL File
Customize Animation Speed
Without Legend
Optimize for Recording
Save as HTML File
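A minimal sketch of writing the result to disk, given that visualize() may return either an IPython.display.HTML object or a plain string. The helper save_html is hypothetical. It relies on IPython's HTML object exposing its markup through the data attribute, and the demo uses a plain string as a stand-in for a real visualize() result.

```python
import tempfile
from pathlib import Path


def save_html(result, path):
    """Write a visualization result to disk (hypothetical helper)."""
    # IPython.display.HTML keeps its markup in `.data`; a plain str is used as-is.
    html_text = result.data if hasattr(result, "data") else str(result)
    target = Path(path)
    target.write_text(html_text, encoding="utf-8")
    return target


# Demo with a plain string standing in for the value returned by visualize().
out = save_html("<p>demo</p>", Path(tempfile.gettempdir()) / "visualization_demo.html")
```

In real use, the first argument would be the value returned by visualize().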
Use with pathlib
Technical Details
Extraction Filtering
Only extractions with valid character intervals are displayed. An extraction is considered valid if:
- It has a non-null char_interval
- The start_pos is not None
- The end_pos is not None
- start_pos < end_pos
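The validity conditions above can be expressed as a single predicate. This is a sketch only: the CharInterval and Extraction dataclasses below are simplified stand-ins defined for illustration, not the library's own classes, and is_displayable is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CharInterval:
    """Simplified stand-in for a character interval."""
    start_pos: Optional[int] = None
    end_pos: Optional[int] = None


@dataclass
class Extraction:
    """Simplified stand-in for an extraction."""
    extraction_class: str
    extraction_text: str
    char_interval: Optional[CharInterval] = None


def is_displayable(extraction):
    """Return True if the extraction meets all four conditions listed above."""
    interval = extraction.char_interval
    return (
        interval is not None
        and interval.start_pos is not None
        and interval.end_pos is not None
        and interval.start_pos < interval.end_pos
    )


candidates = [
    Extraction("medication", "aspirin", CharInterval(12, 19)),
    Extraction("dosage", "250 mg", CharInterval(5, None)),  # missing end_pos
    Extraction("condition", "headache"),                     # no char_interval
    Extraction("note", "", CharInterval(7, 7)),              # empty interval
]
displayable = [e for e in candidates if is_displayable(e)]
```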
HTML Structure
The generated HTML includes:
- CSS styles: Embedded styles for highlighting, controls, and animations
- Text window: Scrollable container with highlighted text
- Attributes panel: Shows details of the current extraction
- Controls: Interactive buttons and slider
- JavaScript: Handles interactivity and state management
Nested Extractions
The visualization properly handles nested and overlapping extractions:
- Spans are sorted by position, then by length
- Longer spans open first, shorter spans close first
- Ensures valid HTML nesting
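The ordering rule above can be sketched with open and close tag events. This is an illustrative reconstruction, not the library's code: the function highlight and the (start, end, css_class) tuple format are assumptions, and the sketch covers only fully nested spans, not spans that partially cross each other.

```python
import html


def highlight(text, spans):
    """Wrap each (start, end, css_class) span in a <span> tag (hypothetical helper)."""
    events = []
    for idx, (start, end, css_class) in enumerate(spans):
        length = end - start
        # At the same position, closing tags (kind 0) sort before opening tags (kind 1).
        # Among closes, shorter spans come first; among opens, longer spans come first.
        events.append((end, 0, length, -idx, "</span>"))
        events.append((start, 1, -length, idx,
                       f'<span class="{html.escape(css_class)}">'))
    events.sort(key=lambda event: event[:4])

    pieces, cursor = [], 0
    for pos, _kind, _length, _idx, tag in events:
        pieces.append(html.escape(text[cursor:pos]))  # plain text before the tag
        pieces.append(tag)
        cursor = pos
    pieces.append(html.escape(text[cursor:]))
    return "".join(pieces)


marked = highlight(
    "take 250 mg aspirin daily",
    [(5, 11, "dosage"), (5, 19, "medication")],
)
```

Because the longer span opens first and the shorter one closes first, the dosage span ends up inside the medication span even though it was listed first.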
Performance
The visualization is optimized for documents with many extractions:
- Uses efficient DOM updates
- Smooth scrolling with scrollIntoView
- CSS animations with hardware acceleration
- Minimal JavaScript overhead
Environment Detection
The function automatically detects the execution environment:
- Jupyter/IPython: Returns IPython.display.HTML object for inline display
- Standard Python: Returns raw HTML string
- Checks for IPython availability and notebook context
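One common way to get this kind of fallback is an optional import. The sketch below is an assumption about the general pattern, not the library's implementation: wrap_for_display is a hypothetical helper, and it checks only whether IPython can be imported, not whether the code is running inside a notebook.

```python
def wrap_for_display(html_text):
    """Return an IPython HTML object when possible, else the raw string."""
    try:
        # Optional dependency: available in Jupyter/IPython environments.
        from IPython.display import HTML
    except ImportError:
        return html_text
    return HTML(html_text)


result = wrap_for_display("<p>hello</p>")
# Recover the markup regardless of which type came back.
markup = result if isinstance(result, str) else result.data
```

Callers that need the raw markup in both environments can use the same isinstance check shown in the last line.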
See Also
- extract() - Extract structured information from text
- AnnotatedDocument - Understanding extraction results
- save() and load() - Working with JSONL files