Function Signature
Description
Loads annotated documents from a JSON Lines file. Each line in the file is parsed as a separate JSON object and converted to an AnnotatedDocument. This function yields documents incrementally, making it memory-efficient for large datasets.
Parameters
- The path to the JSON Lines file containing annotated documents. This is typically a file previously written by save_annotated_documents(), or any file that follows the same format.
- Whether to show a progress bar during loading. The progress bar tracks bytes read and provides an estimate of the time to completion.
Returns
An iterator that yields AnnotatedDocument objects. Each document contains the original text, document ID, and extracted annotations.
Exceptions
- IOError: If the file does not exist or cannot be read.
- json.JSONDecodeError: If a line in the file contains invalid JSON.
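Because documents are yielded lazily, it helps to know when these exceptions actually surface. In a generator-based sketch like the one below (an assumption about the implementation, not the library's code), calling the function raises nothing: a missing file surfaces on the first next() call, and a malformed line surfaces only when iteration reaches it, after earlier documents have already been yielded. The names iter_jsonl and demo_error_timing are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator, List


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in: parse one JSON object per non-empty UTF-8 line."""
    # open() raises FileNotFoundError, a subclass of OSError (IOError is an alias of OSError).
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip empty lines
                yield json.loads(line)  # raises json.JSONDecodeError on malformed JSON


def demo_error_timing() -> List[str]:
    events: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        # 1. Missing file: creating the generator runs none of its body.
        gen = iter_jsonl(os.path.join(tmp, "missing.jsonl"))
        events.append("generator created")
        try:
            next(gen)
        except IOError:
            events.append("IOError on first next()")

        # 2. Malformed second line: the first record is still delivered.
        bad_path = os.path.join(tmp, "bad.jsonl")
        with open(bad_path, "w", encoding="utf-8") as f:
            f.write('{"document_id": "a"}\n{not json}\n')
        loaded: List[Dict[str, Any]] = []
        try:
            for record in iter_jsonl(bad_path):
                loaded.append(record)
        except json.JSONDecodeError:
            events.append(f"JSONDecodeError after {len(loaded)} record(s)")
    return events


if __name__ == "__main__":
    for event in demo_error_timing():
        print(event)
```

A practical consequence: callers who need all-or-nothing loading should collect the documents into a list inside a try block rather than processing them as they arrive.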
Usage Example
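A minimal sketch of the loading behavior described above, assuming each line holds an object with hypothetical text, document_id, and extractions keys. AnnotatedDocument here is a simplified stand-in dataclass and load_jsonl_sketch is a hypothetical name; neither is the library's actual implementation. The sketch reads UTF-8, skips empty lines, and yields one document per line.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class AnnotatedDocument:
    """Hypothetical minimal stand-in for the library's AnnotatedDocument."""
    text: str
    document_id: str
    extractions: List[Dict[str, Any]] = field(default_factory=list)


def load_jsonl_sketch(path: str) -> Iterator[AnnotatedDocument]:
    """Yield one AnnotatedDocument per non-empty line of a UTF-8 JSON Lines file."""
    if not os.path.exists(path):
        raise IOError(f"File not found: {path}")
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # empty lines are skipped
            record = json.loads(line)  # may raise json.JSONDecodeError
            # Assumed key names; the real format may differ.
            yield AnnotatedDocument(
                text=record.get("text", ""),
                document_id=record.get("document_id", ""),
                extractions=record.get("extractions", []),
            )


def demo() -> List[Tuple[str, int]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "docs.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps({"document_id": "doc1", "text": "Aspirin 100 mg daily.",
                                "extractions": [{"label": "medication", "text": "Aspirin"}]}) + "\n")
            f.write("\n")  # blank line: skipped by the loader
            f.write(json.dumps({"document_id": "doc2", "text": "No findings.",
                                "extractions": []}) + "\n")
        # Consume the generator while the temporary file still exists.
        return [(doc.document_id, len(doc.extractions)) for doc in load_jsonl_sketch(path)]


if __name__ == "__main__":
    for doc_id, n in demo():
        print(doc_id, n)
```

Only one parsed document is held at a time unless the caller accumulates them, as the list comprehension in demo() does here.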
Notes
- The function reads files with UTF-8 encoding.
- Empty lines in the file are automatically skipped.
- Progress is tracked as bytes read relative to the total file size, which gives a meaningful completion estimate without first counting the documents in the file.
- The function is memory-efficient because it yields documents one at a time instead of loading the entire file into memory.
- Progress information includes the total number of documents loaded and the file path.
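Byte-based progress can be sketched as follows. This is an assumption about the general technique, not the library's code: a plain callback stands in for the progress bar, and load_with_progress and on_progress are hypothetical names. Since f.tell() is unavailable while iterating over a text file, the sketch sums the UTF-8 encoded length of each line, and opens the file with newline="" so that line endings are not translated and the byte count matches the file size.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


def load_with_progress(
    path: str,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield parsed JSON objects, reporting (bytes_read, total_bytes) after each line."""
    total_bytes = os.path.getsize(path)
    bytes_read = 0
    # newline="" keeps "\r\n" intact, so encoded line lengths add up to the file size.
    with open(path, "r", encoding="utf-8", newline="") as f:
        for line in f:
            # f.tell() is disabled during text-file iteration, so count bytes manually.
            bytes_read += len(line.encode("utf-8"))
            if on_progress is not None:
                on_progress(bytes_read, total_bytes)
            if line.strip():  # empty lines still count toward progress
                yield json.loads(line)


def demo() -> Tuple[int, Tuple[int, int]]:
    updates = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "docs.jsonl")
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write('{"document_id": "a"}\n\n{"document_id": "b"}\n')
        docs = list(load_with_progress(path, lambda done, total: updates.append((done, total))))
    return len(docs), updates[-1]


if __name__ == "__main__":
    count, (done, total) = demo()
    print(f"Loaded {count} documents; {done}/{total} bytes read")
```

Dividing bytes_read by total_bytes gives the completion fraction a progress bar would display; the final update reaches 100% once the last line is read.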