
Function Signature

def save_annotated_documents(
    annotated_documents: Iterator[data.AnnotatedDocument],
    output_dir: pathlib.Path | str | None = None,
    output_name: str = 'data.jsonl',
    show_progress: bool = True,
) -> None

Description

Saves annotated documents to a JSON Lines file. Each document is written as a separate JSON object on its own line, making it easy to process large datasets incrementally.
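
Because each line is an independent JSON object, the saved file can be read back one record at a time without loading it all into memory. The following is a minimal sketch using only the standard library. The helper name iter_jsonl is hypothetical and not part of langextract, and the keys in each record depend on how the library serializes a document.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def iter_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file.

    Hypothetical helper for illustration; not part of langextract.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

A caller could then loop with `for record in iter_jsonl("test_output/data.jsonl"): ...` and handle each document as it is parsed.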

Parameters

annotated_documents
Iterator[AnnotatedDocument]
required
Iterator over AnnotatedDocument objects to save. These are the documents that have been processed and annotated by the LLM.
output_dir
pathlib.Path | str | None
default:"None"
The directory to which the JSONL file should be written. Can be a Path object or a string. If None, defaults to the test_output/ directory, relative to the current working directory.
output_name
str
default:"'data.jsonl'"
File name for the JSONL file. The file will be created at output_dir/output_name.
show_progress
bool
default:"True"
Whether to show a progress bar during the saving operation. Useful for tracking progress with large datasets.
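
The way output_dir and output_name combine into the final file path can be sketched as below. This is an illustration of the documented defaults only: resolve_output_path is a hypothetical helper, not a function exposed by langextract.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_output_path(
    output_dir: Optional[Union[Path, str]] = None,
    output_name: str = "data.jsonl",
) -> Path:
    """Mirror the documented defaults (hypothetical helper, not library code)."""
    # None falls back to the relative test_output/ directory.
    base = Path("test_output") if output_dir is None else Path(output_dir)
    return base / output_name


default_path = resolve_output_path()
custom_path = resolve_output_path("results/experiment_1", "annotations.jsonl")
```

Here default_path is test_output/data.jsonl and custom_path is results/experiment_1/annotations.jsonl, both interpreted relative to the current working directory.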

Returns

None. The function writes data to disk and displays progress information if enabled.

Exceptions

  • IOError: If the output directory cannot be created.
  • InvalidDatasetError: If no valid documents are produced (all documents have empty document_id).
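
The second condition can be illustrated with a self-contained sketch. It assumes documents are plain dicts and uses a stand-in exception, NoValidDocumentsError, in place of the library's InvalidDatasetError. Both the exception name and write_jsonl are hypothetical, and the real implementation may differ in detail.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


class NoValidDocumentsError(Exception):
    """Stand-in for the library's InvalidDatasetError (hypothetical name)."""


def write_jsonl(docs: Iterable[dict], output_file: Path) -> int:
    """Write dicts that have a non-empty 'document_id'; raise if none qualify."""
    written = 0
    with open(output_file, "w", encoding="utf-8") as f:
        for doc in docs:
            if not doc.get("document_id"):
                continue  # skip documents with an empty ID
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")
            written += 1
    if written == 0:
        raise NoValidDocumentsError(f"No documents to save in {output_file}")
    return written


# Every input document has an empty ID, so nothing is written and the error fires.
out = Path(tempfile.mkdtemp()) / "data.jsonl"
try:
    write_jsonl([{"document_id": "", "text": "skipped"}], out)
except NoValidDocumentsError as err:
    print(f"Nothing saved: {err}")
```

In this sketch the error is raised only after the iterator is exhausted, so an empty output file may already exist on disk when the exception reaches the caller.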

Usage Example

from langextract import io
from langextract.core import data
from pathlib import Path

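# Create some annotated documents. Extracted values are passed as a list of
# Extraction objects via the `extractions` argument.
annotated_docs = [
    data.AnnotatedDocument(
        document_id="doc1",
        text="Sample text",
        extractions=[
            data.Extraction(
                extraction_class="category",
                extraction_text="news",
                attributes={"sentiment": "positive"},
            ),
        ],
    ),
    data.AnnotatedDocument(
        document_id="doc2",
        text="Another sample",
        extractions=[
            data.Extraction(
                extraction_class="category",
                extraction_text="blog",
                attributes={"sentiment": "neutral"},
            ),
        ],
    ),
]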

# Save to default location (test_output/data.jsonl)
io.save_annotated_documents(iter(annotated_docs))

# Save to custom location
io.save_annotated_documents(
    iter(annotated_docs),
    output_dir="results/experiment_1",
    output_name="annotations.jsonl",
    show_progress=True
)

# Save without progress bar
io.save_annotated_documents(
    iter(annotated_docs),
    output_dir=Path("/data/output"),
    show_progress=False
)

Notes

  • The output directory is created automatically if it doesn’t exist (including parent directories).
  • Documents with empty document_id fields are skipped.
  • Files are written with UTF-8 encoding and ensure_ascii=False, so non-ASCII characters are stored as-is rather than as \uXXXX escape sequences.
  • Progress information includes the number of documents saved and the output file path.
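
The directory-creation and encoding notes above can be reproduced with the standard library alone. This is a sketch of the same techniques (Path.mkdir with parents=True and json.dumps with ensure_ascii=False), not the library's own code. The sample record and the temporary directory are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

record = {"document_id": "doc1", "text": "café 東京"}

# ensure_ascii=True (the json default) escapes non-ASCII characters as \uXXXX.
escaped = json.dumps(record)
# ensure_ascii=False keeps them readable; the file must then be written as UTF-8.
readable = json.dumps(record, ensure_ascii=False)

# parents=True creates missing parent directories; exist_ok=True tolerates reruns.
out_dir = Path(tempfile.mkdtemp()) / "results" / "experiment_1"
out_dir.mkdir(parents=True, exist_ok=True)

out_file = out_dir / "data.jsonl"
out_file.write_text(readable + "\n", encoding="utf-8")
```

Both strings decode to the same record. The difference is only in how readable the file is when opened in an editor or inspected with command-line tools.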
