For extremely large datasets, Lodum supports streaming deserialization with O(1) memory. This allows you to decode massive object graphs directly from an IO stream (like a file or socket) without building the entire representation in memory.
Streaming is essential when:

- Working with datasets that don't fit in memory
- Sending data over networks incrementally
- Processing large files without loading them entirely
```python
from lodum import lodum, json
from pathlib import Path

@lodum
class Record:
    def __init__(self, id: int, value: str):
        self.id = id
        self.value = value

# Assuming records.json contains: [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, ...]
for record in json.stream(Record, Path("records.json")):
    process_record(record)  # Process one at a time
    # Only one Record object is in memory at a time
```
```python
# Stream from a file handle (must be opened in binary mode)
with open("large_array.json", "rb") as f:
    for item in json.stream(Record, f):
        print(f"Processing record {item.id}")
```
The stream() function requires the ijson package for incremental JSON parsing. By contrast, the non-streaming json.dumps() API builds the entire document in memory before writing:

```python
from lodum import json

# Entire JSON string built in memory
json_string = json.dumps(large_object)

# Write string to file
with open("output.json", "w") as f:
    f.write(json_string)
```
The json.stream() function requires binary mode because it uses ijson internally:
```python
# Correct
with open("data.json", "rb") as f:
    for item in json.stream(Record, f):
        process(item)

# Incorrect (will fail)
with open("data.json", "r") as f:  # Text mode
    for item in json.stream(Record, f):
        process(item)
```
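If file handles are passed in from elsewhere, a small guard (a hypothetical helper, not part of Lodum) can raise a clear error up front when a handle was opened in text mode:

```python
import io

def ensure_binary(f) -> None:
    """Raise TypeError early if `f` is a text-mode file handle."""
    # Reading zero bytes reveals the handle's data type without consuming anything:
    # text-mode handles return str, binary-mode handles return bytes.
    if isinstance(f.read(0), str):
        raise TypeError("json.stream() needs a binary-mode file; open it with 'rb'")

ensure_binary(io.BytesIO(b"{}"))     # OK: binary handle passes silently
# ensure_binary(io.StringIO("{}"))  # would raise TypeError: text-mode handle
```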
Process Items Incrementally
Don’t accumulate streamed items back into a list:
```python
# Good: Process one at a time
for record in json.stream(Record, file):
    process_and_discard(record)

# Bad: Defeats the purpose of streaming
all_records = list(json.stream(Record, file))
```
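If you need a summary over all records, fold the stream into an aggregate as items arrive rather than collecting them. This sketch simulates the record stream with a plain generator (standing in for `json.stream(Record, file)`) so it runs without Lodum:

```python
def count_and_max(ids):
    """Fold a stream of ids into (count, max) without building a list.

    Only two scalars are retained, so memory stays constant
    regardless of how long the stream is.
    """
    count, biggest = 0, None
    for i in ids:
        count += 1
        if biggest is None or i > biggest:
            biggest = i
    return count, biggest

# Stand-in for: (record.id for record in json.stream(Record, file))
ids = (n for n in [3, 1, 4, 1, 5])
print(count_and_max(ids))  # (5, 5)
```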