Core Philosophy
Lodum is designed around three principles:
- Protocol-First: Serialization logic is decoupled from the data format.
- Runtime Compilation: We generate specialized bytecode for your classes to avoid overhead.
- Declarative Data: The @lodum decorator captures the “shape” of data once, and it is reused everywhere.
High-Level Architecture
1. The Dynamic Bytecode Engine
The heart of Lodum is in src/lodum/internal.py.
Unlike libraries that use generic introspection (looping over __dict__ or using getattr) for every object, Lodum inspects your class once.
Compilation Pipeline
- Analysis: When you first call dumps or loads, the engine analyzes the type hints of your fields.
- AST Construction: It constructs a Python Abstract Syntax Tree (AST) representing a highly optimized function specifically for that class.
  - Example: If your class has a List[int], the generated AST will include nodes to check isinstance(val, list) directly and call the int loader for each item, typically unrolling standard overheads.
- Compilation: The AST is compiled into a code object using Python’s built-in compile(), which is then loaded via exec(). This avoids the security and fragility issues of string-based code generation.
- Binding: This function is cached and reused for all future operations.
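The steps above can be sketched in a few dozen lines. This is an illustration of the technique, not Lodum's actual engine: the names `build_dump_handler`, `get_handler`, `_HANDLERS`, and the `Point` class are hypothetical, and the generated function only copies attributes into a dict.

```python
import ast
from dataclasses import dataclass
from typing import Any, Callable, Dict, get_type_hints

# Hypothetical cache: one compiled handler per class.
_HANDLERS: Dict[type, Callable[[Any], Dict[str, Any]]] = {}


def build_dump_handler(cls: type) -> Callable[[Any], Dict[str, Any]]:
    # 1. Analysis: read the field names from the class's type hints, once.
    field_names = list(get_type_hints(cls))

    # 2. AST construction: the nodes below are equivalent to
    #        def _dump(obj): return {"<field>": obj.<field>, ...}
    returned = ast.Return(
        value=ast.Dict(
            keys=[ast.Constant(value=name) for name in field_names],
            values=[
                ast.Attribute(
                    value=ast.Name(id="obj", ctx=ast.Load()),
                    attr=name,
                    ctx=ast.Load(),
                )
                for name in field_names
            ],
        )
    )
    func = ast.FunctionDef(
        name="_dump",
        args=ast.arguments(
            posonlyargs=[],
            args=[ast.arg(arg="obj")],
            kwonlyargs=[],
            kw_defaults=[],
            defaults=[],
        ),
        body=[returned],
        decorator_list=[],
    )
    module = ast.Module(body=[func], type_ignores=[])
    ast.fix_missing_locations(module)  # fill in line/column info

    # 3. Compilation: AST -> code object -> function, with no source strings.
    namespace: Dict[str, Any] = {}
    exec(compile(module, f"<handler for {cls.__name__}>", "exec"), namespace)
    return namespace["_dump"]


def get_handler(cls: type) -> Callable[[Any], Dict[str, Any]]:
    # 4. Binding: compile on first use, then reuse the cached function.
    handler = _HANDLERS.get(cls)
    if handler is None:
        handler = _HANDLERS[cls] = build_dump_handler(cls)
    return handler


@dataclass
class Point:
    x: int
    y: int


handler = get_handler(Point)
result = handler(Point(1, 2))
```

After the first call, `get_handler(Point)` returns the same function object, so the analysis and compilation cost is paid once per class.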
Example: Compiled Handler
When you define a class like this:
2. The Abstract Protocols
Lodum uses two main protocols to bridge the gap between “Python Objects” and “Bytes/Strings”.
The Dumper Protocol
A Dumper knows how to take primitive types and write them to a specific format.
The Loader Protocol
A Loader knows how to read primitive types from a specific format.
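To make the split concrete, here is a minimal sketch of what such a pair of protocols could look like. The method names (`dump_int`, `load_str`, and so on) and the `JsonTextDumper` / `JsonTextLoader` classes are assumptions made for illustration. They are not taken from Lodum's source.

```python
import json
from typing import Any, Dict, List, Protocol


class Dumper(Protocol):
    """Hypothetical dumper shape: writes primitives in one format."""

    def dump_int(self, value: int) -> Any: ...
    def dump_str(self, value: str) -> Any: ...
    def dump_list(self, items: List[Any]) -> Any: ...
    def dump_dict(self, fields: Dict[str, Any]) -> Any: ...


class Loader(Protocol):
    """Hypothetical loader shape: reads primitives from one format."""

    def load_int(self) -> int: ...
    def load_str(self) -> str: ...


class JsonTextDumper:
    """Emits JSON text fragments; satisfies Dumper structurally."""

    def dump_int(self, value: int) -> str:
        return str(value)

    def dump_str(self, value: str) -> str:
        return json.dumps(value)

    def dump_list(self, items: List[str]) -> str:
        return "[" + ",".join(items) + "]"

    def dump_dict(self, fields: Dict[str, str]) -> str:
        body = ",".join(f"{json.dumps(k)}:{v}" for k, v in fields.items())
        return "{" + body + "}"


class JsonTextLoader:
    """Reads a single primitive from JSON text; satisfies Loader."""

    def __init__(self, text: str) -> None:
        self._value = json.loads(text)

    def load_int(self) -> int:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self._value, bool) or not isinstance(self._value, int):
            raise TypeError("expected an integer")
        return self._value

    def load_str(self) -> str:
        if not isinstance(self._value, str):
            raise TypeError("expected a string")
        return self._value


def dump_point(x: int, y: int, dumper: Dumper) -> Any:
    """Format-agnostic logic: it only talks to the protocol."""
    return dumper.dump_dict({"x": dumper.dump_int(x), "y": dumper.dump_int(y)})


text = dump_point(1, 2, JsonTextDumper())
```

`dump_point` never mentions JSON, so passing a different object with the same four methods would change the output format without touching the function.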
Extensibility
Because internal.py only talks to these protocols, adding a new format (like CBOR) is as simple as implementing these methods. The optimization engine automatically works for the new format without any changes.
Base Classes
To reduce boilerplate when implementing new formats, core.py also provides BaseDumper and BaseLoader. These classes provide default implementations for many protocol methods (such as handling primitive types), allowing format authors to focus on the unique aspects of their format.
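The idea of a base class with defaults can be illustrated with a stand-in. `SketchBaseDumper` below is not Lodum's BaseDumper, and its method names are assumed. It only shows how a format author might override the single method where their format differs.

```python
import base64
from typing import Any, Dict, List


class SketchBaseDumper:
    """Stand-in base class: primitives pass through unchanged by default."""

    def dump_int(self, value: int) -> Any:
        return value

    def dump_str(self, value: str) -> Any:
        return value

    def dump_bytes(self, value: bytes) -> Any:
        return value

    def dump_list(self, items: List[Any]) -> Any:
        return list(items)

    def dump_dict(self, fields: Dict[str, Any]) -> Any:
        return dict(fields)


class Base64BytesDumper(SketchBaseDumper):
    """A text-only format overrides just the part that differs: bytes."""

    def dump_bytes(self, value: bytes) -> str:
        return base64.b64encode(value).decode("ascii")


dumper = Base64BytesDumper()
encoded = dumper.dump_bytes(b"hi")
```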
3. Validation & Schemas
Validation Pipeline
Validation is injected directly into the generated loads handler.
- Decode: The value is read from the wire (e.g., JSON string → Python str).
- Validate: The value is passed to any validators defined in field(validate=...).
- Instantiate: Only if validation passes is the actual object created.
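The ordering of these three steps can be shown with a hand-written loader. This is a sketch under assumptions: the `User` class, the `positive` validator, and the `VALIDATORS` table are hypothetical, and a real generated handler would be produced by the engine rather than written by hand.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class User:
    name: str
    age: int


def positive(value: int) -> None:
    """Hypothetical validator: raises ValueError on bad input."""
    if value <= 0:
        raise ValueError("must be positive")


# Hypothetical stand-in for validators declared per field.
VALIDATORS: Dict[str, Callable[[Any], None]] = {"age": positive}


def load_user(raw: Dict[str, Any]) -> User:
    decoded: Dict[str, Any] = {}
    for field_name, expected in (("name", str), ("age", int)):
        # 1. Decode: take the wire value and check its primitive type.
        value = raw[field_name]
        if not isinstance(value, expected):
            raise TypeError(f"{field_name}: expected {expected.__name__}")
        # 2. Validate: run the field's validator, if it has one.
        validator = VALIDATORS.get(field_name)
        if validator is not None:
            validator(value)
        decoded[field_name] = value
    # 3. Instantiate: reached only when every field passed.
    return User(**decoded)


user = load_user({"name": "Ada", "age": 36})
```

Because instantiation is the last step, a failing validator means no partially initialized `User` is ever created.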
Error Path Tracking
One of the key features of Lodum is precise error reporting. During deserialization, the generated loaders maintain a path string that tracks the current position in the data structure.
- When entering a dictionary/struct, the path is appended with .field_name.
- When entering a list, the path is appended with [index].
load(). If a DeserializationError occurs (e.g., a type mismatch or a validation failure), the error captures the current path. This allows Lodum to provide helpful error messages like:
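A minimal sketch of this kind of path tracking follows. The error class `SketchDeserializationError`, the `$` root marker, and the message wording are assumptions chosen for the illustration, not Lodum's actual output format.

```python
from typing import Any, Dict, List


class SketchDeserializationError(Exception):
    """Stand-in error type that remembers where the failure happened."""

    def __init__(self, path: str, message: str) -> None:
        super().__init__(f"{path}: {message}")
        self.path = path


def load_int(value: Any, path: str) -> int:
    if isinstance(value, bool) or not isinstance(value, int):
        raise SketchDeserializationError(
            path, f"expected int, got {type(value).__name__}"
        )
    return value


def load_int_list(value: Any, path: str) -> List[int]:
    if not isinstance(value, list):
        raise SketchDeserializationError(
            path, f"expected list, got {type(value).__name__}"
        )
    # Entering a list: extend the path with [index] for each item.
    return [load_int(item, f"{path}[{i}]") for i, item in enumerate(value)]


def load_report(value: Any, path: str = "$") -> Dict[str, List[int]]:
    if not isinstance(value, dict):
        raise SketchDeserializationError(
            path, f"expected dict, got {type(value).__name__}"
        )
    if "scores" not in value:
        raise SketchDeserializationError(f"{path}.scores", "missing field")
    # Entering a struct field: extend the path with .field_name.
    return {"scores": load_int_list(value["scores"], f"{path}.scores")}


try:
    load_report({"scores": [1, "two"]})
except SketchDeserializationError as exc:
    failure = exc
```

Here `failure.path` is `$.scores[1]`, which points at the offending list item rather than at the top-level object.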
Schema Generation
json.schema() uses a recursive visitor pattern to walk the type hints of a @lodum class and construct a standard JSON Schema dictionary. This is separate from the serialization engine but shares the same type analysis logic.
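The recursive walk over type hints might look roughly like the sketch below. The function name `schema_for` and the example classes are hypothetical, and only a handful of types are handled. Lodum's real visitor presumably covers more cases.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union
from typing import get_args, get_origin, get_type_hints

_PRIMITIVES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_for(tp: Any) -> Dict[str, Any]:
    """Recursively map a type hint to a JSON Schema fragment."""
    if tp in _PRIMITIVES:
        return {"type": _PRIMITIVES[tp]}
    if tp is type(None):
        return {"type": "null"}
    origin = get_origin(tp)
    if origin is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": schema_for(item_type)}
    if origin is Union:  # also covers Optional[X], i.e. Union[X, None]
        return {"anyOf": [schema_for(arg) for arg in get_args(tp)]}
    if isinstance(tp, type):
        # A user class: visit each annotated field in turn.
        hints = get_type_hints(tp)
        return {
            "type": "object",
            "properties": {name: schema_for(h) for name, h in hints.items()},
        }
    raise TypeError(f"unsupported type hint: {tp!r}")


@dataclass
class Tag:
    name: str


@dataclass
class Post:
    title: str
    tags: List[Tag]
    rating: Optional[float]


post_schema = schema_for(Post)
```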
4. Thread Safety
Lodum uses a Context system for thread-safe operation:
- Thread-local contexts: Each thread can have its own context
- Lock-free fast path: Cache lookups don’t require locks in the common case
- Double-checked locking: Only compilation requires lock acquisition
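The last two points describe a double-checked locking pattern, which can be sketched as follows. The names `lookup_handler`, `_compile_handler`, and `compile_calls` are hypothetical, thread-local contexts are not modelled, and the lock-free read relies on CPython making individual dict lookups and assignments atomic.

```python
import threading
from typing import Any, Callable, Dict, List

_cache: Dict[type, Callable[[Any], Any]] = {}
_compile_lock = threading.Lock()
compile_calls: List[type] = []  # records each compilation, for illustration


def _compile_handler(cls: type) -> Callable[[Any], Any]:
    """Stand-in for the expensive compilation step."""
    compile_calls.append(cls)
    return lambda obj: dict(vars(obj))


def lookup_handler(cls: type) -> Callable[[Any], Any]:
    # Fast path: a plain dict read, no lock taken.
    handler = _cache.get(cls)
    if handler is not None:
        return handler
    # Slow path: serialize compilation across threads.
    with _compile_lock:
        # Second check: another thread may have compiled while we waited.
        handler = _cache.get(cls)
        if handler is None:
            handler = _compile_handler(cls)
            _cache[cls] = handler
        return handler


class Job:
    def __init__(self) -> None:
        self.name = "build"


threads = [threading.Thread(target=lookup_handler, args=(Job,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the eight threads interleave, the second check inside the lock ensures the handler for `Job` is compiled only once.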
Directory Structure
Key Implementation Details
Handler Cache Lookup
From internal.py:196-210:
Circular Reference Detection
From internal.py:137-163:
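The sketch below is not the code at those lines. It only illustrates one common way to detect cycles during serialization, assuming the serializer tracks the `id()` of every container it is currently inside. The names `dump_value` and `CircularReferenceError` are hypothetical.

```python
from typing import Any, Optional, Set


class CircularReferenceError(ValueError):
    """Raised when an object is reached again while it is still being dumped."""


def dump_value(value: Any, _active: Optional[Set[int]] = None) -> Any:
    if _active is None:
        _active = set()
    # Primitives cannot contain references, so they need no tracking.
    if value is None or isinstance(value, (str, int, float, bool)):
        return value

    marker = id(value)
    if marker in _active:
        raise CircularReferenceError(
            f"circular reference to {type(value).__name__} object"
        )
    _active.add(marker)
    try:
        if isinstance(value, dict):
            return {k: dump_value(v, _active) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [dump_value(v, _active) for v in value]
        # Fallback for plain objects; vars() raises TypeError without __dict__.
        return {k: dump_value(v, _active) for k, v in vars(value).items()}
    finally:
        # Leaving the object: the same object may legally appear again
        # elsewhere, as long as it is not nested inside itself.
        _active.discard(marker)


shared = [1, 2]
ok = dump_value({"a": shared, "b": shared})

loop: list = []
loop.append(loop)
try:
    dump_value(loop)
    detected = False
except CircularReferenceError:
    detected = True
```

Removing the marker in the `finally` block is what separates a shared reference (allowed) from a true cycle (rejected).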
Performance Characteristics
- First serialization/deserialization: O(n) where n is the number of fields (due to compilation)
- Subsequent operations: O(m) where m is the size of the data (no reflection overhead)
- Memory: O(1) per class (one compiled handler)
- Thread safety: Lock-free reads, locked writes to cache