Benchmark Results
The following benchmarks were run on Python 3.13.7 (win32). Results are in microseconds (μs) per operation (lower is better).

JSON Serialization (Object → JSON)
| Library | Simple (μs) | Complex (μs) | Nested (μs) |
|---|---|---|---|
| Lodum | 7.62 ± 1.87 | 15.45 ± 2.03 | 36.51 ± 3.67 |
| Pydantic (v2) | 3.13 ± 1.63 | 3.31 ± 0.40 | 6.76 ± 0.52 |
| Marshmallow | 12.73 ± 1.73 | 30.23 ± 0.97 | 73.29 ± 4.58 |
| Native json (dict) | 4.29 ± 0.41 | 6.76 ± 0.57 | 8.78 ± 0.46 |
| orjson (dict) | 0.50 ± 0.02 | 0.73 ± 0.02 | 0.98 ± 0.01 |
JSON Deserialization (JSON → Object)
| Library | Simple (μs) | Complex (μs) | Nested (μs) |
|---|---|---|---|
| Lodum | 21.75 ± 1.70 | 42.52 ± 2.13 | 131.67 ± 6.75 |
| Pydantic (v2) | 3.21 ± 0.76 | 3.94 ± 0.71 | 16.52 ± 0.95 |
| Marshmallow | 31.21 ± 4.01 | 72.18 ± 4.93 | 226.99 ± 6.63 |
| Native json (dict) | 3.15 ± 0.40 | 4.52 ± 0.64 | 7.59 ± 0.57 |
| orjson (dict) | 0.77 ± 0.10 | 1.52 ± 0.06 | 2.84 ± 0.13 |
Binary Formats (Lodum)
| Format | Operation | Simple (μs) | Complex (μs) | Nested (μs) |
|---|---|---|---|---|
| MsgPack | Serialization | 4.60 ± 1.37 | 10.15 ± 0.40 | 31.31 ± 3.03 |
| MsgPack | Deserialization | 18.22 ± 2.12 | 35.90 ± 2.62 | 119.92 ± 7.01 |
| CBOR | Serialization | 11.61 ± 0.88 | 18.84 ± 0.70 | 43.67 ± 2.76 |
| CBOR | Deserialization | 21.61 ± 2.00 | 39.39 ± 3.38 | 132.37 ± 4.49 |
| Pickle | Serialization | 8.91 ± 0.73 | 13.26 ± 0.95 | 39.72 ± 2.00 |
| Pickle | Deserialization | 6.75 ± 0.42 | 9.87 ± 1.75 | 16.21 ± 1.22 |
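Per-operation microsecond figures like those above can be reproduced with a small `timeit` harness. The sketch below is a generic example (not the project's official benchmark script) that reports mean ± standard deviation for `json.dumps` on a simple payload:

```python
import json
import statistics
import timeit

SIMPLE = {"id": 1, "name": "ada", "active": True}

def bench(fn, repeats=20, number=1000):
    # timeit.repeat returns total seconds per run of `number` calls;
    # convert each run to microseconds per operation.
    runs = [total / number * 1e6
            for total in timeit.repeat(fn, repeat=repeats, number=number)]
    return statistics.mean(runs), statistics.stdev(runs)

mean, std = bench(lambda: json.dumps(SIMPLE))
print(f"json.dumps(simple): {mean:.2f} ± {std:.2f} µs")
```

Absolute numbers will differ across machines and Python builds, so only relative comparisons between libraries on the same machine are meaningful.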
Performance Analysis
Lodum vs Marshmallow
Lodum consistently outperforms Marshmallow (often 2x faster), particularly in serialization and handling complex structures. This performance advantage comes from Lodum’s AST-based bytecode compilation approach.

Lodum vs Pydantic
Pydantic v2 remains faster due to its Rust-based core. However, Lodum provides a competitive pure-Python alternative with:

- Zero binary dependencies
- Full control over the compilation process
- Excellent cross-platform compatibility (including WASM)
AST Optimization Benefits
The move to AST-based code generation has significantly improved performance compared to string-based `exec` methods while providing:
- Better type safety
- More informative error messages
- Easier debugging of generated code
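The idea can be illustrated with the stdlib `ast` module: instead of concatenating source text and running it through `exec`, the generator builds syntax nodes directly, which compile with real source locations and fail with structured errors. The following is a simplified sketch of the technique, not Lodum's actual generator:

```python
import ast

fields = ["x", "y"]

# Build  def handler(obj): return {"x": obj.x, "y": obj.y}  as an AST.
func = ast.FunctionDef(
    name="handler",
    args=ast.arguments(posonlyargs=[], args=[ast.arg(arg="obj")], vararg=None,
                       kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]),
    body=[ast.Return(ast.Dict(
        keys=[ast.Constant(f) for f in fields],
        values=[ast.Attribute(ast.Name("obj", ast.Load()), f, ast.Load())
                for f in fields],
    ))],
    decorator_list=[],
    returns=None,
    type_comment=None,
    type_params=[],
)
module = ast.Module(body=[func], type_ignores=[])
ast.fix_missing_locations(module)

ns = {}
exec(compile(module, "<generated-handler>", "exec"), ns)
handler = ns["handler"]

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

print(handler(Point(1, 2)))  # {'x': 1, 'y': 2}
```

Because the tree is built from typed nodes rather than string fragments, malformed output fails at node-construction or compile time instead of producing a runtime `SyntaxError` buried in an anonymous `<string>` blob.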
Thread Safety
The modular refactor introduced thread-safe global state management via `Context` without performance regressions, thanks to a lock-free fast path for handler cache lookups.
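One common shape for such a lock-free fast path is double-checked locking over a plain dict: read the cache without taking the lock (dict reads are atomic under the GIL), and only lock on a miss. A minimal sketch of the pattern, not Lodum's actual code (`compile_handler` here is a hypothetical stand-in for the real compilation step):

```python
import threading

_handlers = {}                # cls -> compiled handler
_lock = threading.Lock()

def compile_handler(cls):
    # Hypothetical stand-in for Lodum's real AST compilation step.
    return lambda obj: dict(vars(obj))

def get_handler(cls):
    # Fast path: a plain dict read, no lock. Safe because dict lookups
    # are atomic and cached entries are never mutated once stored.
    handler = _handlers.get(cls)
    if handler is not None:
        return handler
    # Slow path: lock, re-check (another thread may have compiled it
    # while we waited), then compile exactly once.
    with _lock:
        handler = _handlers.get(cls)
        if handler is None:
            handler = compile_handler(cls)
            _handlers[cls] = handler
        return handler

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

h = get_handler(Point)
print(h(Point(1, 2)))            # {'x': 1, 'y': 2}
assert get_handler(Point) is h   # cached: the same compiled handler is reused
```

The lock is only ever contended during the one-time compilation of a class, so steady-state lookups pay no synchronization cost.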
Optimization Tips
1. Reuse Compiled Handlers
Lodum compiles handlers once per class. The first serialization/deserialization call will be slower due to compilation, but subsequent calls use the cached handler.

2. Use Streaming for Large Datasets
For large files, use streaming to avoid loading everything into memory.

3. Choose the Right Format
Based on the benchmarks:

- MsgPack: Best all-around binary format (smallest size, good performance)
- Pickle: Fastest deserialization, but Python-only and unsafe with untrusted input
- CBOR: Good for IoT/embedded systems with standardization needs
- JSON: Best for human-readable data and web APIs
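MsgPack and CBOR need third-party packages (such as `msgpack` and `cbor2`), but the JSON-vs-Pickle trade-off is easy to inspect with the stdlib alone. A quick size check, not a full benchmark:

```python
import json
import pickle

payload = {"user": "ada", "scores": list(range(50)), "active": True}

as_json = json.dumps(payload).encode()
as_pickle = pickle.dumps(payload)

# Sizes vary with payload shape; run this on your own data.
print(f"json:   {len(as_json)} bytes (human-readable, interoperable)")
print(f"pickle: {len(as_pickle)} bytes (Python-only; never unpickle untrusted data)")
```

The right choice depends on who consumes the bytes: JSON for web APIs and debugging, a binary format when both ends are under your control.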
4. Avoid Deep Nesting
Deeply nested structures have higher overhead. Consider flattening your data structures when possible.

5. Use Binary Formats for Internal Services
If you control both ends of the communication, prefer a compact binary format such as MsgPack over JSON.

Running Benchmarks Yourself
To run benchmarks on your own machine, time each operation with Python's `timeit` module and report the mean ± standard deviation, as in the tables above.

Memory Usage
Lodum’s AST compilation approach has minimal memory overhead:

- Handler Cache: One compiled function per `@lodum` class
- Context: Thread-local state with lock-free fast path
- Streaming: O(1) memory for large files when using `stream()` functions
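The O(1) claim means one record is in memory at a time rather than the whole dataset. In json-lines terms the pattern looks like the generic sketch below (Lodum's own `stream()` functions are not shown in this document, so this illustrates the technique, not the library's API):

```python
import io
import json

def stream_write(records, fp):
    # Write one record per line; memory stays O(1) in the record count.
    for rec in records:
        fp.write(json.dumps(rec))
        fp.write("\n")

def stream_read(fp):
    # Lazily yield records instead of loading the whole file.
    for line in fp:
        yield json.loads(line)

buf = io.StringIO()
stream_write(({"i": i} for i in range(3)), buf)  # generator in, no list built
buf.seek(0)
print(list(stream_read(buf)))  # [{'i': 0}, {'i': 1}, {'i': 2}]
```

Both sides accept any file-like object, so the same code works for on-disk files, sockets wrapped in text IO, or in-memory buffers as above.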
Future Optimizations
Planned performance improvements:

- Cython Acceleration: Optional Cython-compiled core for 5-10x speedup
- Parallel Serialization: Multi-threaded serialization for large collections
- Zero-Copy Deserialization: Direct memory mapping for binary formats
- JIT Optimization: Integration with PyPy’s JIT compiler