Implementing New Formats

Lodum’s protocol-based architecture makes it easy to add support for new serialization formats. This guide walks you through implementing a custom format by showing real examples from Lodum’s built-in formats.

Overview

To implement a new format, you need to:

1. Create a Dumper class that implements the Dumper protocol
2. Create a Loader class that implements the Loader protocol
3. Provide high-level dump() and load() functions
4. Optionally implement streaming support

The Dumper Protocol

The Dumper protocol defines how to serialize Python objects to your format. Here’s the complete interface:

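# The protocol lives in lodum.core (import it with: from lodum.core import Dumper).
# Its definition is reproduced here for reference.
from typing import Any, Callable, Optional, Protocol, Type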

class Dumper(Protocol):
    # Primitive types
    def dump_int(self, value: int, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_str(self, value: str, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_float(self, value: float, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_bool(self, value: bool, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_bytes(self, value: bytes, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_none(self, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_list(self, value: list, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    def dump_dict(self, value: dict, depth: int = 0, seen: Optional[set] = None) -> Any: ...
    
    # Structure orchestration
    def begin_struct(self, cls: Type) -> Any: ...
    def end_struct(self) -> Any: ...
    def field(
        self,
        name: str,
        value: Any,
        handler: Callable[[Any, "Dumper", int, Optional[set]], Any],
        depth: int = 0,
        seen: Optional[set] = None,
    ) -> None: ...
    
    # List orchestration
    def begin_list(self) -> None: ...
    def end_list(self) -> Any: ...
    def list_item(
        self,
        value: Any,
        handler: Callable[[Any, "Dumper", int, Optional[set]], Any],
        depth: int = 0,
        seen: Optional[set] = None,
    ) -> None: ...
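
To make the orchestration methods more concrete, the sketch below drives a toy dumper by hand. It does not import Lodum: RecordingDumper, int_handler, Point, and dump_point are hypothetical stand-ins that only mirror the method signatures listed above, and Lodum's own driver may call these methods in a different way.

```python
from typing import Any, Callable, Optional


class RecordingDumper:
    """Toy dumper that builds plain dicts (illustrative only, not Lodum code)."""

    def __init__(self) -> None:
        self._stack: list = []

    def dump_int(self, value: int, depth: int = 0, seen: Optional[set] = None) -> Any:
        return value

    def begin_struct(self, cls: type) -> Any:
        # Open a new container for the fields of this struct.
        self._stack.append({})

    def field(
        self,
        name: str,
        value: Any,
        handler: Callable,
        depth: int = 0,
        seen: Optional[set] = None,
    ) -> None:
        # The handler decides how the value is encoded; the dumper stores the result.
        self._stack[-1][name] = handler(value, self, depth, seen)

    def end_struct(self) -> Any:
        return self._stack.pop()


def int_handler(value: int, dumper: RecordingDumper, depth: int, seen: Optional[set]) -> Any:
    return dumper.dump_int(value, depth, seen)


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


def dump_point(point: Point, dumper: RecordingDumper) -> Any:
    # begin_struct, one field() call per attribute, then end_struct.
    dumper.begin_struct(Point)
    dumper.field("x", point.x, int_handler)
    dumper.field("y", point.y, int_handler)
    return dumper.end_struct()


result = dump_point(Point(1, 2), RecordingDumper())
```

The point of the handler argument is that the dumper never needs to know the field's type: the caller picks the handler, and the dumper only decides where the encoded value goes.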

The Loader Protocol

The Loader protocol defines how to deserialize data from your format:

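# The protocol lives in lodum.core (import it with: from lodum.core import Loader).
# Its definition is reproduced here for reference.
from typing import Any, Dict, Iterator, List, Optional, Protocol, Union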

class Loader(Protocol):
    def load_int(self) -> int: ...
    def load_str(self) -> str: ...
    def load_float(self) -> float: ...
    def load_bool(self) -> bool: ...
    def load_bytes(self) -> bytes: ...
    def load_list(self) -> Iterator['Loader']: ...
    def load_dict(self) -> Iterator[tuple[str, 'Loader']]: ...
    def load_any(self) -> Any: ...
    def mark(self) -> Any: ...
    def rewind(self, marker: Any) -> None: ...
    def get_dict(self) -> Optional[Union[Dict[str, Any], List[Any]]]: ...
    def load_bytes_value(self, value: Any) -> bytes: ...
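
As a rough illustration of how load_dict and load_list hand out child loaders, here is a toy loader over already-parsed Python data. DictLoader is a hypothetical class written for this example and implements only a small part of the protocol; it is not Lodum's BaseLoader.

```python
from typing import Any, Iterator, Tuple


class DictLoader:
    """Toy loader over parsed Python data (illustrative only, not Lodum code)."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def load_int(self) -> int:
        if not isinstance(self._data, int):
            raise TypeError(f"Expected int, got {type(self._data).__name__}")
        return self._data

    def load_str(self) -> str:
        if not isinstance(self._data, str):
            raise TypeError(f"Expected str, got {type(self._data).__name__}")
        return self._data

    def load_list(self) -> Iterator["DictLoader"]:
        # Each element is wrapped in its own loader so the caller picks its type.
        return (DictLoader(item) for item in self._data)

    def load_dict(self) -> Iterator[Tuple[str, "DictLoader"]]:
        return ((key, DictLoader(value)) for key, value in self._data.items())


root = DictLoader({"id": 7, "tags": ["a", "b"]})
fields = dict(root.load_dict())
user_id = fields["id"].load_int()
tags = [item.load_str() for item in fields["tags"].load_list()]
```

Returning loaders rather than raw values lets the caller choose which load_* method to apply to each element, which is what makes typed deserialization possible.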

Example 1: Simple Binary Format (MsgPack)

MsgPack is a binary format that’s very simple to implement because it maps cleanly to Python’s data types.

MsgPack Dumper

from lodum.core import BaseDumper

class MsgPackDumper(BaseDumper):
    """MsgPack dumper - uses BaseDumper defaults for everything."""
    pass

That’s it! BaseDumper provides default implementations that build a Python dict/list structure, which MsgPack can encode directly.

MsgPack Loader

from lodum.core import BaseLoader
from lodum.exception import DeserializationError
from typing import Any, Iterator

class MsgPackLoader(BaseLoader):
    def load_list(self) -> Iterator["Loader"]:
        if not isinstance(self._data, list):
            raise DeserializationError(
                f"Expected list, got {type(self._data).__name__}"
            )
        return (MsgPackLoader(item) for item in self._data)
    
    def load_dict(self) -> Iterator[tuple[str, "Loader"]]:
        if not isinstance(self._data, dict):
            raise DeserializationError(
                f"Expected dict, got {type(self._data).__name__}"
            )
        return ((k, MsgPackLoader(v)) for k, v in self._data.items())
    
    def load_bytes_value(self, value: Any) -> bytes:
        if not isinstance(value, bytes):
            raise DeserializationError(f"Expected bytes, got {type(value).__name__}")
        return value

MsgPack Public API

import msgpack
from typing import Any, Type, TypeVar, Optional, Union, IO
from pathlib import Path
from lodum.internal import dump as dump_internal, load as load_internal
from lodum.internal import DEFAULT_MAX_SIZE, _resolve_source, _resolve_target

T = TypeVar("T")

def dump(
    obj: Any, target: Optional[Union[IO[bytes], Path]] = None, **kwargs
) -> Optional[bytes]:
    """Encodes a Python object to MsgPack."""
    dumper = MsgPackDumper()
    dumped_data = dump_internal(obj, dumper)
    
    kwargs.setdefault("use_bin_type", True)
    
    with _resolve_target(target, "wb") as out:
        if out is None:
            return msgpack.packb(dumped_data, **kwargs)
        out.write(msgpack.packb(dumped_data, **kwargs))
        return None

def load(
    cls: Type[T],
    source: Union[bytes, IO[bytes], Path],
    max_size: int = DEFAULT_MAX_SIZE,
) -> T:
    """Decodes MsgPack from bytes, stream, or file into a Python object."""
    with _resolve_source(source, "rb") as src:
        if isinstance(src, (bytes, bytearray)):
            if len(src) > max_size:
                raise DeserializationError(
                    f"Input size ({len(src)}) exceeds maximum allowed ({max_size})"
                )
            data = msgpack.unpackb(src, raw=False)
        else:
            data = msgpack.unpack(src, raw=False)
    
    loader = MsgPackLoader(data)
    return load_internal(cls, loader)

Example 2: Text Format with Encoding (JSON)

JSON requires special handling for bytes (base64 encoding) and supports streaming output.

JSON Dumper

import json
import base64
from lodum.core import BaseDumper
from typing import Any, Optional

class JsonDumper(BaseDumper):
    def dump_bytes(
        self, value: bytes, depth: int = 0, seen: Optional[set] = None
    ) -> Any:
        return base64.b64encode(value).decode("ascii")
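
The reason for the override is that the standard json module cannot encode bytes directly. The standalone sketch below uses only the standard library and an invented payload to show the base64 round trip that the dumper and loader perform between them.

```python
import base64
import json

payload = {"name": "blob", "data": b"\x00\xffhello"}

# json.dumps raises TypeError on bytes, so convert them to base64 text first.
encoded = dict(payload, data=base64.b64encode(payload["data"]).decode("ascii"))
text = json.dumps(encoded)

# Decoding reverses the two steps: parse the JSON, then decode the base64 string.
decoded = json.loads(text)
restored = base64.b64decode(decoded["data"])
```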

JSON Streaming Dumper

For large files, you can implement a streaming dumper that writes directly to a file:

import base64
import json
from lodum.core import StreamingDumper
from typing import Any, Type, Callable, Optional, IO

class JsonStreamingDumper(StreamingDumper):
    """Writes JSON tokens directly to a stream."""
    
    def dump_int(self, value: int, depth: int = 0, seen: Optional[set] = None) -> Any:
        self.write_raw(str(value))
    
    def dump_str(self, value: str, depth: int = 0, seen: Optional[set] = None) -> Any:
        self.write_raw(json.dumps(value))
    
    def dump_float(
        self, value: float, depth: int = 0, seen: Optional[set] = None
    ) -> Any:
        self.write_raw(str(value))
    
    def dump_bool(self, value: bool, depth: int = 0, seen: Optional[set] = None) -> Any:
        self.write_raw("true" if value else "false")
    
    def dump_none(self, depth: int = 0, seen: Optional[set] = None) -> Any:
        self.write_raw("null")
    
    def dump_bytes(
        self, value: bytes, depth: int = 0, seen: Optional[set] = None
    ) -> Any:
        encoded = base64.b64encode(value).decode("ascii")
        self.write_raw(json.dumps(encoded))
    
    def begin_struct(self, cls: Type) -> Any:
        super().begin_struct(cls)
        self.write_raw("{")
        return None
    
    def end_struct(self) -> Any:
        self.write_raw("}")
        return super().end_struct()
    
    def field(
        self,
        name: str,
        value: Any,
        handler: Callable,
        depth: int = 0,
        seen: Optional[set] = None,
    ) -> None:
        if not self._first_item_stack[-1]:
            self.write_raw(",")
        self._first_item_stack[-1] = False
        
        self.write_raw(json.dumps(name))
        self.write_raw(":")
        handler(value, self, depth, seen)
    
    def begin_list(self) -> None:
        super().begin_list()
        self.write_raw("[")
    
    def end_list(self) -> Any:
        self.write_raw("]")
        return super().end_list()
    
    def list_item(
        self,
        value: Any,
        handler: Callable,
        depth: int = 0,
        seen: Optional[set] = None,
    ) -> None:
        if not self._first_item_stack[-1]:
            self.write_raw(",")
        self._first_item_stack[-1] = False
        
        handler(value, self, depth, seen)
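
The comma handling in field() and list_item() relies on a stack of "first item" flags, one per open container. The sketch below isolates that idea in a small standalone writer. MiniJsonWriter is a hypothetical class, and it assumes that opening a container pushes True onto the stack and closing it pops the flag, which is presumably what StreamingDumper's begin_* and end_* methods do.

```python
import io
import json
from typing import Any, List


class MiniJsonWriter:
    """Standalone illustration of comma placement with a first-item stack."""

    def __init__(self, out: io.StringIO) -> None:
        self._out = out
        self._first_item_stack: List[bool] = []

    def begin_list(self) -> None:
        self._first_item_stack.append(True)  # the next item is the first one
        self._out.write("[")

    def end_list(self) -> None:
        self._out.write("]")
        self._first_item_stack.pop()

    def list_item(self, value: Any) -> None:
        # Write a separator before every item except the first in this container.
        if not self._first_item_stack[-1]:
            self._out.write(",")
        self._first_item_stack[-1] = False

        if isinstance(value, list):
            # A nested list gets its own flag, so the outer flag is untouched.
            self.begin_list()
            for item in value:
                self.list_item(item)
            self.end_list()
        else:
            self._out.write(json.dumps(value))


out = io.StringIO()
writer = MiniJsonWriter(out)
writer.begin_list()
for item in [1, [2, 3], "x"]:
    writer.list_item(item)
writer.end_list()
output = out.getvalue()
```

Because each nesting level has its own flag, a nested container never disturbs the comma state of its parent.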

JSON Loader

import json
import base64
from lodum.core import BaseLoader
from lodum.exception import DeserializationError
from typing import Iterator, Any

class JsonLoader(BaseLoader):
    def load_list(self) -> Iterator["Loader"]:
        if not isinstance(self._data, list):
            raise DeserializationError(
                f"Expected list, got {type(self._data).__name__}"
            )
        return (JsonLoader(item) for item in self._data)
    
    def load_dict(self) -> Iterator[tuple[str, "Loader"]]:
        if not isinstance(self._data, dict):
            raise DeserializationError(
                f"Expected dict, got {type(self._data).__name__}"
            )
        return ((k, JsonLoader(v)) for k, v in self._data.items())
    
    def load_bytes_value(self, value: Any) -> bytes:
        if not isinstance(value, str):
            raise DeserializationError(f"Expected str, got {type(value).__name__}")
        try:
            return base64.b64decode(value)
        except Exception as e:
            # Chain the original exception so the root cause stays in the traceback.
            raise DeserializationError(f"Failed to decode base64: {e}") from e
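
One detail worth knowing when writing a loader like this: by default, base64.b64decode discards characters outside the base64 alphabet instead of rejecting them, so some corrupted inputs decode without an error. The sketch below shows the difference that the standard library's validate=True flag makes. Whether a format should use strict validation is a design choice, and this is not a statement about what Lodum does.

```python
import base64
import binascii

corrupted = "aGVs!bG8="  # "aGVsbG8=" (hello) with a stray "!" inserted

# Default mode silently drops the "!" and decodes the rest.
lenient = base64.b64decode(corrupted)

# Strict mode rejects any character outside the base64 alphabet.
try:
    base64.b64decode(corrupted, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
```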

Example 3: Format with Type-Specific Handling (CBOR)

Some formats require type-specific validation in the loader:

from lodum.core import BaseLoader
from lodum.exception import DeserializationError
from typing import Iterator, Any

class CborLoader(BaseLoader):
    def load_int(self) -> int:
        if not isinstance(self._data, int):
            raise DeserializationError(f"Expected int, got {type(self._data).__name__}")
        return self._data
    
    def load_str(self) -> str:
        if not isinstance(self._data, str):
            raise DeserializationError(f"Expected str, got {type(self._data).__name__}")
        return self._data
    
    def load_float(self) -> float:
        if not isinstance(self._data, (float, int)):
            raise DeserializationError(
                f"Expected float, got {type(self._data).__name__}"
            )
        return float(self._data)
    
    def load_bool(self) -> bool:
        if not isinstance(self._data, bool):
            raise DeserializationError(
                f"Expected bool, got {type(self._data).__name__}"
            )
        return self._data
    
    def load_list(self) -> Iterator["Loader"]:
        if not isinstance(self._data, list):
            raise DeserializationError(
                f"Expected list, got {type(self._data).__name__}"
            )
        return (CborLoader(item) for item in self._data)
    
    def load_dict(self) -> Iterator[tuple[str, "Loader"]]:
        if not isinstance(self._data, dict):
            raise DeserializationError(
                f"Expected dict, got {type(self._data).__name__}"
            )
        return ((k, CborLoader(v)) for k, v in self._data.items())
    
    def load_bytes_value(self, value: Any) -> bytes:
        if not isinstance(value, bytes):
            raise DeserializationError(f"Expected bytes, got {type(value).__name__}")
        return value
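
A subtlety with isinstance-based validation in Python is that bool is a subclass of int, so a plain isinstance(value, int) check also accepts True and False. Whether that is acceptable depends on the format. The sketch below shows one possible stricter check. strict_int is a hypothetical helper, and it raises TypeError only to stay self-contained (a real loader would raise DeserializationError).

```python
def strict_int(value: object) -> int:
    """Accept integers but reject booleans (hypothetical helper)."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"Expected int, got {type(value).__name__}")
    return value


plain_check_accepts_bool = isinstance(True, int)
accepted = strict_int(5)

try:
    strict_int(True)
    bool_rejected = False
except TypeError:
    bool_rejected = True
```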

Streaming Support

For formats that support streaming (reading/writing one object at a time), implement a stream() function:

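from typing import Iterator, Type, TypeVar, Union, IO
from pathlib import Path
from lodum.internal import load as load_internal, _resolve_source

T = TypeVar("T")

# MsgPackLoader is the loader class defined in Example 1.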

def stream(cls: Type[T], source: Union[IO[bytes], Path]) -> Iterator[T]:
    """Lazily decodes a stream of MsgPack objects."""
    import msgpack
    
    with _resolve_source(source, "rb") as src:
        # Use Unpacker for streaming multiple objects
        unpacker = msgpack.Unpacker(src, raw=False)
        for data in unpacker:
            yield load_internal(cls, MsgPackLoader(data))
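
The key property of stream() is that it is a generator: each record is decoded only when the caller asks for it. The sketch below shows the same pattern with the standard library alone, using newline-delimited JSON as a stand-in for msgpack.Unpacker. stream_records and the sample data are invented for this illustration.

```python
import io
import json
from typing import Any, IO, Iterator


def stream_records(src: IO[str]) -> Iterator[Any]:
    """Lazily yield one decoded record per non-empty line."""
    for line in src:
        line = line.strip()
        if line:
            yield json.loads(line)


source = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
records = stream_records(source)

first = next(records)      # decodes only the first record
remaining = list(records)  # decodes the rest on demand
```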

Best Practices

1. Use Base Classes

Always inherit from BaseDumper and BaseLoader to get sensible defaults:

from typing import Any, Optional
from lodum.core import BaseDumper, BaseLoader

# Good: Minimal implementation
class MyDumper(BaseDumper):
    def dump_bytes(self, value: bytes, depth: int = 0, seen: Optional[set] = None) -> Any:
        # custom_encode is a placeholder for your format's byte encoding.
        return custom_encode(value)

# Avoid: Implementing everything from scratch
class MyDumper:  # Don't do this
    def dump_int(self, value: int, depth: int = 0, seen: Optional[set] = None) -> Any:
        return value
    # ... 20 more methods ...

2. Handle Errors Gracefully

Always raise DeserializationError with helpful messages:

from lodum.exception import DeserializationError

def load_int(self) -> int:
    if not isinstance(self._data, int):
        raise DeserializationError(
            f"Expected int, got {type(self._data).__name__}"
        )
    return self._data

3. Support File Paths

Use the helper functions to support strings, IO objects, and Paths:

from pathlib import Path
from typing import Any, IO, Optional, Union
from lodum.internal import _resolve_source, _resolve_target

def dump(obj: Any, target: Optional[Union[IO[bytes], Path]] = None) -> Optional[bytes]:
    with _resolve_target(target, "wb") as out:
        # ... encode ...
        if out is None:
            return encoded_bytes
        out.write(encoded_bytes)
        return None
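
To show what the calling pattern above expects from such a helper, here is a rough standard-library sketch of a target resolver. resolve_target and dump_payload are hypothetical names written for this example. The sketch is a guess at the general shape, not Lodum's _resolve_target, whose behavior may differ.

```python
import io
from contextlib import contextmanager
from pathlib import Path
from typing import IO, Iterator, Optional, Union


@contextmanager
def resolve_target(
    target: Optional[Union[IO[bytes], Path]], mode: str
) -> Iterator[Optional[IO[bytes]]]:
    """Yield None, an opened file, or the caller's stream (hypothetical helper)."""
    if target is None:
        yield None  # the caller should return the encoded bytes instead
    elif isinstance(target, Path):
        with target.open(mode) as handle:  # opened here, so closed here
            yield handle
    else:
        yield target  # an existing stream stays open; the caller owns it


def dump_payload(
    payload: bytes, target: Optional[Union[IO[bytes], Path]] = None
) -> Optional[bytes]:
    with resolve_target(target, "wb") as out:
        if out is None:
            return payload
        out.write(payload)
        return None


buffer = io.BytesIO()
returned = dump_payload(b"abc")         # no target: bytes are returned
written = dump_payload(b"abc", buffer)  # stream target: bytes are written
```

Keeping the three cases inside one context manager means the dump function needs a single code path regardless of what the caller passed in.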

4. Implement Streaming When Possible

For large files, streaming avoids loading every record into memory at once:

# Memory-efficient streaming
for record in myformat.stream(Record, "large_file.dat"):
    process(record)

# vs. loading everything (avoid for large files)
records = myformat.load(list[Record], "large_file.dat")

5. Add Type Hints

Proper type hints help users and enable IDE autocompletion:

from pathlib import Path
from typing import IO, Type, TypeVar, Union

T = TypeVar("T")

def load(cls: Type[T], source: Union[bytes, IO[bytes], Path]) -> T:
    """Decodes data into an instance of cls."""
    ...

Testing Your Implementation

Create comprehensive tests for your format:

import pytest
from lodum import lodum
from lodum.exception import DeserializationError
from your_module import myformat

@lodum
class TestClass:
    def __init__(self, id: int, name: str, tags: list[str]):
        self.id = id
        self.name = name
        self.tags = tags

def test_roundtrip():
    obj = TestClass(1, "Alice", ["admin", "user"])
    encoded = myformat.dump(obj)
    decoded = myformat.load(TestClass, encoded)
    
    assert decoded.id == obj.id
    assert decoded.name == obj.name
    assert decoded.tags == obj.tags

def test_error_handling():
    with pytest.raises(DeserializationError):
        myformat.load(TestClass, b"invalid data")

Publishing Your Format

Once implemented, you can publish your format as a separate package:

# pyproject.toml
[project]
name = "lodum-myformat"
dependencies = [
    "lodum>=0.2.0",
    "myformat-library>=1.0",
]

This allows users to install it separately:

pip install lodum-myformat
