Pickle Format
Lodum provides hardened pickle support through SafeUnpickler, which restricts deserialization to an allowlist of types to prevent arbitrary code execution. Use it for Python-specific serialization when you need maximum compatibility with Python objects.
Security Notice: Standard Python pickle is unsafe and can execute arbitrary code during deserialization. Lodum’s SafeUnpickler provides protection by only allowing whitelisted types and @lodum-decorated classes.
Installation
Pickle support is built into Python and the Lodum core, so no additional dependencies are required.
API Reference
dump()
from lodum import pickle
pickle.dump(obj: Any, target: Optional[Union[IO[bytes], Path]] = None, **kwargs) -> Optional[bytes]
Encodes a Python object as pickle bytes after validating that it contains only safe types.
Parameters:
obj: The object to encode (must be an instance of a @lodum-decorated class)
target: Optional file-like object or Path to write to
**kwargs: Additional arguments for pickle.dump(s) (e.g., protocol)
Returns:
The pickle bytes if target is None, otherwise None
Security: The object is validated before pickling to ensure it only contains safe types.
Example:
from lodum import lodum, pickle
from pathlib import Path
@lodum
class User:
    name: str
    age: int
    metadata: dict[str, str]
user = User(name="Alice", age=30, metadata={"role": "admin"})
# Serialize to bytes
pickle_bytes = pickle.dump(user)
# Serialize to file
pickle.dump(user, Path("user.pkl"))
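The return contract described above (bytes when target is None, otherwise a write to the stream or path and a None result) can be sketched with the standard library alone. This is only an illustration of the dispatch pattern, not Lodum's implementation: `dump_sketch` is a hypothetical name, it uses the standard `pickle` module, and it skips the safety validation that Lodum performs.

```python
import io
import pickle as std_pickle  # standard library pickle, not lodum.pickle
import tempfile
from pathlib import Path
from typing import IO, Any, Optional, Union


def dump_sketch(
    obj: Any,
    target: Optional[Union[IO[bytes], Path]] = None,
    **kwargs: Any,
) -> Optional[bytes]:
    """Hypothetical helper mirroring the documented return contract."""
    data = std_pickle.dumps(obj, **kwargs)
    if target is None:
        return data  # no target: hand the bytes back to the caller
    if isinstance(target, Path):
        target.write_bytes(data)  # Path target: write the whole file
    else:
        target.write(data)  # file-like target: write to the open stream
    return None


payload = {"name": "Alice", "age": 30}

# No target: the bytes are returned.
as_bytes = dump_sketch(payload)

# File-like target: the bytes land in the stream and None is returned.
buffer = io.BytesIO()
stream_result = dump_sketch(payload, buffer)

# Path target: the bytes land in the file and None is returned.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "user.pkl"
    path_result = dump_sketch(payload, path)
    file_bytes = path.read_bytes()
```

Accepting both a Path and an open stream keeps call sites short, at the cost of one isinstance check inside the helper.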
dumps()
pickle.dumps(obj: Any, **kwargs) -> bytes
Legacy alias for dump(obj). Provided for compatibility.
load()
pickle.load(
    cls: Type[T],
    source: Union[bytes, IO[bytes], Path],
    max_size: int = DEFAULT_MAX_SIZE
) -> T
Decodes a pickle from bytes, stream, or file into a Python object using SafeUnpickler.
Parameters:
cls: The class to instantiate
source: pickle bytes, file-like object, or Path
max_size: Maximum allowed size for bytes input (default: 10MB)
Returns:
An instance of cls populated from the pickle data
Security: Only allows safe built-in types and @lodum-decorated classes. Blocks dangerous modules like os, sys, and subprocess.
Example:
from lodum import lodum, pickle
from pathlib import Path
@lodum
class User:
    name: str
    age: int
    metadata: dict[str, str]
# Load from bytes
pickle_bytes = b'...'
user = pickle.load(User, pickle_bytes)
print(f"{user.name}, age {user.age}")
# Load from file
user = pickle.load(User, Path("user.pkl"))
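The max_size parameter bounds how much data is accepted for bytes input. A minimal sketch of that kind of guard, assuming the check is a simple length comparison made before any unpickling starts; `check_size` and `DEFAULT_MAX_SIZE_SKETCH` are hypothetical names, and the exact byte count behind the documented 10MB default is an assumption here.

```python
import pickle as std_pickle  # standard library pickle, used only to build test data

# Assumed value; the documentation only says the default is 10MB.
DEFAULT_MAX_SIZE_SKETCH = 10 * 1024 * 1024


def check_size(data: bytes, max_size: int = DEFAULT_MAX_SIZE_SKETCH) -> bytes:
    """Reject oversized input before any unpickling work begins."""
    if len(data) > max_size:
        raise ValueError(
            f"pickle data is {len(data)} bytes, which exceeds max_size={max_size}"
        )
    return data


small = std_pickle.dumps(list(range(10)))

# Within the limit: the data is passed through unchanged.
checked = check_size(small)

# Over the limit: the guard raises before the data is ever parsed.
try:
    check_size(small, max_size=8)
    rejected = False
except ValueError:
    rejected = True
```

Checking the length first keeps an oversized payload from consuming memory or CPU in the unpickler.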
loads()
pickle.loads(cls: Type[T], data: bytes, **kwargs) -> T
Legacy alias for load(cls, source). Provided for compatibility.
SafeUnpickler Security
The SafeUnpickler class provides critical security protections:
Blocked Modules
These dangerous modules are blocked:
os - Operating system operations
sys - System operations
subprocess - Process execution
Any module whose name contains one of these strings
Allowed Built-in Types
Only these safe built-in types are allowed:
- Primitives: int, float, str, bool, bytes, complex
- Collections: list, tuple, dict, set, frozenset
- Utilities: bytearray, NoneType, type
Allowed Standard Library
These standard library classes are allowed:
collections.defaultdict
collections.OrderedDict
collections.Counter
array.array
Lodum Classes
Any class decorated with @lodum is automatically allowed.
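The allow and block rules above follow the general "restricting globals" technique from the standard library: subclass `pickle.Unpickler` and override `find_class` so that only approved names resolve. The sketch below illustrates that technique with a small allowlist. It is not Lodum's SafeUnpickler; `AllowlistUnpickler`, `ALLOWED_GLOBALS`, and `restricted_loads` are hypothetical names, and the allowlist is a partial stand-in for the lists above.

```python
import io
import pickle as std_pickle  # standard library pickle, not lodum.pickle
from collections import OrderedDict

# Hypothetical allowlist: (module, name) pairs that may be resolved.
ALLOWED_GLOBALS = {
    ("builtins", "complex"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "bytearray"),
    ("collections", "OrderedDict"),
    ("collections", "defaultdict"),
    ("collections", "Counter"),
}


class AllowlistUnpickler(std_pickle.Unpickler):
    """Resolve only globals that appear in ALLOWED_GLOBALS."""

    def find_class(self, module: str, name: str):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        # Anything else (os.system, subprocess.Popen, ...) is refused.
        raise std_pickle.UnpicklingError(
            f"global '{module}.{name}' is not allowed"
        )


def restricted_loads(data: bytes):
    """Unpickle bytes through the allowlist-based unpickler."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


# An allowed standard library class round-trips normally.
restored = restricted_loads(std_pickle.dumps(OrderedDict(a=1, b=2)))

# A payload that references os.system is refused before anything runs.
try:
    restricted_loads(b"cos\nsystem\n(S'echo HACKED'\ntR.")
    blocked = False
except std_pickle.UnpicklingError:
    blocked = True
```

An allowlist fails closed: a name nobody thought about is rejected by default, which a blocklist of module names cannot guarantee on its own.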
Security Example
from lodum import pickle
import io
# Attempting to unpickle malicious code will fail
malicious_pickle = b"""cos
system
(S'echo HACKED'
tR."""
try:
    # This will raise UnpicklingError
    pickle.load(object, io.BytesIO(malicious_pickle))
except Exception as e:
    print(f"Blocked: {e}")
    # Blocked: Unsafe module 'os' is forbidden.
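To see why this payload is dangerous, it can be disassembled with the standard library's `pickletools`, which parses opcodes without executing them. The sketch below lists the opcodes of the same payload: a GLOBAL lookup of os.system followed by a REDUCE, which is the call a plain unpickler would make.

```python
import pickletools

# Same payload as in the security example, written with explicit newlines.
malicious_pickle = b"cos\nsystem\n(S'echo HACKED'\ntR."

# genops yields (opcode, argument, position) without running the pickle.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(malicious_pickle)]

for name, arg in ops:
    print(name, arg)
```

The GLOBAL opcode is the step that reaches `find_class`, which is why a restricted unpickler can stop the payload there.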
ValidationDumper
Before pickling, Lodum validates the object structure using ValidationDumper:
from lodum import lodum, pickle
@lodum
class Safe:
    value: int
class Unsafe:
    value: int
safe = Safe(value=42)
unsafe = Unsafe(value=42)
# This works - Safe is decorated with @lodum
pickle.dump(safe)
# This fails - Unsafe is not decorated
try:
    pickle.dump(unsafe)
except AttributeError as e:
    print(f"Validation failed: {e}")
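Pre-pickling validation of this kind can be pictured as a recursive walk over the object graph that rejects any value outside an approved set. The sketch below is a simplified illustration, not ValidationDumper itself: `validate` is a hypothetical function, the `__safe_marker__` attribute is a made-up stand-in for "decorated with @lodum", and it raises TypeError where the example above catches AttributeError. It also assumes the data has no reference cycles.

```python
from typing import Any

# Scalar types accepted as-is (a subset of the documented allowed types).
SAFE_SCALARS = (int, float, str, bool, bytes, complex, type(None))


def is_marked_safe(obj: Any) -> bool:
    """Hypothetical stand-in for checking that a class was decorated."""
    return bool(getattr(type(obj), "__safe_marker__", False))


def validate(obj: Any, path: str = "root") -> None:
    """Walk the object graph and raise TypeError at the first unsafe value."""
    if isinstance(obj, SAFE_SCALARS):
        return
    if isinstance(obj, (list, tuple, set, frozenset)):
        for index, item in enumerate(obj):
            validate(item, f"{path}[{index}]")
        return
    if isinstance(obj, dict):
        for key, value in obj.items():
            validate(key, f"{path}.key({key!r})")
            validate(value, f"{path}[{key!r}]")
        return
    if is_marked_safe(obj):
        for name, value in vars(obj).items():
            validate(value, f"{path}.{name}")
        return
    raise TypeError(f"{path}: {type(obj).__name__} is not an allowed type")


class Safe:
    __safe_marker__ = True  # made-up marker, plays the role of the decorator

    def __init__(self, value: int) -> None:
        self.value = value


class Unsafe:
    def __init__(self, value: int) -> None:
        self.value = value


validate(Safe(42))  # passes silently
validate({"items": [1, 2.5, "x", None]})  # nested containers pass too

try:
    validate([1, Unsafe(42)])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Carrying a path string through the recursion makes the error point at the exact field that failed.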
Protocol Versions
Pickle supports multiple protocol versions. Use higher protocols for better performance:
from lodum import lodum, pickle
@lodum
class Data:
    values: list[int]
data = Data(values=list(range(1000)))
# Protocol 5 (Python 3.8+) - adds out-of-band buffers for large binary data
pickle_v5 = pickle.dump(data, protocol=5)
# Protocol 4 (Python 3.4+) - supports very large objects and adds framing
pickle_v4 = pickle.dump(data, protocol=4)
# Default protocol (recommended)
pickle_default = pickle.dump(data)
print(f"Protocol 5 size: {len(pickle_v5)}")
print(f"Protocol 4 size: {len(pickle_v4)}")
print(f"Default size: {len(pickle_default)}")
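The protocol number is recorded in the payload itself: pickles written with protocol 2 or later begin with a PROTO opcode (byte 0x80) followed by the protocol number. The sketch below uses the standard `pickle` module rather than Lodum to compare sizes across protocols and read that header back. `protocol_of` is a hypothetical helper.

```python
import pickle as std_pickle  # standard library pickle, not lodum.pickle

data = list(range(1000))


def protocol_of(payload: bytes) -> int:
    """Read the protocol number from the PROTO header, if present."""
    if payload[:1] == b"\x80":
        return payload[1]
    return 0  # protocols 0 and 1 have no PROTO header; reported as 0 here


# Size of the same data under every protocol this interpreter supports.
sizes = {
    protocol: len(std_pickle.dumps(data, protocol=protocol))
    for protocol in range(std_pickle.HIGHEST_PROTOCOL + 1)
}

# The header records which protocol produced each payload.
detected = {
    protocol: protocol_of(std_pickle.dumps(data, protocol=protocol))
    for protocol in (2, 3, 4, 5)
}

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```

The text-based protocol 0 is noticeably larger than the binary protocols for data like this.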
When to Use Pickle
Good Use Cases
- Internal Python applications: Where data never leaves your Python ecosystem
- Caching: Temporary storage of Python objects
- IPC: Communication between trusted Python processes
- Development: Quick serialization during development
from lodum import lodum, pickle
from pathlib import Path
@lodum
class CacheEntry:
    key: str
    value: dict[str, int]
    timestamp: int
# Simple cache implementation
def save_cache(entry: CacheEntry, cache_file: str):
    pickle.dump(entry, Path(cache_file))
def load_cache(cache_file: str) -> CacheEntry:
    return pickle.load(CacheEntry, Path(cache_file))
Avoid Pickle For
- Network protocols: Use JSON, MessagePack, or CBOR instead
- Long-term storage: Pickles depend on class definitions and import paths, so stored data can break when your code changes
- Cross-language: Pickle is Python-specific
- Untrusted data: Even SafeUnpickler has limits
- Public APIs: Use standard formats like JSON
Performance
Pickle offers excellent performance for Python objects:
from lodum import lodum, pickle, json, msgpack
import time
@lodum
class ComplexData:
    lists: list[list[int]]
    dicts: dict[str, dict[str, str]]
    nested: list[dict[str, list[int]]]
data = ComplexData(
    lists=[[i for i in range(100)] for _ in range(100)],
    dicts={f"key{i}": {f"inner{j}": f"value{j}" for j in range(10)} for i in range(100)},
    nested=[{f"k{i}": [1, 2, 3]} for i in range(1000)]
)
# Benchmark serialization (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
pickle_data = pickle.dump(data)
pickle_time = time.perf_counter() - start
start = time.perf_counter()
json_data = json.dump(data)
json_time = time.perf_counter() - start
start = time.perf_counter()
msgpack_data = msgpack.dump(data)
msgpack_time = time.perf_counter() - start
print(f"Pickle: {pickle_time*1000:.2f}ms, {len(pickle_data)} bytes")
print(f"JSON: {json_time*1000:.2f}ms, {len(json_data)} bytes")
print(f"MessagePack: {msgpack_time*1000:.2f}ms, {len(msgpack_data)} bytes")
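A single timed call is sensitive to warm-up and scheduler noise. A more stable approach is to repeat the measurement and keep the best run, which the standard `timeit` module supports. The sketch below applies it to the standard `pickle` and `json` modules on a plain dictionary. It does not measure Lodum, and `best_ms` is a hypothetical helper.

```python
import json as std_json
import pickle as std_pickle
import timeit

# Plain built-in data so both standard library encoders can handle it.
data = {
    "lists": [[i for i in range(20)] for _ in range(20)],
    "dicts": {f"key{i}": {f"inner{j}": f"value{j}" for j in range(5)} for i in range(20)},
}


def best_ms(func, repeat: int = 5, number: int = 10) -> float:
    """Best per-call time in milliseconds over several repeated runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number * 1000


pickle_ms = best_ms(lambda: std_pickle.dumps(data))
json_ms = best_ms(lambda: std_json.dumps(data))

print(f"pickle: {pickle_ms:.4f} ms per call")
print(f"json:   {json_ms:.4f} ms per call")
```

Taking the minimum over repeats filters out runs that were slowed by unrelated system activity.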
Migration from Standard Pickle
If you’re migrating from standard pickle:
# Before (UNSAFE!)
import pickle
with open('data.pkl', 'wb') as f:
    pickle.dump(my_object, f)
with open('data.pkl', 'rb') as f:
    loaded = pickle.load(f) # DANGEROUS!
# After (SAFE with Lodum)
from lodum import lodum, pickle
from pathlib import Path
@lodum # Add decorator to your class
class MyClass:
    # ... fields ...
    pass
# Dumping is similar
pickle.dump(my_object, Path('data.pkl'))
# Loading requires class and uses SafeUnpickler
loaded = pickle.load(MyClass, Path('data.pkl'))
Security Best Practices
- Always use SafeUnpickler: Never use standard pickle.load()
- Validate sources: Only unpickle data from trusted sources
- Use size limits: Set appropriate max_size values
- Prefer other formats: Use JSON/MessagePack for untrusted data
- Regular updates: Keep Lodum updated for security patches
- Audit allowed types: Review SafeUnpickler.find_class() for your needs
from pathlib import Path
from lodum import pickle
# Good: Controlled size limit
data = pickle.load(MyClass, untrusted_data, max_size=1024*1024) # 1MB limit
# Good: Known source
data = pickle.load(MyClass, Path('/trusted/internal/cache.pkl'))
# Bad: Unlimited size from untrusted source
# data = pickle.load(MyClass, untrusted_data, max_size=float('inf'))
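One general way to validate a source, independent of Lodum, is to sign pickled bytes with an HMAC when writing and verify the signature before unpickling. The sketch below uses only the standard library. `sign`, `verify`, and the example key are hypothetical, and a real key would come from a secret store, not from source code.

```python
import hashlib
import hmac
import pickle as std_pickle  # standard library pickle, used only to build test data

TAG_LENGTH = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(blob: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag matches; raise otherwise."""
    tag, payload = blob[:TAG_LENGTH], blob[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return payload


key = b"example-key-for-illustration-only"  # placeholder, not a real secret
blob = sign(std_pickle.dumps({"role": "admin"}), key)

# Untampered data verifies and can then be handed to the unpickler.
trusted_payload = verify(blob, key)

# Changing the last byte invalidates the signature.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify(tampered, key)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

A signature only proves that the data came from a holder of the key, so it complements the type restrictions and size limits above and does not replace them.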