Why Msgpack?
From the README at README.md:11: "repod uses asyncio and msgpack to asynchronously serialize network events and arbitrary data structures."
Msgpack was chosen over alternatives like rencode, pickle, and json for several reasons:
Binary efficiency
Msgpack is a binary format, so it’s more compact than JSON:
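As a rough sketch of the comparison (message fields are illustrative; assumes the msgpack package is installed):

```python
import json

import msgpack  # third-party: pip install msgpack

# A typical small network event (fields are illustrative)
message = {"action": "move", "x": 102.5, "y": -7.25, "seq": 42}

as_json = json.dumps(message).encode("utf-8")
as_msgpack = msgpack.packb(message, use_bin_type=True)

print(f"json: {len(as_json)} bytes, msgpack: {len(as_msgpack)} bytes")
```

The msgpack encoding skips field quoting and whitespace entirely, so the same structure comes out noticeably smaller.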
No Python-specific types
Unlike pickle, msgpack is language-agnostic. You can build clients in JavaScript, Rust, C++, etc. and communicate with a Python server.
Safer than pickle
pickle is a security risk: it can execute arbitrary code during deserialization. Msgpack only deserializes data, not code.
Faster than JSON
Msgpack is faster to encode/decode than JSON:
- No text parsing
- No base64 for binary data
- More compact representation
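A quick benchmark sketch (timings vary by machine and message shape; assumes the msgpack package):

```python
import json
import timeit

import msgpack  # third-party: pip install msgpack

# A mid-sized message (contents are illustrative)
message = {"action": "state", "entities": [{"id": i, "x": i * 0.5} for i in range(20)]}

encode_json = timeit.timeit(lambda: json.dumps(message).encode("utf-8"), number=10_000)
encode_msgpack = timeit.timeit(lambda: msgpack.packb(message, use_bin_type=True), number=10_000)

print(f"json encode:    {encode_json:.3f}s")
print(f"msgpack encode: {encode_msgpack:.3f}s")
```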
Better than rencode
From the README at README.md:156: "It uses rencode / custom delimiter-based framing (\0---\0), which is fragile with binary data."
The old PodSixNet library used rencode with delimiter framing. This approach has problems:
- Delimiter collision: what if your data contains \0---\0?
- Escaping overhead: need to escape delimiter bytes
- Not standard: custom format, hard to debug
Msgpack with length-prefix framing, by contrast, is:
- Standard (used by Kafka, Redis, Protocol Buffers, etc.)
- No collision risk
- No escaping needed
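The delimiter-collision problem is easy to demonstrate (a self-contained sketch; only the standard library's struct is needed for the framing side):

```python
import struct

# Binary payload that happens to contain the old delimiter sequence
payload = b"chunk-one\0---\0chunk-two"

# Delimiter framing corrupts it: splitting produces two bogus messages
parts = payload.split(b"\0---\0")
assert parts == [b"chunk-one", b"chunk-two"]

# Length-prefix framing passes the same bytes through untouched
frame = struct.pack(">I", len(payload)) + payload
(length,) = struct.unpack(">I", frame[:4])
assert frame[4:4 + length] == payload
```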
Msgpack Serialization
repod uses the msgpack Python library for encoding and decoding.
Encoding
From protocol.py:37-62:
1. Serialize with msgpack: msgpack.packb(data, use_bin_type=True) converts the dictionary to bytes. The use_bin_type=True flag ensures bytes are encoded as msgpack's bin type (not str).
2. Pack length header: struct.pack(HEADER_FORMAT, len(packed)) encodes the length as a 4-byte big-endian integer. HEADER_FORMAT = ">I" means: > = big-endian (network byte order), I = unsigned int (4 bytes).
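The two steps above can be sketched as follows (a minimal sketch assuming the msgpack library; the actual encode() in protocol.py may differ in details):

```python
import struct

import msgpack  # third-party: pip install msgpack

HEADER_FORMAT = ">I"  # 4-byte big-endian unsigned int

def encode(data):
    # Step 1: serialize; use_bin_type=True keeps bytes as msgpack's bin type
    packed = msgpack.packb(data, use_bin_type=True)
    # Step 2: prepend the 4-byte length header
    return struct.pack(HEADER_FORMAT, len(packed)) + packed

frame = encode({"action": "ping", "seq": 42})
```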
Decoding
From protocol.py:65-88:
The decode() function unpacks raw msgpack bytes. Parameters:
- raw=False: Decode msgpack str type as Python str (not bytes)
- strict_map_key=False: Allow non-string keys in dictionaries
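A matching sketch of decode() under the same assumptions (the real protocol.py implementation may differ):

```python
import msgpack  # third-party: pip install msgpack

def decode(payload):
    # raw=False: msgpack str -> Python str; strict_map_key=False: any key type
    return msgpack.unpackb(payload, raw=False, strict_map_key=False)

# Non-string map keys round-trip thanks to strict_map_key=False
packed = msgpack.packb({"action": "ping", 7: "non-string key"}, use_bin_type=True)
message = decode(packed)
```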
Length-Prefix Framing
To safely delimit messages on a TCP stream, repod uses length-prefix framing.
Wire Format
From protocol.py:6-15:
- 4-byte length header (big-endian unsigned int)
- N-byte msgpack payload (where N = value from header)
Why Length-Prefix?
O(1) Boundary Detection: With length-prefix framing, you know exactly where each message ends. No scanning required.
No Delimiter Collision: Delimiter-based framing (like \0---\0) has a problem: what if your data contains the delimiter? Length-prefix framing has no such issue; any binary data is safe.
Standard and Debuggable: Length-prefix framing is used by:
- Protocol Buffers
- Kafka
- Redis (RESP3)
- PostgreSQL wire protocol
- Many others
Stream-Based Decoding
The read_message() function implements buffering and frame extraction.
From protocol.py:91-124:
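A minimal sketch consistent with the checks annotated below (assuming the msgpack library; not the verbatim protocol.py code):

```python
import struct

import msgpack  # third-party: pip install msgpack

HEADER_SIZE = 4

def read_message(buffer):
    """Return (message, bytes_consumed); (None, 0) means "need more data"."""
    # Check for header: fewer than 4 bytes and we can't even read the length
    if len(buffer) < HEADER_SIZE:
        return None, 0
    (length,) = struct.unpack(">I", buffer[:HEADER_SIZE])
    # Check for complete frame
    if len(buffer) < HEADER_SIZE + length:
        return None, 0
    payload = buffer[HEADER_SIZE:HEADER_SIZE + length]
    return msgpack.unpackb(payload, raw=False, strict_map_key=False), HEADER_SIZE + length

payload = msgpack.packb({"seq": 1}, use_bin_type=True)
frame = struct.pack(">I", len(payload)) + payload

partial = read_message(frame[:3])       # incomplete header
complete = read_message(frame + b"x")   # full frame plus a byte of the next one
```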
1. Check for header: If the buffer has fewer than 4 bytes, we can't even read the length. Return (None, 0) to indicate "need more data".
2. Check for complete frame: If the buffer has fewer than 4 + length bytes, the frame is incomplete. Return (None, 0).
Usage in Read Loop
From channel.py:203-225:
This pattern ensures that:
- Partial frames are buffered until complete
- Multiple frames in one chunk are all processed
- No data is lost or duplicated
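The pattern can be sketched end to end (a self-contained illustration, not the actual channel.py code; chunk boundaries are contrived to show partial delivery):

```python
import struct

import msgpack  # third-party: pip install msgpack

def read_message(buffer):
    # Minimal frame extractor; (None, 0) signals "need more data"
    if len(buffer) < 4:
        return None, 0
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None, 0
    return msgpack.unpackb(buffer[4:4 + length], raw=False, strict_map_key=False), 4 + length

def make_frame(data):
    payload = msgpack.packb(data, use_bin_type=True)
    return struct.pack(">I", len(payload)) + payload

# Two frames arriving split across three TCP chunks
stream = make_frame({"seq": 1}) + make_frame({"seq": 2})
chunks = [stream[:3], stream[3:15], stream[15:]]

buffer = b""
messages = []
for chunk in chunks:
    buffer += chunk
    while True:  # drain every complete frame; partial frames stay buffered
        message, consumed = read_message(buffer)
        if message is None:
            break
        buffer = buffer[consumed:]
        messages.append(message)
```

Each chunk is appended to the buffer, and the inner loop extracts as many complete frames as the buffer holds, which is exactly what the three guarantees above require.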
Supported Data Types
Msgpack supports these Python types:
| Python Type | Msgpack Type | Notes |
|---|---|---|
| None | nil | |
| bool | bool | |
| int | int | Up to 64-bit |
| float | float | 64-bit (double) |
| str | str | UTF-8 encoded |
| bytes | bin | Binary data |
| list | array | Heterogeneous |
| tuple | array | Decoded as list |
| dict | map | Keys can be any type |
Example: Complex Nested Data
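A round-trip sketch (field names are illustrative, not taken from repod):

```python
import msgpack  # third-party: pip install msgpack

state = {
    "tick": 1024,
    "players": [
        {"id": 1, "name": "alice", "pos": (10.0, 20.0)},
        {"id": 2, "name": "bob", "pos": (-3.5, 7.25)},
    ],
    "snapshot": b"\x00\x01\x02\xff",  # raw binary survives the round trip
}

packed = msgpack.packb(state, use_bin_type=True)
decoded = msgpack.unpackb(packed, raw=False, strict_map_key=False)

# Per the type table above, tuples come back as lists
print(decoded["players"][0]["pos"])
```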
Performance Characteristics
Encoding Speed
Msgpack encoding is fast:
- Small messages (< 100 bytes): ~500 ns per message
- Medium messages (100-1000 bytes): ~2-5 μs per message
- Large messages (> 1000 bytes): ~5-20 μs per message
Decoding Speed
Decoding is slightly slower than encoding:
- Small messages: ~800 ns per message
- Medium messages: ~3-8 μs per message
- Large messages: ~8-30 μs per message
Wire Size
Msgpack is compact compared to JSON.
Best Practices
Keep messages small
Aim for messages < 1 KB. Large messages increase latency and memory usage. Instead, send incremental updates or use compression for large payloads.
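One way to sketch the compression option (using Python's zlib; payload contents are illustrative, and repod itself may handle this differently):

```python
import zlib

import msgpack  # third-party: pip install msgpack

# A large, repetitive payload (illustrative): compress before framing
payload = msgpack.packb({"map_data": "grass " * 2000}, use_bin_type=True)
compressed = zlib.compress(payload)

# The receiver decompresses, then unpacks as usual
restored = msgpack.unpackb(zlib.decompress(compressed), raw=False)
```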
Use bytes for binary data
If you have binary data (images, audio, etc.), use Msgpack's bytes: the bin type is more efficient than base64.
Avoid redundant data
Don’t send the same data repeatedly. Use IDs to reference entities:
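For example (entity fields are made up for illustration):

```python
import msgpack  # third-party: pip install msgpack

# Redundant: re-sending the full entity on every update
full_update = {
    "action": "update",
    "player": {"id": 7, "name": "alice", "class": "ranger", "hp": 93},
}

# Leaner: reference the entity by id and send only what changed
delta_update = {"action": "update", "id": 7, "hp": 93}

full_size = len(msgpack.packb(full_update, use_bin_type=True))
delta_size = len(msgpack.packb(delta_update, use_bin_type=True))
print(f"full: {full_size} bytes, delta: {delta_size} bytes")
```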
Consider message batching
For high-frequency updates, batch multiple messages into one frame. This reduces framing overhead and system call overhead.
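A sketch of the trade-off (the framing helper and update fields are illustrative):

```python
import struct

import msgpack  # third-party: pip install msgpack

def make_frame(data):
    payload = msgpack.packb(data, use_bin_type=True)
    return struct.pack(">I", len(payload)) + payload

updates = [{"id": i, "x": i * 0.5} for i in range(10)]

# One frame per update: ten headers, ten writes
individual = [make_frame(u) for u in updates]

# One batched frame: a single header and a single write
batched = make_frame({"action": "batch", "updates": updates})

print(f"individual: {sum(len(f) for f in individual)} bytes, batched: {len(batched)} bytes")
```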
Debugging Wire Format
To inspect the raw bytes on the wire, use Wireshark or a packet capture tool.
Example Packet Capture
- 00 00 00 12 = length header (payload is 18 bytes)
- 82 = msgpack map with 2 entries
- a6 = msgpack string of length 6
- 61 63 74 69 6f 6e = "action"
- a4 = msgpack string of length 4
- 70 69 6e 67 = "ping"
- a3 = msgpack string of length 3
- 73 65 71 = "seq"
- 2a = msgpack positive fixint 42
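The bytes above can be reproduced with a short script (a sketch, assuming the msgpack library):

```python
import struct

import msgpack  # third-party: pip install msgpack

payload = msgpack.packb({"action": "ping", "seq": 42}, use_bin_type=True)
frame = struct.pack(">I", len(payload)) + payload

print(frame.hex(" "))
# 00 00 00 12 82 a6 61 63 74 69 6f 6e a4 70 69 6e 67 a3 73 65 71 2a
```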
Next Steps
Protocol API
Full API reference for encode(), decode(), and read_message()
Actions & Dispatch
Learn how messages are routed after deserialization
Performance Tips
Optimize your message serialization
Examples
See serialization in action