repod uses msgpack for efficient binary serialization and length-prefix framing to safely delimit messages on the wire.

Why Msgpack?

From the README at README.md:11:
repod uses asyncio and msgpack to asynchronously serialize network events and arbitrary data structures
Msgpack was chosen over alternatives like rencode, pickle, and json for several reasons:
Msgpack is a binary format, so it’s more compact than JSON:
size_comparison.py
import json
import msgpack

data = {"action": "player_move", "x": 123, "y": 456, "speed": 5.5}

json_size = len(json.dumps(data))        # 59 bytes
msgpack_size = len(msgpack.packb(data))  # 43 bytes (~27% smaller)
Unlike pickle, msgpack is language-agnostic. You can build clients in JavaScript, Rust, C++, etc. and communicate with a Python server.
cross_language.py
# Python server
channel.send({"action": "update", "value": 42})

client.js (hypothetical filename)
// JavaScript client can decode this with msgpack-lite
const decoded = msgpack.decode(buffer);
console.log(decoded.action);  // "update"
pickle is a security risk — it can execute arbitrary code during deserialization. Msgpack only deserializes data, not code.
# pickle - DANGEROUS
import pickle
pickle.loads(untrusted_data)  # Could run malicious code!

# msgpack - SAFE
import msgpack
msgpack.unpackb(untrusted_data)  # Only deserializes data
Msgpack is faster to encode/decode than JSON:
  • No text parsing
  • No base64 for binary data
  • More compact representation
Benchmark: msgpack is typically 2-5x faster than JSON for complex nested structures.
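The actual speedup depends heavily on data shape, library version, and machine, so it is worth measuring yourself. Here is a hedged timeit sketch (the data layout is an arbitrary example, not taken from repod):

```python
import json
import timeit

import msgpack  # assumes the msgpack package is installed

# A moderately nested structure, typical of a game-state message
data = {"players": [{"id": i, "pos": [i, i * 2], "hp": 100} for i in range(50)]}

json_s = timeit.timeit(lambda: json.dumps(data), number=2000)
msgpack_s = timeit.timeit(lambda: msgpack.packb(data), number=2000)
print(f"json: {json_s:.4f}s  msgpack: {msgpack_s:.4f}s")
```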
From the README at README.md:156:
It uses rencode / custom delimiter-based framing (\0---\0), which is fragile with binary data
The old PodSixNet library used rencode with delimiter framing. This approach has problems:
  • Delimiter collision: what if your data contains \0---\0?
  • Escaping overhead: need to escape delimiter bytes
  • Not standard: custom format, hard to debug
Msgpack with length-prefix framing is:
  • Standard (used by Kafka, Redis, Protocol Buffers, etc.)
  • No collision risk
  • No escaping needed

Msgpack Serialization

repod uses the msgpack Python library for encoding and decoding.

Encoding

From protocol.py:37-62:
protocol.py
def encode(data: dict) -> bytes:
    """Encode a dictionary as a length-prefixed message frame.

    The message is serialized with msgpack and prefixed with a 4-byte
    big-endian length header.
    """
    packed = cast(bytes, msgpack.packb(data, use_bin_type=True))
    length = struct.pack(HEADER_FORMAT, len(packed))
    return length + packed
  1. Serialize with msgpack: msgpack.packb(data, use_bin_type=True) converts the dictionary to bytes. The use_bin_type=True flag ensures bytes are encoded as msgpack’s bin type (not str).
  2. Calculate length: len(packed) gives the size of the msgpack payload in bytes.
  3. Pack length header: struct.pack(HEADER_FORMAT, len(packed)) encodes the length as a 4-byte big-endian integer. HEADER_FORMAT = ">I" means:
       • > = big-endian (network byte order)
       • I = unsigned int (4 bytes)
  4. Concatenate: length + packed produces the final frame: 4-byte header + msgpack payload.
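These four steps can be reproduced with the standard library alone. A minimal sketch, framing a hand-written msgpack payload (the byte string below is the encoding of {"action": "ping", "seq": 42}, the same bytes shown in the hex dump at the end of this page):

```python
import struct

HEADER_FORMAT = ">I"  # 4-byte big-endian unsigned int, as in protocol.py

# Msgpack encoding of {"action": "ping", "seq": 42}
payload = b"\x82\xa6action\xa4ping\xa3seq\x2a"

length = struct.pack(HEADER_FORMAT, len(payload))
frame = length + payload

assert frame[:4] == b"\x00\x00\x00\x12"  # header: 18-byte payload
assert len(frame) == 22                  # 4-byte header + 18-byte payload
```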

Decoding

From protocol.py:65-88:
protocol.py
def decode(data: bytes) -> dict:
    """Decode msgpack-serialized bytes into a dictionary.

    Note:
        This expects raw msgpack data, **not** a full length-prefixed
        frame.  Use :func:`read_message` for stream-based decoding.
    """
    return msgpack.unpackb(data, raw=False, strict_map_key=False)
The decode() function unpacks raw msgpack bytes. Parameters:
  • raw=False: Decode msgpack str type as Python str (not bytes)
  • strict_map_key=False: Allow non-string keys in dictionaries
decode() expects raw msgpack bytes, not a length-prefixed frame. For stream-based decoding, use read_message() instead.
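A short example of these flags in action (assuming the msgpack package is installed). The integer key below would be rejected by the default strict_map_key=True:

```python
import msgpack

# A map with a non-string key; JSON could not even represent this
packed = msgpack.packb({1: "one", "two": 2}, use_bin_type=True)

# raw=False decodes msgpack str as Python str; strict_map_key=False
# permits the integer key (the default, True, raises ValueError)
decoded = msgpack.unpackb(packed, raw=False, strict_map_key=False)
assert decoded == {1: "one", "two": 2}
```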

Length-Prefix Framing

To safely delimit messages on a TCP stream, repod uses length-prefix framing.

Wire Format

From protocol.py:6-15:
protocol.py
"""
Message format::

    ┌──────────────┬────────────────────────┐
    │ 4 bytes      │ N bytes                │
    │ length (BE)  │ msgpack payload        │
    └──────────────┴────────────────────────┘

This framing method is efficient (O(1) boundary detection), safe
(no delimiter collision risk), and standard (used by Kafka, Redis,
Protocol Buffers, etc.).
"""
Each message consists of:
  1. 4-byte length header (big-endian unsigned int)
  2. N-byte msgpack payload (where N = value from header)

Why Length-Prefix?

O(1) Boundary Detection

With length-prefix framing, you know exactly where each message ends:
# Read 4 bytes to get length
length = struct.unpack(">I", header)[0]

# Read exactly `length` bytes for payload
payload = stream.read(length)
No scanning required.

No Delimiter Collision

Delimiter-based framing (like \0---\0) has a problem: what if your data contains the delimiter?
# Delimiter framing - FRAGILE
message = b"hello\0---\0world"  # Oops, delimiter in data!
Length-prefix framing has no such issue: any binary data is safe.

Standard and Debuggable

Length-prefix framing is used by:
  • Protocol Buffers
  • Kafka
  • Redis (RESP3)
  • PostgreSQL wire protocol
  • Many others
Network debugging tools understand this format.

Stream-Based Decoding

The read_message() function implements buffering and frame extraction. From protocol.py:91-124:
protocol.py
def read_message(stream: bytes) -> tuple[dict | None, int]:
    """Read a complete message from a byte stream.

    Implements length-prefix framing to extract complete messages from
    a potentially partial byte buffer.

    Returns:
        A ``(message, bytes_consumed)`` tuple.  If the stream does not
        yet contain a full message, returns ``(None, 0)``.
    """
    if len(stream) < HEADER_SIZE:
        return None, 0

    length: int = struct.unpack(HEADER_FORMAT, stream[:HEADER_SIZE])[0]
    total_size = HEADER_SIZE + length

    if len(stream) < total_size:
        return None, 0

    payload = stream[HEADER_SIZE:total_size]
    return msgpack.unpackb(payload, raw=False, strict_map_key=False), total_size
  1. Check for header: If the buffer has less than 4 bytes, we can’t even read the length. Return (None, 0) to indicate “need more data”.
  2. Parse length: Unpack the first 4 bytes as a big-endian unsigned int to get the payload length.
  3. Check for complete frame: If the buffer has less than 4 + length bytes, the frame is incomplete. Return (None, 0).
  4. Extract payload: Slice out the payload bytes: stream[4:4+length].
  5. Decode and return: Unpack the msgpack payload and return (message, total_size). The caller should remove total_size bytes from the buffer.

Usage in Read Loop

From channel.py:203-225:
channel.py
async def _read_loop(self) -> None:
    """Continuously read from the socket and parse messages."""
    from repod.constants import READ_BUFFER_SIZE
    from repod.protocol import read_message

    try:
        while not self._closed:
            data = await self._reader.read(READ_BUFFER_SIZE)
            if not data:
                break

            self._buffer += data
            while True:
                message, consumed = read_message(self._buffer)
                if message is None:
                    break
                self._buffer = self._buffer[consumed:]
                if isinstance(message, dict) and "action" in message:
                    self._receive_queue.put_nowait(message)
    except Exception:
        pass
    finally:
        await self._handle_close()
  1. Read chunk: Read up to READ_BUFFER_SIZE bytes from the socket (4096 by default).
  2. Append to buffer: Accumulate data in self._buffer.
  3. Parse loop: Repeatedly call read_message(self._buffer) until it returns None (incomplete frame).
  4. Remove parsed data: After parsing a complete message, remove the consumed bytes from the buffer.
  5. Enqueue message: Put the parsed dictionary into the receive queue for dispatch.
This pattern ensures that:
  • Partial frames are buffered until complete
  • Multiple frames in one chunk are all processed
  • No data is lost or duplicated
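These guarantees can be checked with a self-contained simulation of the buffering pattern. The sketch below keeps payloads as raw bytes instead of msgpack dicts and feeds two frames through arbitrary chunk boundaries:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Prefix a payload with its 4-byte big-endian length
    return struct.pack(">I", len(payload)) + payload

# Two frames, delivered in three arbitrary chunks (as TCP is free to do)
wire = frame(b"one") + frame(b"two")
chunks = [wire[:5], wire[5:9], wire[9:]]

buffer = b""
messages = []
for chunk in chunks:
    buffer += chunk                       # append to buffer
    while len(buffer) >= 4:               # parse loop
        length = struct.unpack(">I", buffer[:4])[0]
        if len(buffer) < 4 + length:
            break                         # partial frame: wait for more data
        messages.append(buffer[4:4 + length])
        buffer = buffer[4 + length:]      # remove parsed data

assert messages == [b"one", b"two"]       # both frames recovered, in order
assert buffer == b""                      # nothing lost or duplicated
```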

Supported Data Types

Msgpack supports these Python types:
Python Type   Msgpack Type   Notes
None          nil
bool          bool
int           int            Up to 64-bit
float         float          64-bit (double)
str           str            UTF-8 encoded
bytes         bin            Binary data
list          array          Heterogeneous
tuple         array          Decoded as list
dict          map            Keys can be any type
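Two caveats from the table are worth round-tripping (assuming the msgpack package is installed): tuples decode as lists, while bytes survive exactly:

```python
import msgpack

# Tuples survive encoding but come back as lists; bytes round-trip exactly
packed = msgpack.packb({"pos": (10, 20), "raw": b"\x00\x01"}, use_bin_type=True)
decoded = msgpack.unpackb(packed, raw=False)
assert decoded == {"pos": [10, 20], "raw": b"\x00\x01"}
```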
Unsupported Types

Msgpack cannot serialize:
  • Functions
  • Classes
  • Modules
  • File handles
  • Sockets
  • Other non-data objects
Attempting to serialize these will raise TypeError.
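For example, trying to pack a function fails (a sketch; the exact error message depends on the msgpack version):

```python
import msgpack

# Functions are code, not data: packb() rejects them with TypeError
error = None
try:
    msgpack.packb({"callback": lambda x: x})
except TypeError as exc:
    error = exc

assert error is not None
print(f"serialization failed as expected: {error}")
```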

Example: Complex Nested Data

complex_example.py
import msgpack

# All of these types work:
data = {
    "action": "game_state",
    "level": 5,
    "score": 12345.67,
    "active": True,
    "player": {
        "name": "Alice",
        "position": [100, 200],
        "inventory": ["sword", "shield", "potion"],
        "stats": {
            "health": 85,
            "mana": 42,
        }
    },
    "binary_data": b"\x00\x01\x02\x03",
    "nullable": None,
}

packed = msgpack.packb(data, use_bin_type=True)
print(f"Serialized size: {len(packed)} bytes")

unpacked = msgpack.unpackb(packed, raw=False)
assert unpacked == data  # Perfect round-trip

Performance Characteristics

Encoding Speed

Msgpack encoding is fast:
  • Small messages (< 100 bytes): ~500 ns per message
  • Medium messages (100-1000 bytes): ~2-5 μs per message
  • Large messages (> 1000 bytes): ~5-20 μs per message
For a game server handling 1000 messages/second, encoding overhead is < 1% of CPU time.

Decoding Speed

Decoding is slightly slower than encoding:
  • Small messages: ~800 ns per message
  • Medium messages: ~3-8 μs per message
  • Large messages: ~8-30 μs per message
Still negligible for typical game workloads.

Wire Size

Msgpack is compact compared to JSON:
size_comparison.py
import json
import msgpack

data = {
    "action": "player_state",
    "id": 12345,
    "position": {"x": 123.456, "y": 789.012},
    "velocity": {"x": 5.5, "y": -3.2},
    "health": 85,
    "mana": 42,
    "inventory": ["sword", "shield", "potion", "key"],
}

json_size = len(json.dumps(data))
msgpack_size = len(msgpack.packb(data))

print(f"JSON: {json_size} bytes")
print(f"Msgpack: {msgpack_size} bytes")
print(f"Savings: {100 * (1 - msgpack_size / json_size):.1f}%")
Output:
JSON: 193 bytes
Msgpack: 140 bytes
Savings: 27.5%
For games with high message volume, this bandwidth savings adds up.

Best Practices

Aim for messages < 1 KB. Large messages increase latency and memory usage.
# Good: small, focused messages
{"action": "player_move", "x": 100, "y": 200}

# Bad: huge world state dump
{
    "action": "world_state",
    "entities": [... 10,000 entities ...],
    "terrain": [... 1 MB of terrain data ...]
}
Instead, send incremental updates or use compression for large payloads.
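For payloads that genuinely must be large, one option is to compress the msgpack bytes before framing them. This is a hypothetical approach, not a documented repod feature, and the receiver must know to decompress:

```python
import zlib

import msgpack  # assumes the msgpack package is installed

# A large, repetitive payload: 1,000 entity dicts with identical keys
entities = [{"id": i, "x": i * 1.5, "y": i * 2.5} for i in range(1000)]
packed = msgpack.packb({"action": "world_state", "entities": entities})
compressed = zlib.compress(packed)

print(f"{len(packed)} bytes -> {len(compressed)} bytes compressed")
assert len(compressed) < len(packed)  # repeated keys compress well
```

The repeated per-entity keys make this kind of payload highly compressible; for small, frequent messages the compression overhead usually outweighs the savings.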
If you have binary data (images, audio, etc.), use bytes:
# Good: binary type
{"action": "upload", "data": b"\x00\x01\x02\x03"}

# Bad: base64-encoded string (33% overhead)
{"action": "upload", "data": "AAECAw=="}
Msgpack’s bin type is more efficient than base64.
Don’t send the same data repeatedly. Use IDs to reference entities:
# Good: reference by ID
{"action": "attack", "attacker": 42, "target": 7}

# Bad: full object every time
{
    "action": "attack",
    "attacker": {"id": 42, "name": "Alice", "level": 10, ...},
    "target": {"id": 7, "name": "Bob", "level": 8, ...}
}
For high-frequency updates, batch multiple messages:
# Instead of sending 100 separate messages:
for i in range(100):
    client.send({"action": "update", "value": i})

# Send one batch:
client.send({
    "action": "batch",
    "messages": [
        {"action": "update", "value": i}
        for i in range(100)
    ]
})
This reduces framing overhead and system call overhead.

Debugging Wire Format

To inspect the raw bytes on the wire, use Wireshark or a packet capture tool.

Example Packet Capture

debug.py
from repod.protocol import encode
import binascii

data = {"action": "ping", "seq": 42}
frame = encode(data)

print(f"Frame length: {len(frame)} bytes")
print(f"Hex dump:\n{binascii.hexlify(frame, ' ').decode()}")

# Parse header
length = int.from_bytes(frame[:4], "big")
print(f"\nLength header: {length} bytes")
print(f"Payload: {binascii.hexlify(frame[4:], ' ').decode()}")
Output:
Frame length: 22 bytes
Hex dump:
00 00 00 12 82 a6 61 63 74 69 6f 6e a4 70 69 6e 67 a3 73 65 71 2a

Length header: 18 bytes
Payload: 82 a6 61 63 74 69 6f 6e a4 70 69 6e 67 a3 73 65 71 2a
Breakdown:
  • 00 00 00 12 = length header (18 bytes)
  • 82 = msgpack map with 2 entries
  • a6 = msgpack string of length 6
  • 61 63 74 69 6f 6e = “action”
  • a4 = msgpack string of length 4
  • 70 69 6e 67 = “ping”
  • a3 = msgpack string of length 3
  • 73 65 71 = “seq”
  • 2a = msgpack positive fixint 42

Next Steps

Protocol API

Full API reference for encode(), decode(), and read_message()

Actions & Dispatch

Learn how messages are routed after deserialization

Performance Tips

Optimize your message serialization

Examples

See serialization in action
