Why Msgpack?
From the README at README.md:11: "repod uses asyncio and msgpack to asynchronously serialize network events and arbitrary data structures."
Msgpack was chosen over alternatives like rencode, pickle, and json for several reasons:
Binary efficiency
Msgpack is a binary format, so it’s more compact than JSON:
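As a rough sketch of the comparison (message fields are illustrative; assumes the msgpack package is installed):

```python
import json

import msgpack  # third-party: pip install msgpack

# A typical small network event (fields are illustrative)
message = {"action": "move", "x": 102.5, "y": -7.25, "seq": 42}

as_json = json.dumps(message).encode("utf-8")
as_msgpack = msgpack.packb(message, use_bin_type=True)

print(f"json: {len(as_json)} bytes, msgpack: {len(as_msgpack)} bytes")
```

The msgpack encoding skips field quoting and whitespace entirely, so the same structure comes out noticeably smaller.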
No Python-specific types
Unlike pickle, msgpack is language-agnostic. You can build clients in JavaScript, Rust, C++, etc. and communicate with a Python server.
Safer than pickle
pickle is a security risk: it can execute arbitrary code during deserialization. Msgpack only deserializes data, not code.
Faster than JSON
Msgpack is faster to encode/decode than JSON:
- No text parsing
- No base64 for binary data
- More compact representation
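A quick benchmark sketch (timings vary by machine and message shape; assumes the msgpack package):

```python
import json
import timeit

import msgpack  # third-party: pip install msgpack

# A mid-sized message (contents are illustrative)
message = {"action": "state", "entities": [{"id": i, "x": i * 0.5} for i in range(20)]}

encode_json = timeit.timeit(lambda: json.dumps(message).encode("utf-8"), number=10_000)
encode_msgpack = timeit.timeit(lambda: msgpack.packb(message, use_bin_type=True), number=10_000)

print(f"json encode:    {encode_json:.3f}s")
print(f"msgpack encode: {encode_msgpack:.3f}s")
```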
Better than rencode
From the README at README.md:156: "It uses rencode / custom delimiter-based framing (\0---\0), which is fragile with binary data."
The old PodSixNet library used rencode with delimiter framing. This approach has problems:
- Delimiter collision: what if your data contains \0---\0?
- Escaping overhead: need to escape delimiter bytes
- Not standard: custom format, hard to debug
Msgpack with length-prefix framing, by contrast, is:
- Standard (used by Kafka, Redis, Protocol Buffers, etc.)
- No collision risk
- No escaping needed
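The delimiter-collision problem is easy to demonstrate (a self-contained sketch; only the standard library's struct is needed for the framing side):

```python
import struct

# Binary payload that happens to contain the old delimiter sequence
payload = b"chunk-one\0---\0chunk-two"

# Delimiter framing corrupts it: splitting produces two bogus messages
parts = payload.split(b"\0---\0")
assert parts == [b"chunk-one", b"chunk-two"]

# Length-prefix framing passes the same bytes through untouched
frame = struct.pack(">I", len(payload)) + payload
(length,) = struct.unpack(">I", frame[:4])
assert frame[4:4 + length] == payload
```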
Msgpack Serialization
repod uses the msgpack Python library for encoding and decoding.
Encoding
From protocol.py:37-62:
1. Serialize with msgpack: msgpack.packb(data, use_bin_type=True) converts the dictionary to bytes. The use_bin_type=True flag ensures bytes are encoded as msgpack's bin type (not str).
2. Pack length header: struct.pack(HEADER_FORMAT, len(packed)) encodes the length as a 4-byte big-endian integer. HEADER_FORMAT = ">I" means: > = big-endian (network byte order), I = unsigned int (4 bytes).
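The two steps above can be sketched as follows (a minimal sketch assuming the msgpack library; the actual encode() in protocol.py may differ in details):

```python
import struct

import msgpack  # third-party: pip install msgpack

HEADER_FORMAT = ">I"  # 4-byte big-endian unsigned int

def encode(data):
    # Step 1: serialize; use_bin_type=True keeps bytes as msgpack's bin type
    packed = msgpack.packb(data, use_bin_type=True)
    # Step 2: prepend the 4-byte length header
    return struct.pack(HEADER_FORMAT, len(packed)) + packed

frame = encode({"action": "ping", "seq": 42})
```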
Decoding
From protocol.py:65-88:
The decode() function unpacks raw msgpack bytes. Parameters:
- raw=False: Decode msgpack str type as Python str (not bytes)
- strict_map_key=False: Allow non-string keys in dictionaries
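A matching sketch of decode() under the same assumptions (the real protocol.py implementation may differ):

```python
import msgpack  # third-party: pip install msgpack

def decode(payload):
    # raw=False: msgpack str -> Python str; strict_map_key=False: any key type
    return msgpack.unpackb(payload, raw=False, strict_map_key=False)

# Non-string map keys round-trip thanks to strict_map_key=False
packed = msgpack.packb({"action": "ping", 7: "non-string key"}, use_bin_type=True)
message = decode(packed)
```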
Length-Prefix Framing
To safely delimit messages on a TCP stream, repod uses length-prefix framing.
Wire Format
From protocol.py:6-15:
- 4-byte length header (big-endian unsigned int)
- N-byte msgpack payload (where N = value from header)
Why Length-Prefix?
O(1) Boundary Detection: With length-prefix framing, you know exactly where each message ends. No scanning required.
No Delimiter Collision: Delimiter-based framing (like \0---\0) has a problem: what if your data contains the delimiter? Length-prefix framing has no such issue; any binary data is safe.
Standard and Debuggable: Length-prefix framing is used by:
- Protocol Buffers
- Kafka
- Redis (RESP3)
- PostgreSQL wire protocol
- Many others
Stream-Based Decoding
The read_message() function implements buffering and frame extraction.
From protocol.py:91-124:
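A minimal sketch consistent with the checks annotated below (assuming the msgpack library; not the verbatim protocol.py code):

```python
import struct

import msgpack  # third-party: pip install msgpack

HEADER_SIZE = 4

def read_message(buffer):
    """Return (message, bytes_consumed); (None, 0) means "need more data"."""
    # Check for header: fewer than 4 bytes and we can't even read the length
    if len(buffer) < HEADER_SIZE:
        return None, 0
    (length,) = struct.unpack(">I", buffer[:HEADER_SIZE])
    # Check for complete frame
    if len(buffer) < HEADER_SIZE + length:
        return None, 0
    payload = buffer[HEADER_SIZE:HEADER_SIZE + length]
    return msgpack.unpackb(payload, raw=False, strict_map_key=False), HEADER_SIZE + length

payload = msgpack.packb({"seq": 1}, use_bin_type=True)
frame = struct.pack(">I", len(payload)) + payload

partial = read_message(frame[:3])       # incomplete header
complete = read_message(frame + b"x")   # full frame plus a byte of the next one
```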
1. Check for header: If the buffer has fewer than 4 bytes, we can't even read the length. Return (None, 0) to indicate "need more data".
2. Check for complete frame: If the buffer has fewer than 4 + length bytes, the frame is incomplete. Return (None, 0).
Usage in Read Loop
From channel.py:203-225:
This pattern ensures that:
- Partial frames are buffered until complete
- Multiple frames in one chunk are all processed
- No data is lost or duplicated
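The pattern can be sketched end to end (a self-contained illustration, not the actual channel.py code; chunk boundaries are contrived to show partial delivery):

```python
import struct

import msgpack  # third-party: pip install msgpack

def read_message(buffer):
    # Minimal frame extractor; (None, 0) signals "need more data"
    if len(buffer) < 4:
        return None, 0
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None, 0
    return msgpack.unpackb(buffer[4:4 + length], raw=False, strict_map_key=False), 4 + length

def make_frame(data):
    payload = msgpack.packb(data, use_bin_type=True)
    return struct.pack(">I", len(payload)) + payload

# Two frames arriving split across three TCP chunks
stream = make_frame({"seq": 1}) + make_frame({"seq": 2})
chunks = [stream[:3], stream[3:15], stream[15:]]

buffer = b""
messages = []
for chunk in chunks:
    buffer += chunk
    while True:  # drain every complete frame; partial frames stay buffered
        message, consumed = read_message(buffer)
        if message is None:
            break
        buffer = buffer[consumed:]
        messages.append(message)
```

Each chunk is appended to the buffer, and the inner loop extracts as many complete frames as the buffer holds, which is exactly what the three guarantees above require.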
Supported Data Types
Msgpack supports these Python types:
| Python Type | Msgpack Type | Notes |
|---|---|---|
| None | nil | |
| bool | bool | |
| int | int | Up to 64-bit |
| float | float | 64-bit (double) |
| str | str | UTF-8 encoded |
| bytes | bin | Binary data |
| list | array | Heterogeneous |
| tuple | array | Decoded as list |
| dict | map | Keys can be any type |
Example: Complex Nested Data
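A round-trip sketch (field names are illustrative, not taken from repod):

```python
import msgpack  # third-party: pip install msgpack

state = {
    "tick": 1024,
    "players": [
        {"id": 1, "name": "alice", "pos": (10.0, 20.0)},
        {"id": 2, "name": "bob", "pos": (-3.5, 7.25)},
    ],
    "snapshot": b"\x00\x01\x02\xff",  # raw binary survives the round trip
}

packed = msgpack.packb(state, use_bin_type=True)
decoded = msgpack.unpackb(packed, raw=False, strict_map_key=False)

# Per the type table above, tuples come back as lists
print(decoded["players"][0]["pos"])
```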
Performance Characteristics
Encoding Speed
Msgpack encoding is fast:
- Small messages (< 100 bytes): ~500 ns per message
- Medium messages (100-1000 bytes): ~2-5 μs per message
- Large messages (> 1000 bytes): ~5-20 μs per message
Decoding Speed
Decoding is slightly slower than encoding:
- Small messages: ~800 ns per message
- Medium messages: ~3-8 μs per message
- Large messages: ~8-30 μs per message
Wire Size
Msgpack is compact compared to JSON.
Best Practices
Keep messages small
Aim for messages < 1 KB. Large messages increase latency and memory usage. Instead, send incremental updates or use compression for large payloads.
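One way to sketch the compression option (using Python's zlib; payload contents are illustrative, and repod itself may handle this differently):

```python
import zlib

import msgpack  # third-party: pip install msgpack

# A large, repetitive payload (illustrative): compress before framing
payload = msgpack.packb({"map_data": "grass " * 2000}, use_bin_type=True)
compressed = zlib.compress(payload)

# The receiver decompresses, then unpacks as usual
restored = msgpack.unpackb(zlib.decompress(compressed), raw=False)
```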
Use bytes for binary data
If you have binary data (images, audio, etc.), use Msgpack's bytes: the bin type is more efficient than base64.
Avoid redundant data
Don’t send the same data repeatedly. Use IDs to reference entities:
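For example (entity fields are made up for illustration):

```python
import msgpack  # third-party: pip install msgpack

# Redundant: re-sending the full entity on every update
full_update = {
    "action": "update",
    "player": {"id": 7, "name": "alice", "class": "ranger", "hp": 93},
}

# Leaner: reference the entity by id and send only what changed
delta_update = {"action": "update", "id": 7, "hp": 93}

full_size = len(msgpack.packb(full_update, use_bin_type=True))
delta_size = len(msgpack.packb(delta_update, use_bin_type=True))
print(f"full: {full_size} bytes, delta: {delta_size} bytes")
```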
Consider message batching
For high-frequency updates, batch multiple messages into one frame. This reduces framing overhead and system call overhead.
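A sketch of the trade-off (the framing helper and update fields are illustrative):

```python
import struct

import msgpack  # third-party: pip install msgpack

def make_frame(data):
    payload = msgpack.packb(data, use_bin_type=True)
    return struct.pack(">I", len(payload)) + payload

updates = [{"id": i, "x": i * 0.5} for i in range(10)]

# One frame per update: ten headers, ten writes
individual = [make_frame(u) for u in updates]

# One batched frame: a single header and a single write
batched = make_frame({"action": "batch", "updates": updates})

print(f"individual: {sum(len(f) for f in individual)} bytes, batched: {len(batched)} bytes")
```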
Debugging Wire Format
To inspect the raw bytes on the wire, use Wireshark or a packet capture tool.
Example Packet Capture
- 00 00 00 12 = length header (payload is 18 bytes)
- 82 = msgpack map with 2 entries
- a6 = msgpack string of length 6
- 61 63 74 69 6f 6e = "action"
- a4 = msgpack string of length 4
- 70 69 6e 67 = "ping"
- a3 = msgpack string of length 3
- 73 65 71 = "seq"
- 2a = msgpack positive fixint 42
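The bytes above can be reproduced with a short script (a sketch, assuming the msgpack library):

```python
import struct

import msgpack  # third-party: pip install msgpack

payload = msgpack.packb({"action": "ping", "seq": 42}, use_bin_type=True)
frame = struct.pack(">I", len(payload)) + payload

print(frame.hex(" "))
# 00 00 00 12 82 a6 61 63 74 69 6f 6e a4 70 69 6e 67 a3 73 65 71 2a
```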
Next Steps
Protocol API
Full API reference for encode(), decode(), and read_message()
Actions & Dispatch
Learn how messages are routed after deserialization
Performance Tips
Optimize your message serialization
Examples
See serialization in action