Format Structure
Cowrie Gen2 uses a header-dictionary-value structure.
Magic Bytes
Cowrie Gen2 files start with the ASCII characters "SJ" (Structured JSON), that is, the bytes 0x53 0x4A.
Header Layout
Byte-by-Byte Breakdown
| Offset | Field | Type | Description |
|---|---|---|---|
| 0-1 | Magic | 2 bytes | 0x53 0x4A (“SJ”) |
| 2 | Version | u8 | Format version (0x02) |
| 3 | Flags | u8 | Bitfield (see below) |
Header Flags
Flags are bit-packed in byte 3:
| Bit(s) | Name | Value | Description |
|---|---|---|---|
| 0 | Compressed | 0x01 | Payload is compressed |
| 1-2 | Compression Type | 0x02/0x04 | 0=none, 1=gzip, 2=zstd |
| 3 | Has Column Hints | 0x08 | Column hints present after flags |
| 4-7 | Reserved | - | Must be zero |
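The flag table above can be unpacked with a few masks. The following is a minimal sketch, assuming the two-bit compression type is read by masking bits 1-2 and shifting right by one (which matches the 0x02/0x04 values listed). The names `parse_flags` and `HeaderFlags` are hypothetical and not part of any Cowrie API.

```python
from dataclasses import dataclass

FLAG_COMPRESSED = 0x01        # bit 0
COMPRESSION_TYPE_MASK = 0x06  # bits 1-2
FLAG_HAS_COLUMN_HINTS = 0x08  # bit 3
RESERVED_MASK = 0xF0          # bits 4-7, must be zero

COMPRESSION_NAMES = {0: "none", 1: "gzip", 2: "zstd"}


@dataclass
class HeaderFlags:  # hypothetical container for the decoded bits
    compressed: bool
    compression: str
    has_column_hints: bool


def parse_flags(flags: int) -> HeaderFlags:
    """Decode the flags byte (header offset 3) into named fields."""
    if flags & RESERVED_MASK:
        raise ValueError("reserved flag bits must be zero")
    ctype = (flags & COMPRESSION_TYPE_MASK) >> 1
    if ctype not in COMPRESSION_NAMES:
        # Type 3 is not defined in the table above.
        raise ValueError("ERR_UNSUPPORTED_COMPRESSION")
    return HeaderFlags(
        compressed=bool(flags & FLAG_COMPRESSED),
        compression=COMPRESSION_NAMES[ctype],
        has_column_hints=bool(flags & FLAG_HAS_COLUMN_HINTS),
    )
```

For instance, `parse_flags(0x03)` reports a gzip-compressed payload without column hints.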
Varint Encoding
Cowrie uses Protocol Buffers-style unsigned varint encoding:
- 7 bits per byte for data
- MSB (bit 7) as continuation flag:
  - 1 = more bytes follow
  - 0 = last byte
Encoding Algorithm
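A minimal sketch of the encoder in Python, assuming the usual Protocol Buffers convention that the least-significant 7-bit group is written first. The function name `encode_varint` is illustrative only.

```python
def encode_varint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as a varint."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    out = bytearray()
    while value >= 0x80:
        # Emit the low 7 bits with the continuation bit (MSB) set.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    # Final byte: MSB clear marks the end of the varint.
    out.append(value)
    return bytes(out)
```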
Decoding Algorithm
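A matching decoder sketch, again assuming least-significant group first. It returns the value together with the position of the next unread byte. The error strings reuse the canonical error code names from this page, but raising `ValueError` is an illustrative choice rather than a documented API.

```python
def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint from buf starting at pos.

    Returns (value, next_pos).
    """
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("ERR_TRUNCATED")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break  # MSB clear: this was the last byte
        shift += 7
        if shift >= 70:
            # More than 10 bytes cannot be a 64-bit value.
            raise ValueError("ERR_INVALID_VARINT")
    if result >= 1 << 64:
        raise ValueError("ERR_INVALID_VARINT")
    return result, pos
```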
Encoding Examples
| Value | Bytes | Explanation |
|---|---|---|
| 0 | 00 | Single byte, no continuation |
| 127 | 7F | Max single-byte value |
| 128 | 80 01 | 0b10000000 → 7-bit groups 0000001 0000000, low group emitted first |
| 300 | AC 02 | 0b100101100 → 7-bit groups 0000010 0101100, low group emitted first |
| 16,384 | 80 80 01 | 3 bytes |
Zigzag Encoding
Signed integers use zigzag encoding before varint encoding:
Zigzag Mapping
| Signed | Zigzag | Explanation |
|---|---|---|
| 0 | 0 | Unchanged |
| -1 | 1 | Maps to 1 |
| 1 | 2 | Maps to 2 |
| -2 | 3 | Maps to 3 |
| 2 | 4 | Maps to 4 |
| -64 | 127 | Fits in 1 byte |
Zigzag encoding ensures that small negative numbers (such as -1 or -2) encode as one-byte varints; without it, their two's-complement 64-bit form would need the maximum of 10 varint bytes.
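The mapping in the table corresponds to the standard 64-bit zigzag transform. A short sketch follows, assuming that transform is what Cowrie uses (the table is consistent with it). The function names are illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value must fit in a signed 64-bit integer")
    # n >> 63 is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)
```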
Type Tags
Every value is prefixed with a type tag (1 byte):
Core Types (0x00-0x0F)
| Tag | Type | Encoding |
|---|---|---|
| 0x00 | Null | Tag only |
| 0x01 | False | Tag only |
| 0x02 | True | Tag only |
| 0x03 | Int64 | Tag + zigzag varint |
| 0x04 | Float64 | Tag + 8 bytes LE |
| 0x05 | String | Tag + len:varint + UTF-8 |
| 0x06 | Array | Tag + count:varint + elements |
| 0x07 | Object | Tag + count:varint + (dictIdx:varint + value)* |
| 0x08 | Bytes | Tag + len:varint + raw bytes |
| 0x09 | Uint64 | Tag + varint |
| 0x0A | Decimal128 | Tag + scale:i8 + coef:16 bytes |
| 0x0B | Datetime64 | Tag + nanos:i64 LE |
| 0x0C | UUID128 | Tag + 16 bytes |
| 0x0D | BigInt | Tag + len:varint + two’s complement bytes |
| 0x0E | Extension | Tag + extType:varint + len:varint + payload |
ML Types (0x20-0x2F)
| Tag | Type | Encoding |
|---|---|---|
| 0x20 | Tensor | dtype:u8 + rank:u8 + dims* + dataLen:varint + data |
| 0x21 | TensorRef | storeId:u8 + keyLen:varint + key |
| 0x22 | Image | format:u8 + width:u16 + height:u16 + dataLen:varint + data |
| 0x23 | Audio | encoding:u8 + sampleRate:u32 + channels:u8 + dataLen:varint + data |
Graph Types (0x30-0x39)
| Tag | Type | Encoding |
|---|---|---|
| 0x30 | AdjList | idWidth:u8 + nodeCount + edgeCount + rowOffsets + colIndices |
| 0x31 | RichText | text + flags:u8 + tokens + spans |
| 0x32 | Delta | baseId:varint + opCount:varint + ops |
| 0x35 | Node | id:string + labels* + props (dict-coded) |
| 0x36 | Edge | srcId + dstId + type + props (dict-coded) |
| 0x37 | NodeBatch | count:varint + Node[count] |
| 0x38 | EdgeBatch | count:varint + Edge[count] |
| 0x39 | GraphShard | nodes + edges + metadata (dict-coded) |
Encoding Examples
Null
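Per the core type table, null is a tag with no payload, so the whole value is the single byte 0x00. A tiny sketch (the name `encode_null` is hypothetical):

```python
TAG_NULL = 0x00


def encode_null() -> bytes:
    """Null carries no payload: the tag byte is the entire value."""
    return bytes([TAG_NULL])
```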
Boolean
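Booleans are also tag-only: the value is carried by the choice of tag (0x01 for false, 0x02 for true). A sketch with a hypothetical `encode_bool` helper:

```python
TAG_FALSE = 0x01
TAG_TRUE = 0x02


def encode_bool(value: bool) -> bytes:
    """Booleans have no payload; the tag itself distinguishes true from false."""
    return bytes([TAG_TRUE if value else TAG_FALSE])
```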
Int64
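An Int64 is the tag 0x03 followed by the zigzag-encoded value written as a varint. The sketch below inlines both steps so that it stands alone; it assumes the standard 64-bit zigzag transform, and `encode_int64` is an illustrative name.

```python
TAG_INT64 = 0x03


def encode_int64(n: int) -> bytes:
    """Encode a signed 64-bit integer as tag + zigzag varint."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value must fit in a signed 64-bit integer")
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF  # zigzag
    out = bytearray([TAG_INT64])
    while z >= 0x80:  # varint, low 7-bit group first
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)
```

With this sketch, -1 becomes the two bytes `03 01`, and 150 (zigzag value 300) becomes `03 AC 02`.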
Float64
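A Float64 is the tag 0x04 followed by 8 little-endian bytes. The sketch assumes those bytes are an IEEE 754 binary64 value, which the table implies but does not state outright; `encode_float64` is a hypothetical name.

```python
import struct

TAG_FLOAT64 = 0x04


def encode_float64(x: float) -> bytes:
    """Encode a float as tag + 8 bytes, little-endian IEEE 754 binary64."""
    return bytes([TAG_FLOAT64]) + struct.pack("<d", x)
```

Under that assumption, 1.0 encodes as `04 00 00 00 00 00 00 F0 3F`.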
String
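A String is the tag 0x05, a varint length, and the UTF-8 bytes. The sketch assumes the length counts encoded bytes rather than code points, which is the usual reading of a length-prefixed UTF-8 field; the helper names are illustrative.

```python
TAG_STRING = 0x05


def encode_varint(value: int) -> bytes:
    """Unsigned varint, low 7-bit group first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as tag + byte length (varint) + UTF-8 bytes."""
    data = s.encode("utf-8")
    return bytes([TAG_STRING]) + encode_varint(len(data)) + data
```

For example, "hi" becomes `05 02 68 69`; a non-ASCII character such as "é" occupies two bytes, so its length prefix is 2 as well.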
Array
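An Array is the tag 0x06, a varint element count, and then each element as a complete tagged value. The sketch below takes already-encoded elements so that it stays independent of any particular element type; the small `encode_int64` helper is included only to build the example, and all names are hypothetical.

```python
TAG_ARRAY = 0x06
TAG_INT64 = 0x03


def encode_varint(value: int) -> bytes:
    """Unsigned varint, low 7-bit group first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_int64(n: int) -> bytes:
    """Tag + zigzag varint, used here only to build example elements."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    return bytes([TAG_INT64]) + encode_varint(z)


def encode_array(encoded_elements: list[bytes]) -> bytes:
    """Encode an array as tag + count (varint) + concatenated elements."""
    out = bytearray([TAG_ARRAY])
    out += encode_varint(len(encoded_elements))
    for element in encoded_elements:
        out += element  # each element already carries its own type tag
    return bytes(out)
```

Encoding the illustrative array `[1, 2, 3]` this way gives `06 03 03 02 03 04 03 06`: the array tag, a count of 3, then three Int64 values with zigzag payloads 2, 4, and 6.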
Object (Dictionary-Coded)
Given dictionary ["name", "age"]:
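With that dictionary, an object stores each key as a varint index into the dictionary instead of repeating the key string. The sketch below encodes a made-up object `{"name": "Alice", "age": 30}`; the field values, the helper names, and the restriction to string and integer values are all assumptions for illustration. How the dictionary itself is serialized is not covered here.

```python
TAG_INT64, TAG_STRING, TAG_OBJECT = 0x03, 0x05, 0x07


def encode_varint(value: int) -> bytes:
    """Unsigned varint, low 7-bit group first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_value(value) -> bytes:
    """Encode a str or int value (the only types this sketch handles)."""
    if isinstance(value, str):
        data = value.encode("utf-8")
        return bytes([TAG_STRING]) + encode_varint(len(data)) + data
    if isinstance(value, int) and not isinstance(value, bool):
        z = ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF
        return bytes([TAG_INT64]) + encode_varint(z)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_object(fields: dict, dictionary: list[str]) -> bytes:
    """Encode an object as tag + count + (dictIdx:varint + value) per field."""
    out = bytearray([TAG_OBJECT])
    out += encode_varint(len(fields))
    for key, value in fields.items():
        out += encode_varint(dictionary.index(key))  # key -> dictionary index
        out += encode_value(value)
    return bytes(out)
```

The example object encodes to 13 bytes: `07 02`, then `00 05 05 41 6C 69 63 65` (index 0, the string "Alice"), then `01 03 3C` (index 1, the integer 30).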
Compression Framing
When the Compressed flag (0x01) is set:
Compression Algorithm
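The decompression step can be sketched as follows, under several assumptions: the exact frame layout is not reproduced in this section, so the original length (the OrigLen referenced by ERR_DECOMPRESSED_MISMATCH) is passed in as a parameter; "gzip" is taken to mean the standard gzip container; and zstd is left unimplemented because it needs a third-party library. The function name is hypothetical.

```python
import gzip

COMPRESSION_NONE = 0
COMPRESSION_GZIP = 1
COMPRESSION_ZSTD = 2


def decompress_payload(ctype: int, payload: bytes, orig_len: int) -> bytes:
    """Decompress a payload and verify its size against the declared length."""
    if ctype == COMPRESSION_NONE:
        data = payload
    elif ctype == COMPRESSION_GZIP:
        data = gzip.decompress(payload)
    elif ctype == COMPRESSION_ZSTD:
        # Assumption: a real decoder would call a zstd binding here.
        raise NotImplementedError("zstd requires a third-party library")
    else:
        raise ValueError("ERR_UNSUPPORTED_COMPRESSION")
    if len(data) != orig_len:
        raise ValueError("ERR_DECOMPRESSED_MISMATCH")
    return data
```

A production decoder would also want to cap the output while decompressing, rather than only checking the size afterwards, so that a small malicious payload cannot expand without bound.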
Security Limits
Decoders must enforce limits to prevent memory exhaustion:
| Limit | Default | Purpose |
|---|---|---|
| MaxDepth | 1,000 | Prevent stack overflow |
| MaxArrayLen | 100,000,000 | Limit array size |
| MaxObjectLen | 10,000,000 | Limit object fields |
| MaxStringLen | 500,000,000 | Limit string size (500 MB) |
| MaxBytesLen | 1,000,000,000 | Limit binary data (1 GB) |
| MaxDictLen | 10,000,000 | Limit dictionary entries |
| MaxExtLen | 100,000,000 | Limit extension payload |
| MaxRank | 32 | Limit tensor dimensions |
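One way to apply these limits is to check every declared length before allocating anything for it, and to track nesting depth while recursing. The sketch below mirrors the defaults from the table; the class and function names are hypothetical, and raising `ValueError` with the canonical error code as the message is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeLimits:
    """Defaults copied from the limits table; override per decoder as needed."""
    max_depth: int = 1_000
    max_array_len: int = 100_000_000
    max_object_len: int = 10_000_000
    max_string_len: int = 500_000_000
    max_bytes_len: int = 1_000_000_000
    max_dict_len: int = 10_000_000
    max_ext_len: int = 100_000_000
    max_rank: int = 32


def check_length(declared: int, limit: int) -> int:
    """Reject a declared length before any buffer is allocated for it."""
    if declared > limit:
        raise ValueError("ERR_TOO_LARGE")
    return declared


def check_depth(depth: int, limits: DecodeLimits) -> None:
    """Reject nesting deeper than the configured maximum."""
    if depth > limits.max_depth:
        raise ValueError("ERR_TOO_DEEP")
```

A decoder would call `check_length` right after reading a length varint, for example `check_length(n, limits.max_string_len)` before slicing out a string.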
Column Hints
Column hints are optional metadata for columnar readers; they appear after the header if FlagHasColumnHints (0x08) is set.
Column hints enable zero-copy reading for columnar formats like Apache Arrow. Decoders that don’t support column hints must skip this block.
Error Handling
Canonical Error Codes
| Code | Condition |
|---|---|
| ERR_INVALID_MAGIC | First 2 bytes ≠ “SJ” |
| ERR_INVALID_VERSION | Version byte ≠ 0x02 |
| ERR_TRUNCATED | Unexpected end of data |
| ERR_INVALID_TAG | Unknown type tag |
| ERR_INVALID_UTF8 | String contains invalid UTF-8 |
| ERR_INVALID_VARINT | Varint overflow (>64 bits) |
| ERR_TOO_DEEP | Nesting depth > MaxDepth |
| ERR_TOO_LARGE | Array/object/string exceeds limits |
| ERR_DICT_TOO_LARGE | Dictionary > MaxDictLen |
| ERR_UNSUPPORTED_COMPRESSION | Unknown compression type |
| ERR_DECOMPRESSED_MISMATCH | Decompressed size ≠ OrigLen |
Validation Example
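A minimal sketch of header validation using the error codes above. The `CowrieError` class and `validate_header` function are hypothetical names, and checking for truncation before checking the magic bytes is an ordering choice made for this example, not something the table prescribes.

```python
MAGIC = b"SJ"
VERSION = 0x02
HEADER_LEN = 4  # magic (2) + version (1) + flags (1)


class CowrieError(Exception):
    """Hypothetical exception carrying one of the canonical error codes."""

    def __init__(self, code: str, detail: str = ""):
        super().__init__(f"{code}: {detail}" if detail else code)
        self.code = code


def validate_header(data: bytes) -> int:
    """Check magic and version, then return the flags byte."""
    if len(data) < HEADER_LEN:
        raise CowrieError("ERR_TRUNCATED", f"need {HEADER_LEN} bytes, got {len(data)}")
    if data[:2] != MAGIC:
        raise CowrieError("ERR_INVALID_MAGIC")
    if data[2] != VERSION:
        raise CowrieError("ERR_INVALID_VERSION", f"got 0x{data[2]:02X}")
    return data[3]
```

Callers can branch on `err.code` to distinguish a corrupt file from an unsupported version.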
Summary
Cowrie’s wire format provides:
✅ Compact encoding with varint and zigzag for small numbers
✅ Dictionary coding for 70-80% size reduction on repeated keys
✅ Type safety with explicit type tags
✅ Compression with gzip/zstd support
✅ Security with configurable limits
✅ Extensibility via TagExt envelope
The format balances encoding efficiency with decoding speed, making it ideal for high-throughput ML/AI workloads.