Cowrie decoders enforce security limits to prevent denial-of-service attacks, memory exhaustion, and CPU spin attacks. These limits provide defense-in-depth protection beyond basic sanity checks.
Overview
Security limits are enforced at decode time and can be customized via DecodeOptions. Default limits are designed to support large ML workloads while preventing extreme allocations.
Two-Layer Protection
- Sanity Checks (always enforced): Length cannot exceed remaining data
- Security Limits (configurable): Absolute maximums even for well-formed data
// Example: Decoding a string
length := readVarint() // Attacker claims 1GB
// Layer 1: Sanity check
if length > remaining_bytes {
return ErrMalformedLength // Fail fast
}
// Layer 2: Security limit
if length > MaxStringLen {
return ErrStringTooLarge // Prevent legitimate but huge allocation
}
data := read(length) // Safe to allocate
Default Limits
const (
DefaultMaxDepth = 1000 // Maximum nesting depth
DefaultMaxArrayLen = 100_000_000 // 100M elements
DefaultMaxObjectLen = 10_000_000 // 10M fields
DefaultMaxStringLen = 500_000_000 // 500MB strings
DefaultMaxBytesLen = 1_000_000_000 // 1GB bytes (tensors, images, audio)
DefaultMaxExtLen = 100_000_000 // 100MB max extension payload
DefaultMaxDictLen = 10_000_000 // 10M dictionary entries
DefaultMaxHintCount = 10_000 // 10K column hints
DefaultMaxRank = 32 // Maximum tensor rank
)
These defaults support real ML workloads:
- 768-dim embeddings: ~3KB per embedding → ~325K embeddings fit in a single MaxBytesLen (1GB) buffer
- Large language model responses: Multi-paragraph text fits in MaxStringLen
- Graph databases: Millions of nodes/edges fit in MaxArrayLen
DecodeOptions
Configure limits for your use case:
import "github.com/Neumenon/cowrie"
// Use defaults
val, err := cowrie.Decode(data)
// Custom limits
opts := cowrie.DecodeOptions{
MaxDepth: 500, // Limit nesting (JSON bomb protection)
MaxArrayLen: 1_000_000, // Limit array size
MaxObjectLen: 100_000, // Limit object fields
MaxStringLen: 10_000_000, // 10MB strings
MaxBytesLen: 100_000_000, // 100MB binary data
MaxExtLen: 50_000_000, // 50MB extensions
MaxDictLen: 1_000_000, // 1M dictionary keys
MaxHintCount: 1_000, // 1K column hints
MaxRank: 16, // 16D tensors max
}
val, err := cowrie.DecodeWithOptions(data, opts)
Zero Values Use Defaults
opts := cowrie.DecodeOptions{
MaxDepth: 100, // Override
// MaxArrayLen: 0 → Uses DefaultMaxArrayLen (100M)
}
Unlimited (Not Recommended)
opts := cowrie.DecodeOptions{
MaxDepth: -1, // Unlimited (DANGEROUS!)
}
Only use unlimited for trusted input (e.g., internal files).
Limit Descriptions
MaxDepth
Protects against: Nested structure attacks (stack overflow, CPU spin)
// Attack: 1000 levels deep
{"a": {"a": {"a": {"a": ...}}}}
Default: 1000 levels (enough for legitimate data)
Typical values:
- APIs: 50-100 (shallow documents)
- Databases: 500-1000 (complex objects)
- File processing: 1000+ (deeply nested config)
MaxArrayLen
Protects against: Memory exhaustion via huge arrays
// Attack: Claim 1B elements (8GB+ allocation)
Tag(0x06) | count:varint(1000000000) | ...
Default: 100M elements
Typical values:
- APIs: 1M-10M (paginated responses)
- ML workloads: 100M+ (large embedding batches)
- Graphs: 100M+ (large node/edge batches)
Memory impact:
- 100M int64: ~800MB
- 100M float32: ~400MB
- 100M strings: Variable (depends on content)
MaxObjectLen
Protects against: Memory exhaustion via huge objects
// Attack: 10M fields (massive dictionary + object overhead)
Tag(0x07) | count:varint(10000000) | ...
Default: 10M fields
Typical values:
- APIs: 1K-10K fields (reasonable documents)
- Databases: 100K-1M fields (wide tables)
- Analytics: 10M+ fields (event aggregations)
Memory impact:
- 10M fields × 32 bytes/field ≈ 320MB overhead
- Plus dictionary keys (encoded once)
- Plus field values (varies)
MaxStringLen
Protects against: Memory exhaustion via huge strings
// Attack: 1GB string
Tag(0x05) | len:varint(1000000000) | ...
Default: 500MB
Typical values:
- APIs: 1MB-10MB (documents, logs)
- LLM responses: 100MB-500MB (long-form generation)
- Files: 500MB+ (processing large text)
Why 500MB? A GPT-4-scale context (~200K tokens × ~4 bytes/token ≈ 800KB of UTF-8) fits comfortably, and long-form responses can still run to multiple megabytes.
MaxBytesLen
Protects against: Memory exhaustion via binary data (tensors, images, audio)
// Attack: 10GB tensor
Tag(0x20) | ... | dataLen:varint(10000000000) | ...
Default: 1GB
Typical values:
- APIs: 10MB-100MB (small images, embeddings)
- ML workloads: 1GB+ (large tensors, batches)
- Media: 100MB-1GB+ (high-res images, audio)
Examples:
- 768-dim float32 embedding: 3KB
- 10K embeddings: 30MB
- 1M embeddings: 3GB (exceeds default!)
- 1920×1080 JPEG: ~1MB
- 4K raw RGB: 24MB
MaxExtLen
Protects against: Unknown extension payload attacks
// Attack: 1GB unknown extension
Tag(0x0E) | extType:varint | len:varint(1000000000) | ...
Default: 100MB
Typical values:
- Standard: 10MB-100MB (forward compatibility)
- Strict: 1MB (reject large unknown data)
MaxDictLen
Protects against: Dictionary explosion (CPU spin, memory)
// Attack: 10M dictionary keys
DictLen:varint(10000000) | (len:varint | bytes)* | ...
Default: 10M entries (same as MaxObjectLen)
Typical values:
- APIs: 1K-10K keys (typical schemas)
- Large objects: 1M-10M keys (wide tables, many graphs)
Memory impact:
- 10M keys × 20 bytes avg ≈ 200MB dictionary
- Plus hash map overhead
MaxHintCount
Protects against: Column hints CPU spin attack
// Attack: 1M column hints (causes long parsing time)
HintCount:varint(1000000) | (field + type + shape + flags)* | ...
Default: 10K hints
Typical values:
- Standard: 100-1000 columns (wide tables)
- Large: 10K+ columns (ultra-wide analytics)
MaxRank
Protects against: Tensor dimension explosion
// Attack: 255 dimensions (causes huge offset calculations)
Tag(0x20) | dtype | rank:u8(255) | dims:varint*255 | ...
Default: 32 dimensions
Typical values:
- Standard ML: 4-8 dimensions (batches, channels, height, width, etc.)
- Advanced: 16-32 dimensions (attention heads, multiple batches)
Why 32? Enough for complex architectures:
- 4D: [batch, channels, height, width]
- 6D: [batch, time, layers, heads, seq, hidden]
- 32D: Extreme multi-dimensional tensors
Wire limit: u8 max = 255 dimensions (but decoder rejects > MaxRank)
Attack Scenarios
1. Nested Object Bomb
Attack: Deeply nested objects to exhaust stack or spin CPU
{"a":{"a":{"a":{"a": ... 10000 levels}}}}
Protection: MaxDepth limit
opts := cowrie.DecodeOptions{MaxDepth: 100}
_, err := cowrie.DecodeWithOptions(malicious, opts)
// err == cowrie.ErrDepthExceeded
2. Array Length Bomb
Attack: Claim huge array to allocate gigabytes
Tag(0x06) | count:varint(1000000000) | ...
Protection: MaxArrayLen + sanity check
// Decoder checks:
if count > MaxArrayLen {
return ErrArrayTooLarge // Security limit
}
if count > remaining_bytes {
return ErrMalformedLength // Sanity check
}
3. Dictionary Explosion
Attack: 10M dictionary keys to exhaust memory + CPU
DictLen:varint(10000000) | key1 | key2 | ... | key10M | ...
Protection: MaxDictLen + sanity check
if dictLen > MaxDictLen {
return ErrDictTooLarge
}
if dictLen > remaining_bytes {
return ErrMalformedLength
}
4. Decompression Bomb
Attack: 1KB compressed → 10GB decompressed
Flags:0x03 (compressed gzip) | OrigLen:varint(10000000000) | [1KB of compressed data]
Protection: MaxDecompressedSize limit
const MaxDecompressedSize = 256 * 1024 * 1024 // 256MB
limited := io.LimitReader(gzipReader, MaxDecompressedSize+1)
out, err := io.ReadAll(limited)
if err != nil {
return err
}
if len(out) > MaxDecompressedSize {
return cowrie.ErrDecompressedTooLarge
}
See Compression for details.
5. Tensor Rank Bomb
Attack: 255-dimensional tensor to cause overflow in size calculations
Tag(0x20) | dtype | rank:u8(255) | dims:[1,1,1,...,1] | dataLen:varint(1) | [1 byte]
Protection: MaxRank limit
if rank > MaxRank {
return ErrMalformedLength
}
6. Column Hints CPU Spin
Attack: 1M column hints to slow down header parsing
FlagHasColumnHints | HintCount:varint(1000000) | (field + type + shape + flags)*1M | ...
Protection: MaxHintCount limit
if hintCount > MaxHintCount {
return ErrTooManyHints
}
Error Handling
All limit violations return specific errors:
val, err := cowrie.DecodeWithOptions(data, opts)
switch err {
case cowrie.ErrDepthExceeded:
log.Println("Nested too deep")
case cowrie.ErrArrayTooLarge:
log.Println("Array too large")
case cowrie.ErrObjectTooLarge:
log.Println("Object too large")
case cowrie.ErrStringTooLarge:
log.Println("String too large")
case cowrie.ErrBytesTooLarge:
log.Println("Bytes/tensor too large")
case cowrie.ErrExtTooLarge:
log.Println("Extension too large")
case cowrie.ErrDictTooLarge:
log.Println("Dictionary too large")
case cowrie.ErrTooManyHints:
log.Println("Too many column hints")
case cowrie.ErrMalformedLength:
log.Println("Length exceeds remaining data (malicious)")
default:
log.Println("Other error:", err)
}
Recommended Configurations
Public API (Untrusted)
opts := cowrie.DecodeOptions{
MaxDepth: 100, // Shallow documents
MaxArrayLen: 1_000_000, // 1M elements max
MaxObjectLen: 10_000, // 10K fields max
MaxStringLen: 10_000_000, // 10MB strings
MaxBytesLen: 100_000_000, // 100MB binary
MaxExtLen: 10_000_000, // 10MB extensions
MaxDictLen: 10_000, // 10K keys
MaxHintCount: 100, // 100 column hints
MaxRank: 8, // 8D tensors max
OnUnknownExt: cowrie.UnknownExtError, // Reject unknown extensions
}
Profile: Conservative, protects against abuse, suitable for user-facing APIs.
Internal Service (Semi-Trusted)
opts := cowrie.DecodeOptions{
MaxDepth: 500,
MaxArrayLen: 10_000_000,
MaxObjectLen: 100_000,
MaxStringLen: 100_000_000,
MaxBytesLen: 500_000_000,
MaxExtLen: 50_000_000,
MaxDictLen: 100_000,
MaxHintCount: 1_000,
MaxRank: 16,
}
Profile: Moderate, allows larger payloads, suitable for service-to-service communication.
ML Workload (Trusted)
opts := cowrie.DefaultDecodeOptions() // Use generous defaults
// or
opts := cowrie.DecodeOptions{
MaxDepth: 1000,
MaxArrayLen: 100_000_000,
MaxObjectLen: 10_000_000,
MaxStringLen: 500_000_000,
MaxBytesLen: 2_000_000_000, // 2GB for large tensors
MaxExtLen: 100_000_000,
MaxDictLen: 10_000_000,
MaxHintCount: 10_000,
MaxRank: 32,
}
Profile: Permissive, supports large ML payloads, suitable for trusted data pipelines.
Strict Mode (Maximum Security)
opts := cowrie.DecodeOptions{
MaxDepth: 50,
MaxArrayLen: 10_000,
MaxObjectLen: 1_000,
MaxStringLen: 1_000_000, // 1MB
MaxBytesLen: 10_000_000, // 10MB
MaxExtLen: 1_000_000, // 1MB
MaxDictLen: 1_000,
MaxHintCount: 50,
MaxRank: 4,
OnUnknownExt: cowrie.UnknownExtError,
}
Profile: Paranoid, rejects anything unusual, suitable for high-security environments.
Performance
Limit checks have negligible overhead (less than 1% CPU) because they are single fail-fast comparisons:
// Fast: Single comparison
if count > MaxArrayLen {
return ErrArrayTooLarge
}
// No allocation until after limit check
items := make([]*Value, count) // Only if count <= MaxArrayLen
Benchmark (100KB payload):
- No limits: 1.2ms decode
- With limits: 1.21ms decode (~1% overhead)
Limits save time by rejecting malicious payloads early.
Monitoring
Track limit violations to detect attacks:
func DecodeWithMetrics(data []byte) (*cowrie.Value, error) {
val, err := cowrie.Decode(data)
switch err {
case cowrie.ErrArrayTooLarge,
cowrie.ErrObjectTooLarge,
cowrie.ErrStringTooLarge,
cowrie.ErrBytesTooLarge,
cowrie.ErrDictTooLarge:
metrics.Increment("cowrie.limit_exceeded", map[string]string{
"error": err.Error(),
})
log.Warn("Limit exceeded", "error", err, "size", len(data))
}
return val, err
}
Best Practices
- Use Defaults for ML: Default limits support real ML workloads
- Tighten for APIs: Reduce limits for user-facing endpoints
- Monitor Violations: Track ErrXxxTooLarge errors
- Reject Unknown Extensions: Set OnUnknownExt to Error for strict mode
- Combine with Rate Limiting: Limit violations may indicate attack
- Test Edge Cases: Verify your limits with real data
Unknown Extension Behavior
Control how the decoder handles unknown TagExt extensions:
type UnknownExtBehavior int
const (
UnknownExtKeep UnknownExtBehavior = iota // Preserve (default)
UnknownExtSkipAsNull // Skip, return null
UnknownExtError // Error (strict mode)
)
opts := cowrie.DecodeOptions{
OnUnknownExt: cowrie.UnknownExtError, // Reject unknown data
}
Use cases:
- Keep: Forward compatibility, round-trip preservation
- Skip: Ignore unknown extensions silently
- Error: Strict validation, reject unknown data