PPMd is Dmitry Shkarin’s implementation of Prediction by Partial Matching (PPM), a compression method based on context modeling and statistical prediction. It provides excellent compression ratios for text and other highly structured data.

Method ID

ID: 03 04 01 (hex)
The PPMd method is identified by the 3-byte sequence 03 04 01 in the 7z archive format.
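As a small illustration of how that ID might be checked, the sketch below compares a method ID read from a coder record against the three bytes given above. The function name and its inputs are hypothetical and are not part of the 7-Zip API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 7z method ID for PPMd, as stated above: 03 04 01 (hex). */
static const unsigned char kPpmdMethodId[3] = { 0x03, 0x04, 0x01 };

/* Returns 1 if the ID bytes match the PPMd method ID, 0 otherwise.
   `id` and `idSize` are hypothetical values produced by a header parser. */
static int is_ppmd_method(const unsigned char *id, size_t idSize)
{
  assert(id != NULL);
  return idSize == sizeof(kPpmdMethodId)
      && memcmp(id, kPpmdMethodId, sizeof(kPpmdMethodId)) == 0;
}
```

The length check comes first so that a shorter or longer ID never matches by prefix.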

Overview

7-Zip includes two PPMd implementations:

PPMd Variant H (PPMd7)

Original PPMd algorithm from 2001
  • Based on Dmitry Shkarin’s PPMd var.H
  • Used in 7z archives
  • Order: 2-64

PPMd Variant I (PPMd8)

Improved PPMd algorithm from 2002
  • Based on Dmitry Shkarin’s PPMd var.I
  • Better compression ratio
  • Order: 2-16

PPMd is particularly effective for:
  • Text files (source code, logs, documents)
  • Structured data (CSV, JSON, XML)
  • Database dumps
  • Configuration files
For these file types, PPMd often achieves 5-20% better compression than LZMA.

How PPMd Works

PPMd uses Prediction by Partial Matching:
  1. Context Modeling: Analyzes previous bytes to build statistical models
  2. Probability Prediction: Predicts the probability of each possible next byte
  3. Arithmetic Coding: Encodes bytes using their predicted probabilities
  4. Adaptive Learning: Updates models as more data is processed
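To make step 3 more concrete, here is a minimal sketch of how a range coder narrows its coding interval for one symbol. It is illustrative only: it leaves out renormalization and carry handling, and the type and function names are assumptions rather than 7-Zip identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Coding interval [low, low + range). Names are illustrative. */
typedef struct {
  uint64_t low;
  uint32_t range;
} Interval;

/* Narrow the interval to the slice owned by one symbol.
   cumFreq: summed frequency of the symbols ordered before this one
   freq:    frequency of this symbol
   total:   sum of all frequencies in the current context */
static void interval_encode(Interval *iv, uint32_t cumFreq, uint32_t freq, uint32_t total)
{
  assert(freq > 0 && cumFreq + freq <= total);
  iv->range /= total;                        /* width of one frequency unit */
  iv->low += (uint64_t)cumFreq * iv->range;  /* skip the preceding symbols */
  iv->range *= freq;                         /* keep only this symbol's slice */
}
```

A symbol with a high predicted frequency keeps a large slice of the interval and therefore costs few bits, which is why better predictions translate directly into smaller output.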
PPMd maintains multiple context models of different lengths (orders):
Order 0: Current byte independent of previous bytes
Order 1: Based on 1 previous byte
Order 2: Based on 2 previous bytes
...
Order N: Based on N previous bytes (max order)
Higher orders provide better compression for structured data but require more memory.
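The idea of a fixed-order context can be sketched with a toy order-1 model that counts which byte follows which. This is a simplified illustration under assumed names, not how PPMd stores its contexts (PPMd uses a compact tree inside the allocated memory block).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* counts[c][s] = how often byte s followed byte c (an order-1 context). */
typedef struct {
  unsigned counts[256][256];
  unsigned totals[256];
} Order1Model;

static void order1_init(Order1Model *m)
{
  memset(m, 0, sizeof(*m));
}

/* Adaptive learning: walk the data once and update the counts as we go. */
static void order1_train(Order1Model *m, const unsigned char *data, size_t size)
{
  size_t i;
  assert(m != NULL && (data != NULL || size == 0));
  for (i = 1; i < size; i++) {
    m->counts[data[i - 1]][data[i]]++;
    m->totals[data[i - 1]]++;
  }
}

/* Most frequent successor of `context`, or -1 if the context was never seen. */
static int order1_predict(const Order1Model *m, unsigned char context)
{
  int best = -1;
  unsigned bestCount = 0;
  int s;
  for (s = 0; s < 256; s++) {
    if (m->counts[context][s] > bestCount) {
      bestCount = m->counts[context][s];
      best = s;
    }
  }
  return best;
}
```

The table for order 1 already needs 256 x 256 counters. A naive table for order N would need 256^N rows, which is one way to see why higher orders cost more memory.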
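When a byte has not yet been seen in the current context, PPMd encodes an “escape” symbol and retries in the next lower-order context. Escape probabilities are estimated adaptively; Ppmd7.c defines a table used for that estimation:
// From Ppmd7.c:15
static const Byte PPMD7_kExpEscape[16] = 
  { 25, 14, 9, 7, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2 };
This allows the algorithm to fall back to lower-order contexts when needed.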
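The fallback can be sketched with a two-level toy model. Note the swap: this sketch uses the classic PPM "method C" escape estimate (escape weight equal to the number of distinct symbols seen), not PPMd's adaptive escape estimation, and it omits symbol exclusion. All names are hypothetical.

```c
#include <assert.h>

/* Statistics for one context: per-symbol counts, their sum,
   and the number of distinct symbols seen so far. */
typedef struct {
  unsigned counts[256];
  unsigned total;
  unsigned distinct;
} ContextStats;

static void stats_add(ContextStats *c, unsigned char sym)
{
  if (c->counts[sym]++ == 0)
    c->distinct++;
  c->total++;
}

/* Probability assigned to `sym`: try the order-1 context first; if the
   symbol is unseen there, pay the escape probability and fall back to
   order 0; if it is unseen there too, fall back to a uniform guess. */
static double ppm_probability(const ContextStats *order1,
                              const ContextStats *order0,
                              unsigned char sym)
{
  const ContextStats *levels[2];
  double p = 1.0;
  int i;
  assert(order1 != 0 && order0 != 0);
  levels[0] = order1;
  levels[1] = order0;
  for (i = 0; i < 2; i++) {
    const ContextStats *c = levels[i];
    double den;
    if (c->total == 0)
      continue;                        /* empty context: nothing to escape from */
    den = (double)c->total + c->distinct;
    if (c->counts[sym] != 0)
      return p * c->counts[sym] / den; /* symbol found at this order */
    p *= c->distinct / den;            /* escape to the next lower order */
  }
  return p / 256.0;                    /* "order -1": all bytes equally likely */
}
```

Each escape multiplies in a factor below 1, so a symbol found only at a lower order is always more expensive to encode than one predicted by the longest context.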

PPMd7 (Variant H)

PPMd var.H is the original PPMd implementation used in 7z archives.

Parameters

maxOrder
Type: unsigned
Range: PPMD7_MIN_ORDER (2) to PPMD7_MAX_ORDER (64)
Default: 6
// From Ppmd7.h:14-15
#define PPMD7_MIN_ORDER 2
#define PPMD7_MAX_ORDER 64
Maximum context length for prediction.
  • Low order (2-4): Faster, less memory, lower compression
  • Medium order (6-8): Balanced (recommended)
  • High order (16-64): Best compression for structured text
For most text files, order 6-8 provides the best balance. For highly structured data like source code, try order 10-16.
memSize
Type: UInt32
Range: PPMD7_MIN_MEM_SIZE (2 KB) to PPMD7_MAX_MEM_SIZE (~4 GB)
Default: 16 MB
// From Ppmd7.h:17-18
#define PPMD7_MIN_MEM_SIZE (1 << 11)
#define PPMD7_MAX_MEM_SIZE (0xFFFFFFFF - 12 * 3)
Amount of memory allocated for context models.
Typical settings:
  • 1-4 MB: Fast, lower compression
  • 16-64 MB: Good balance (recommended)
  • 128-256 MB: Maximum compression for large text files
Both encoder and decoder require the same amount of memory. Large memory settings may cause issues on memory-constrained systems.
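A caller may want to reject out-of-range settings before allocating anything. The sketch below checks both parameters against the limits quoted from Ppmd7.h above; the helper name is hypothetical, and the macros are repeated here only so the snippet stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Limits copied from the Ppmd7.h excerpts above. */
#define PPMD7_MIN_ORDER 2
#define PPMD7_MAX_ORDER 64
#define PPMD7_MIN_MEM_SIZE (1 << 11)
#define PPMD7_MAX_MEM_SIZE (0xFFFFFFFF - 12 * 3)

/* Hypothetical helper: returns 1 if both settings are within range. */
static int ppmd7_params_valid(unsigned order, uint32_t memSize)
{
  return order >= PPMD7_MIN_ORDER
      && order <= PPMD7_MAX_ORDER
      && memSize >= PPMD7_MIN_MEM_SIZE
      && memSize <= PPMD7_MAX_MEM_SIZE;
}
```

Running the same check on the decoding side is a cheap way to refuse corrupt or hostile parameters before a large allocation is attempted.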

Memory Requirements

PPMd7 memory usage:
Memory = mem_size + alignment + state_structures
       ≈ mem_size + 4 KB
  • Encoding: Same as memory size parameter
  • Decoding: Same as memory size parameter
  • Stack: Minimal (< 1 KB)
Example configurations:
  • Order 6, 16 MB: ~16 MB RAM (both compress/decompress)
  • Order 8, 64 MB: ~64 MB RAM (both compress/decompress)
  • Order 16, 256 MB: ~256 MB RAM (both compress/decompress)
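The estimate above is simple enough to encode as a pre-flight check. This sketch assumes the roughly 4 KB overhead figure given in the formula; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rough RAM estimate from the formula above: mem_size plus about 4 KB. */
static uint64_t ppmd7_estimated_ram(uint32_t memSize)
{
  return (uint64_t)memSize + 4u * 1024u;
}

/* Returns 1 if a machine with availableBytes of free RAM can hold the model. */
static int ppmd7_fits(uint32_t memSize, uint64_t availableBytes)
{
  return ppmd7_estimated_ram(memSize) <= availableBytes;
}
```

Because the decoder needs the same model size as the encoder, the check is most useful when run with the memory budget of the weakest machine expected to extract the archive.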

API Usage

#include "Alloc.h"  // g_Alloc
#include "Ppmd7.h"  // declares the model and the 7z range encoder

// Allocate PPMd7
CPpmd7 ppmd;
Ppmd7_Construct(&ppmd);

UInt32 memSize = 16 << 20;  // 16 MB
unsigned maxOrder = 6;

if (!Ppmd7_Alloc(&ppmd, memSize, &g_Alloc))
  return SZ_ERROR_MEM;

// Attach the output byte stream before encoding
// (outStream is a caller-provided IByteOut; assumed here)
ppmd.rc.enc.Stream = &outStream;

// Initialize for encoding
Ppmd7_Init(&ppmd, maxOrder);
Ppmd7z_Init_RangeEnc(&ppmd);

// Encode data
Ppmd7z_EncodeSymbols(&ppmd, buf, buf + size);

// Flush encoder
Ppmd7z_Flush_RangeEnc(&ppmd);

// Free memory
Ppmd7_Free(&ppmd, &g_Alloc);

Decoding

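#include "Alloc.h"  // g_Alloc
#include "Ppmd7.h"  // also declares the 7z range decoder

// Allocate and initialize decoder (memSize and maxOrder must match the encoder)
CPpmd7 ppmd;
Ppmd7_Construct(&ppmd);
if (!Ppmd7_Alloc(&ppmd, memSize, &g_Alloc))
  return SZ_ERROR_MEM;
Ppmd7_Init(&ppmd, maxOrder);

// Attach the input byte stream (inStream is a caller-provided IByteIn; assumed here)
ppmd.rc.dec.Stream = &inStream;
if (!Ppmd7z_RangeDec_Init(&ppmd.rc.dec)) {
  Ppmd7_Free(&ppmd, &g_Alloc);
  return SZ_ERROR_DATA;
}

// Decode symbols. The encoder example above writes no end marker, so stop
// after outSize bytes (the known uncompressed size; assumed available here).
for (size_t i = 0; i < outSize; i++) {
  int sym = Ppmd7z_DecodeSymbol(&ppmd);
  if (sym < 0)
    break;  // PPMD7_SYM_END or PPMD7_SYM_ERROR
  *dest++ = (Byte)sym;
}

Ppmd7_Free(&ppmd, &g_Alloc);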

PPMd8 (Variant I)

PPMd var.I is an improved version with better compression ratio.

Key Differences from PPMd7

PPMd8 has a lower maximum order:
// From Ppmd8.h:14-15
#define PPMD8_MIN_ORDER 2
#define PPMD8_MAX_ORDER 16
This is due to improved context modeling that provides better compression with lower orders.
PPMd8 supports different memory restoration methods:
// From Ppmd8.h:56-64
enum {
  PPMD8_RESTORE_METHOD_RESTART,      // Default
  PPMD8_RESTORE_METHOD_CUT_OFF       // Alternative
  // PPMD8_RESTORE_METHOD_FREEZE     // Disabled (compatibility)
};
  • RESTART: When memory runs out, discards the whole model and rebuilds it from scratch
  • CUT_OFF: When memory runs out, prunes part of the model and keeps the rest
FREEZE mode is disabled due to compatibility issues between PPMdI rev.1 and rev.2.
PPMd8 defines its own statistical tables; the escape table holds the same values as PPMD7_kExpEscape shown earlier:
// From Ppmd8.c:15-17
static const Byte PPMD8_kExpEscape[16] = 
  { 25, 14, 9, 7, 5, 5, 4, 4, 4, 3, 3, 3, 2, 2, 2, 2 };
static const UInt16 PPMD8_kInitBinEsc[] = 
  { 0x3CDD, 0x1F3F, 0x59BF, 0x48F3, 0x64A1, 0x5ABC, 0x6632, 0x6051 };
These tables provide initial values for the model’s escape and binary-context probability estimates.

API Usage

#include "Alloc.h"  // g_Alloc
#include "Ppmd8.h"  // declares the model and the range encoder

// Allocate PPMd8
CPpmd8 ppmd;
Ppmd8_Construct(&ppmd);

UInt32 memSize = 16 << 20;  // 16 MB
unsigned maxOrder = 6;
unsigned restoreMethod = PPMD8_RESTORE_METHOD_RESTART;

if (!Ppmd8_Alloc(&ppmd, memSize, &g_Alloc))
  return SZ_ERROR_MEM;

// Attach the output byte stream before encoding
// (outStream is a caller-provided IByteOut; assumed here)
ppmd.Stream.Out = &outStream;

// Initialize
Ppmd8_Init(&ppmd, maxOrder, restoreMethod);
Ppmd8_Init_RangeEnc(&ppmd);

// Encode
for (size_t i = 0; i < size; i++)
  Ppmd8_EncodeSymbol(&ppmd, buf[i]);

Ppmd8_Flush_RangeEnc(&ppmd);
Ppmd8_Free(&ppmd, &g_Alloc);

Performance Characteristics

Typical performance on modern hardware (Intel Core i7, 3.5 GHz):
Compression:
  • Order 6, 16 MB: ~1-2 MB/s
  • Order 8, 64 MB: ~0.5-1 MB/s
  • Order 16, 256 MB: ~0.3-0.5 MB/s
Decompression:
  • Order 6: ~5-10 MB/s
  • Order 8: ~3-7 MB/s
  • Order 16: ~2-5 MB/s
Compression ratio (vs LZMA):
  • Text files: 5-20% better
  • Source code: 10-25% better
  • Binary executables: Similar or worse
  • Multimedia: Much worse (use LZMA)

Compression Ratio Comparison

Method          Relative compressed size (lower is better)  Speed
PPMd (order 8)  100% (best)                                 Slow
PPMd (order 6)  102-105%                                    Medium
LZMA2           110-120%                                    Medium
BZip2           115-125%                                    Fast
Deflate         130-150%                                    Very Fast
For large text archives, PPMd can save 10-20% space compared to LZMA.
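The savings figure follows from the relative sizes in the table. As a worked check, the sketch below converts two relative sizes into a percentage saved; the function name is hypothetical.

```c
#include <assert.h>

/* Space saved by method A compared with method B, in percent,
   given their relative compressed sizes (for example 100 and 110). */
static double savings_percent(double sizeA, double sizeB)
{
  assert(sizeB > 0.0);
  return (1.0 - sizeA / sizeB) * 100.0;
}
```

With PPMd at 100% and LZMA2 at 110-120%, this gives roughly 9% to 17% saved, which is in line with the 10-20% range quoted above.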
Method           Relative compressed size (lower is better)  Speed
PPMd (order 10)  100% (best)                                 Very Slow
PPMd (order 6)   103-107%                                    Slow
LZMA2            115-130%                                    Medium
BZip2            120-135%                                    Fast
Deflate          140-160%                                    Very Fast
Method   Relative compressed size (lower is better)  Speed
LZMA2    100% (best)                                 Medium
PPMd     105-115%                                    Slow
BZip2    110-120%                                    Fast
Deflate  120-140%                                    Very Fast
PPMd is not recommended for binary executables. Use LZMA2 with BCJ filter instead.

Command Line Usage

# Compress with PPMd (default: order 6, 16 MB)
7z a -m0=PPMd archive.7z file.txt

# Set order and memory size
7z a -m0=PPMd -mo=8 -mmem=64m archive.7z file.txt

# Maximum compression for text
7z a -m0=PPMd -mo=16 -mmem=256m archive.7z documents/

# Fast PPMd compression
7z a -m0=PPMd -mo=4 -mmem=8m archive.7z file.txt

# Compare with LZMA
7z a -m0=LZMA2 lzma.7z file.txt
7z a -m0=PPMd -mo=8 -mmem=64m ppmd.7z file.txt
Recommended settings by file size:
  • Small files (< 1 MB): -mo=4 -mmem=4m
  • Medium files (1-10 MB): -mo=6 -mmem=16m
  • Large files (10-100 MB): -mo=8 -mmem=64m
  • Huge files (> 100 MB): -mo=10 -mmem=128m
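For scripted use, the recommendations above can be written as a lookup. This is a sketch with hypothetical names; the handling of sizes that fall exactly on a boundary (1, 10, or 100 MB) is an assumption, since the list does not specify it.

```c
#include <assert.h>
#include <stdint.h>

/* Suggested PPMd settings: model order and memory size in megabytes. */
typedef struct {
  unsigned order;
  unsigned memMB;
} PpmdSettings;

/* Maps a file size in bytes to the recommended settings listed above. */
static PpmdSettings ppmd_settings_for_size(uint64_t fileSize)
{
  const uint64_t MB = 1024 * 1024;
  PpmdSettings s;
  if (fileSize < 1 * MB) {
    s.order = 4;  s.memMB = 4;     /* small files */
  } else if (fileSize <= 10 * MB) {
    s.order = 6;  s.memMB = 16;    /* medium files */
  } else if (fileSize <= 100 * MB) {
    s.order = 8;  s.memMB = 64;    /* large files */
  } else {
    s.order = 10; s.memMB = 128;   /* huge files */
  }
  return s;
}
```

The two fields correspond to the values passed to -mo and -mmem in the commands above.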

Best Practices

Higher order = better compression but slower and more memory:
  1. Start with order 6 (default)
  2. For highly structured data, try order 8-10
  3. For maximum compression, test up to order 16
  4. Monitor compression time and memory usage
# Test different orders
for order in 4 6 8 10 12; do
  7z a -m0=PPMd -mo=$order -mmem=64m test-$order.7z file.txt
done
Memory should be:
  • At least 1-2x the file size for small files
  • 10-20% of file size for large files
  • Minimum 4 MB for order 6+
  • Maximum what both encoder and decoder can allocate
Decoder needs the same memory as encoder. Don’t use 256 MB if decoder only has 64 MB available.
Avoid PPMd for:
  • Already compressed data: JPEG, PNG, MP3, video files
  • Binary executables: Use LZMA2 with BCJ filter
  • Random data: No method can compress it
  • Memory-constrained systems: Use LZMA2 or Deflate
  • Need for speed: Use LZMA2 with multithreading
# Wrong: PPMd for multimedia
7z a -m0=PPMd archive.7z photos/*.jpg  # Will be larger!

# Right: Store only or use LZMA2
7z a -m0=Copy archive.7z photos/*.jpg
7z a archive.7z photos/*.jpg  # Default method (LZMA2)

Error Codes

Symbol           Value  Description
PPMD7_SYM_END    -1     End of payload marker
PPMD7_SYM_ERROR  -2     Data corruption error
PPMD8_SYM_END    -1     End of payload marker
PPMD8_SYM_ERROR  -2     Data corruption error
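A decode loop usually needs to tell these cases apart. The sketch below classifies a decoder return value using the values from the table; the enum and function names are hypothetical, and the two macros are repeated only so the snippet stands alone.

```c
#include <assert.h>

/* Values from the table above (the PPMd8 symbols use the same numbers). */
#define PPMD7_SYM_END   (-1)
#define PPMD7_SYM_ERROR (-2)

typedef enum {
  SYM_DATA,      /* a decoded byte, 0..255 */
  SYM_FINISHED,  /* end of payload marker */
  SYM_CORRUPT    /* data corruption */
} SymbolKind;

/* Classifies the int returned by a decode call. Any negative value
   other than the end marker is treated as corruption. */
static SymbolKind classify_symbol(int sym)
{
  if (sym >= 0)
    return SYM_DATA;
  if (sym == PPMD7_SYM_END)
    return SYM_FINISHED;
  return SYM_CORRUPT;
}
```

Treating unknown negative values as corruption keeps the caller on the safe side if a future version adds further status codes.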

PPMd7 vs PPMd8

Feature         PPMd7 (var.H)    PPMd8 (var.I)
Year            2001             2002
Max Order       64               16
Compression     Excellent        Slightly better
Speed           Medium           Slightly slower
Memory          Same as setting  Same as setting
Restore method  Simple           Multiple options
7z default      Yes              No
Compatibility   Wider            Limited
For most use cases, PPMd7 (var.H) is recommended as it’s the standard PPMd implementation in 7z archives.
