Method ID
ID: 04 02 02 (hex)
The BZip2 method is identified by the 3-byte sequence 04 02 02 in 7z archive format.
Note: 7-Zip also defines the ID 04 01 0C for BZip2 (it mirrors ZIP's BZip2 method number 12), but this ID is not used in 7z archives. Always use 04 02 02 for 7z archives.
Overview
BZip2 is a widely used compression algorithm that:
- Uses block sorting (the Burrows-Wheeler Transform)
- Applies Move-to-Front transform
- Compresses with Huffman coding
- Provides consistent, predictable compression ratios
- Has moderate memory requirements
How BZip2 Works
The BZip2 compression pipeline:
- Block Division: Input is divided into blocks (100-900 KB)
- Burrows-Wheeler Transform: Sorts the rotations of each block so that similar characters end up next to each other
- Move-to-Front: Converts repeated characters into small numbers (mostly zeros)
- Run-Length Encoding: Compresses the runs of zeros produced by Move-to-Front
- Huffman Coding: Applies the final entropy encoding
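The zero-run step is the least familiar stage in the pipeline above. In the bzip2 format, a run of n zeros from the Move-to-Front output is written in bijective base 2 using two symbols, RUNA (digit value 1) and RUNB (digit value 2), least significant digit first. The sketch below shows only that encoding; the symbol names follow the bzip2 format, while the functions themselves are illustrative and not taken from any library.

```python
RUNA, RUNB = "RUNA", "RUNB"


def encode_zero_run(n: int) -> list[str]:
    """Encode a run of n >= 1 zeros as RUNA/RUNB digits (bijective base 2, LSB first)."""
    if n < 1:
        raise ValueError("run length must be at least 1")
    digits = []
    while n > 0:
        if n % 2 == 1:
            digits.append(RUNA)  # digit value 1
            n = (n - 1) // 2
        else:
            digits.append(RUNB)  # digit value 2
            n = (n - 2) // 2
    return digits


def decode_zero_run(digits: list[str]) -> int:
    """Invert encode_zero_run: sum of digit_value * 2**position."""
    total = 0
    for position, digit in enumerate(digits):
        value = 1 if digit == RUNA else 2
        total += value << position
    return total


for length in range(1, 6):
    print(length, encode_zero_run(length))
```

For example, a run of 4 zeros becomes RUNB RUNA (2*1 + 1*2 = 4), so even very long runs cost only about log2(n) symbols before Huffman coding.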
Algorithm Characteristics
Burrows-Wheeler Transform
Reversible transformation that groups similar bytes together for better compression
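To make the transform concrete, here is a naive Burrows-Wheeler Transform and its inverse. This is a teaching sketch, not bzip2's implementation: it sorts full rotations (roughly O(n² log n) time), and it returns the row index of the original input explicitly, similar to the origPtr value bzip2 stores in each block.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Return the last column of the sorted rotation matrix and the row holding the original input."""
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation they begin.
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last character of rotation i is the byte just before position i.
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)


def inverse_bwt(last_column: bytes, index: int) -> bytes:
    """Rebuild the input from the last column by following the LF mapping."""
    n = len(last_column)
    # A stable sort of positions by byte value gives the first column's row order.
    order = sorted(range(n), key=lambda i: last_column[i])
    result = bytearray()
    row = index
    for _ in range(n):
        row = order[row]  # move to the row of the next rotation
        result.append(last_column[row])
    return bytes(result)


print(bwt(b"banana"))  # (b'nnbaaa', 3)
```

Equal bytes cluster in the output ("nnbaaa"), which is exactly the structure the Move-to-Front stage exploits.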
Block-Based Processing
Processes data in independent blocks, enabling parallel decompression
Move-to-Front
Exploits locality by moving each recently seen character to the front of the alphabet
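A minimal Move-to-Front sketch follows. It uses the full 256-value byte alphabet for simplicity; bzip2 itself applies the transform only over the byte values that actually occur in the block.

```python
def mtf_encode(data: bytes) -> list[int]:
    """Replace each byte with its position in a recency list, then move it to the front."""
    alphabet = list(range(256))
    output = []
    for byte in data:
        position = alphabet.index(byte)
        output.append(position)
        alphabet.pop(position)
        alphabet.insert(0, byte)
    return output


def mtf_decode(indices: list[int]) -> bytes:
    """Invert mtf_encode by replaying the same list updates."""
    alphabet = list(range(256))
    result = bytearray()
    for position in indices:
        byte = alphabet.pop(position)
        result.append(byte)
        alphabet.insert(0, byte)
    return bytes(result)


print(mtf_encode(b"nnbaaa"))  # [110, 0, 99, 99, 0, 0]
```

Applied to BWT output such as "nnbaaa", every repeated byte becomes 0, which feeds the zero-run encoding step.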
Huffman Coding
Standard entropy coding for the final compression stage
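The sketch below computes plain Huffman code lengths with Python's heapq. It illustrates standard Huffman coding only; bzip2 additionally limits code lengths and uses several Huffman tables, switching between them every 50 symbols, none of which is modeled here.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols) -> dict:
    """Return {symbol: code length in bits} for a standard Huffman code over `symbols`."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs a 1-bit code to be representable.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(weight, i, {symbol: 0}) for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        weight_a, _, group_a = heapq.heappop(heap)
        weight_b, _, group_b = heapq.heappop(heap)
        # Joining two subtrees under a new parent adds one bit to every code inside them.
        merged = {symbol: depth + 1 for symbol, depth in {**group_a, **group_b}.items()}
        heapq.heappush(heap, (weight_a + weight_b, next_id, merged))
        next_id += 1
    return heap[0][2]


# Skewed input, like Move-to-Front output, where small values dominate.
print(huffman_code_lengths([0, 0, 0, 0, 1, 1, 2, 3]))  # {0: 1, 1: 2, 2: 3, 3: 3}
```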
Compression Parameters
Block Size
Range: 100 KB to 900 KB
Values: 1-9 (corresponds to 100KB, 200KB, …, 900KB)
Default: 9 (900 KB)
Block size determines:
- Compression ratio: Larger blocks = better compression
- Memory usage: Larger blocks = more memory
- Parallelization: Each block can be processed independently
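In the reference bzip2 tool, level N nominally means blocks of N × 100,000 bytes (decimal units, so "900 KB" is 900,000 bytes). The helpers below are hypothetical illustrations, not library functions; the block count ignores bzip2's initial run-length pass, which can change the data size slightly before blocking.

```python
import math


def block_size_bytes(level: int) -> int:
    """Nominal bzip2 block size for levels 1-9 (level * 100,000 bytes)."""
    if not 1 <= level <= 9:
        raise ValueError("bzip2 levels range from 1 to 9")
    return level * 100_000


def estimated_block_count(input_size: int, level: int = 9) -> int:
    """Rough number of independent blocks for an input of `input_size` bytes."""
    return math.ceil(input_size / block_size_bytes(level))


print(estimated_block_count(10_000_000, level=9))  # 12
print(estimated_block_count(10_000_000, level=1))  # 100
```

More blocks mean more independent units of work, but each block starts its Burrows-Wheeler sort from scratch, which is why larger blocks usually compress better.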
Compression Level
BZip2 uses block size as its primary level parameter:
| Level | Block Size | Memory (Compress) | Memory (Decompress) |
|---|---|---|---|
| 1 | 100 KB | ~1.2 MB | ~400 KB |
| 2 | 200 KB | ~2.3 MB | ~800 KB |
| 3 | 300 KB | ~3.5 MB | ~1.2 MB |
| 4 | 400 KB | ~4.6 MB | ~1.6 MB |
| 5 | 500 KB | ~5.8 MB | ~2.0 MB |
| 6 | 600 KB | ~6.9 MB | ~2.4 MB |
| 7 | 700 KB | ~8.1 MB | ~2.8 MB |
| 8 | 800 KB | ~9.2 MB | ~3.2 MB |
| 9 | 900 KB | ~10.4 MB | ~3.6 MB |
Unlike LZMA, BZip2 compression and decompression both have modest memory requirements.
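For comparison, the reference bzip2 manual estimates memory use as 400 KB + 8 × block size for compression and 100 KB + 4 × block size for decompression (100 KB + 2.5 × block size with its small-memory -s option). The sketch below simply applies those formulas. Its compression estimates are lower than the table's for levels 2-9, so treat both sets of figures as approximations whose real values depend on the implementation.

```python
def bzip2_memory_kb(level: int, small_decompress: bool = False) -> tuple[float, float]:
    """Estimate (compress_kb, decompress_kb) using the reference bzip2 manual's formulas."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    block_kb = level * 100
    compress_kb = 400 + 8 * block_kb
    decompress_kb = 100 + (2.5 if small_decompress else 4) * block_kb
    return compress_kb, decompress_kb


for level in (1, 5, 9):
    compress_kb, decompress_kb = bzip2_memory_kb(level)
    print(f"level {level}: compress ~{compress_kb / 1000:.1f} MB, decompress ~{decompress_kb / 1000:.1f} MB")
```

For level 9 this gives about 7.6 MB for compression and 3.7 MB for decompression.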
Memory Requirements
Encoding
BZip2 compression typically requires:
- Level 1 (100 KB): ~1.2 MB
- Level 5 (500 KB): ~5.8 MB
- Level 9 (900 KB): ~10.4 MB
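Decoding
BZip2 decompression typically requires:
- Level 1 (100 KB): ~400 KB
- Level 5 (500 KB): ~2.0 MB
- Level 9 (900 KB): ~3.6 MB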
Performance Characteristics
Typical performance on modern hardware (Intel Core i7, 3.5 GHz):
Compression:
- Level 1 (100 KB): ~8-10 MB/s
- Level 5 (500 KB): ~6-8 MB/s
- Level 9 (900 KB): ~5-7 MB/s
Decompression:
- All levels: ~15-25 MB/s
Typical compressed size:
- Text files: 20-30% of original size
- Executable files: 40-60% of original size
- Multimedia files: 85-100% of original size (poor)
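Because these figures vary with hardware and input, it is worth measuring on your own data. A minimal sketch using Python's standard bz2 module follows; that module wraps libbzip2 rather than 7-Zip's encoder, so absolute numbers will differ from 7z results.

```python
import bz2
import time


def measure_bzip2(data: bytes, levels=(1, 5, 9)) -> dict:
    """Return {level: (compress MB/s, decompress MB/s, compressed size as % of input)}."""
    if not data:
        raise ValueError("data must be non-empty")
    size_mb = len(data) / 1_000_000
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = bz2.compress(data, compresslevel=level)
        compress_seconds = time.perf_counter() - start

        start = time.perf_counter()
        restored = bz2.decompress(compressed)
        decompress_seconds = time.perf_counter() - start
        if restored != data:
            raise RuntimeError("round trip failed")

        results[level] = (
            size_mb / compress_seconds,
            size_mb / decompress_seconds,
            100 * len(compressed) / len(data),
        )
    return results


# Synthetic log lines; replace with the bytes of a real file for meaningful numbers.
sample = "".join(f"12:00:{i % 60:02d} INFO request {i} served in {i % 97} ms\n" for i in range(20_000)).encode()
for level, (c_speed, d_speed, percent) in measure_bzip2(sample).items():
    print(f"level {level}: {c_speed:.1f} MB/s compress, {d_speed:.1f} MB/s decompress, {percent:.1f}% of original")
```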
Compression Ratio Comparison
Text Files
| Method | Relative Compressed Size (best = 100%) | Compression Speed |
|---|---|---|
| PPMd | 100% (best) | Very Slow (1-2 MB/s) |
| LZMA2 | 105-110% | Slow (2-3 MB/s) |
| BZip2 | 115-125% | Medium (5-7 MB/s) |
| Deflate | 140-160% | Fast (15-25 MB/s) |
Binary Executables
| Method | Relative Compressed Size (best = 100%) | Compression Speed |
|---|---|---|
| LZMA2 + BCJ | 100% (best) | Slow (2-3 MB/s) |
| BZip2 | 120-140% | Medium (5-7 MB/s) |
| Deflate | 150-180% | Fast (15-25 MB/s) |
Already Compressed Data
| Method | Result |
|---|---|
| All methods | No meaningful compression; output may even be slightly larger than the input |
Command Line Usage
BZip2 in 7z archives is single-threaded. For parallel compression, use LZMA2 with the -mmt option.
BZip2 File Format
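Standalone .bz2 files have this structure:
- Stream header: the ASCII bytes BZh followed by a block-size digit from 1 to 9
- Compressed blocks: each starts with the 48-bit magic number 0x314159265359 and a 32-bit block CRC; blocks are bit-aligned rather than byte-aligned
- End-of-stream marker: the 48-bit magic number 0x177245385090 followed by a 32-bit combined CRC, then padding to a byte boundary
In 7z archives, BZip2 streams are wrapped in the 7z format and don't include the standalone file headers.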
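The stream header of a standalone .bz2 file is easy to check programmatically. The sketch below inspects streams produced by Python's bz2 module; the function name is illustrative. It only examines the first ten bytes, because the first block starts right after the 4-byte header while later blocks are bit-aligned and cannot be found with a simple byte search.

```python
import bz2

BLOCK_MAGIC = bytes.fromhex("314159265359")          # start of a compressed block
END_OF_STREAM_MAGIC = bytes.fromhex("177245385090")  # end-of-stream marker


def describe_bz2_header(stream: bytes) -> dict:
    """Decode the 4-byte stream header and classify the item that follows it."""
    if len(stream) < 10 or stream[:3] != b"BZh" or stream[3:4] not in b"123456789":
        raise ValueError("not a standalone bzip2 stream")
    level = int(chr(stream[3]))
    following = stream[4:10]
    if following == BLOCK_MAGIC:
        first_item = "compressed block"
    elif following == END_OF_STREAM_MAGIC:
        first_item = "end of stream (empty input)"
    else:
        first_item = "unknown"
    return {"level": level, "block_size_bytes": level * 100_000, "first_item": first_item}


print(describe_bz2_header(bz2.compress(b"hello " * 100, compresslevel=5)))
print(describe_bz2_header(bz2.compress(b"")))
```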
BZip2 vs Other Methods
| Feature | BZip2 | LZMA2 | Deflate | PPMd |
|---|---|---|---|---|
| Method ID | 04 02 02 | 21 | 04 01 08 | 03 04 01 |
| Compression Ratio | Good (7/10) | Excellent (9/10) | Fair (5/10) | Excellent (10/10) |
| Compression Speed | Medium (5/10) | Slow (3/10) | Fast (7/10) | Very Slow (2/10) |
| Decompression Speed | Medium (6/10) | Medium (6/10) | Very Fast (9/10) | Slow (4/10) |
| Memory (Compress) | Low (~10 MB) | High (~200 MB) | Very Low (~1 MB) | High (~64 MB) |
| Memory (Decompress) | Low (~4 MB) | Medium (~50 MB) | Very Low (~32 KB) | High (~64 MB) |
| Multithreading | No | Yes | No | No |
| Block-based | Yes | Yes | No | No |
| Streaming | Yes | Yes | Yes | Yes |
| Compatibility | Wide | 7z/XZ | Universal | Limited |
Use Cases
When to Use BZip2
BZip2 is ideal for:
- Memory-constrained systems: Uses much less RAM than LZMA
- Compatibility: Widely supported across platforms
- Balanced performance: Good ratio without excessive time
- Log files: Excellent for text-based logs
- Source code archives: Good compression with reasonable speed
- Incremental backups: Predictable compression ratios
When NOT to Use BZip2
Avoid BZip2 for:
- Maximum compression needed: Use LZMA2 or PPMd instead
- Need for speed: Use Deflate for faster compression/decompression
- Large files with multithreading: Use LZMA2 with -mmt
- Already compressed data: Will not compress further
- Binary executables: LZMA2 with BCJ filter is much better
BZip2 vs LZMA2 Decision Tree
Choose BZip2 if:
- Memory < 50 MB available
- Need wide compatibility
- Prefer faster compression than LZMA offers
Choose LZMA2 if:
- Memory > 100 MB available
- Want maximum compression
- Can use multithreading
- Primarily use 7-Zip/p7zip tools
Choose Deflate if:
- Need ZIP compatibility
- Speed is critical
- Very low memory use is required
Choose PPMd if:
- Compressing text/structured data
- Want absolute maximum compression
- Speed is not important
Interoperability
BZip2 compressed data in 7z archives uses the standard BZip2 algorithm and is compatible with:
- Command-line tools: bzip2, bunzip2, pbzip2
- Libraries: libbz2, Apache Commons Compress
- Languages: Python (bz2 module), Java, C/C++, Rust
- Archive formats: 7z, tar.bz2, .bz2
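As an example of this interoperability, Python's built-in bz2 module can write a standalone .bz2 file that the bzip2/bunzip2 tools accept, and read one back. The file name below is a placeholder inside a temporary directory.

```python
import bz2
import os
import tempfile

text = "line of a log file\n" * 1000

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.log.bz2")  # placeholder file name

    # Write a standalone .bz2 file (it could also be unpacked with `bunzip2`).
    with bz2.open(path, "wt", compresslevel=9, encoding="utf-8") as f:
        f.write(text)

    # Read it back through the same module.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        restored = f.read()

    compressed_size = os.path.getsize(path)

print(restored == text, compressed_size, len(text.encode()))
```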
Parallel BZip2 Variants
7-Zip’s BZip2 implementation is single-threaded, but external tools provide parallel variants:
| Tool | Threads | Speed Improvement |
|---|---|---|
| bzip2 | 1 | Baseline |
| pbzip2 | Multiple | 3-4x on 4 cores |
| lbzip2 | Multiple | 3-4x on 4 cores |
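Tools like pbzip2 get their speedup by compressing independent chunks in parallel and concatenating the resulting bzip2 streams, which standard decoders read back as one file. A minimal sketch of that idea with Python's bz2 module and a thread pool follows; it assumes CPython's bz2 releases the GIL while compressing (which is what lets the threads overlap), and the function name and chunk size are illustrative.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2_compress(data: bytes, chunk_size: int = 900_000, level: int = 9, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the resulting bzip2 streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the streams are joined in sequence.
        streams = pool.map(lambda chunk: bz2.compress(chunk, compresslevel=level), chunks)
        return b"".join(streams)


data = "".join(f"record {i}: value={i * 7 % 1000}\n" for i in range(100_000)).encode()
packed = parallel_bz2_compress(data)
print(bz2.decompress(packed) == data)  # bz2.decompress reads all concatenated streams
```

The trade-off is a little per-stream overhead and no redundancy shared across chunk boundaries.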
Best Practices
Block Size Selection
General recommendations:
- Small files (< 100 KB): Use 100-300 KB blocks
- Medium files (100 KB - 10 MB): Use 500-700 KB blocks
- Large files (> 10 MB): Use 900 KB blocks (default)
- Very limited memory: Use 100-200 KB blocks
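The guidelines above can be expressed as a small helper. The function name and the exact level chosen within each recommended range are hypothetical illustrations of the list, not part of any tool.

```python
def suggest_bzip2_level(file_size: int, memory_limited: bool = False) -> int:
    """Map the block-size guidelines to a bzip2 level (block size = level * 100 KB)."""
    kb, mb = 1_000, 1_000_000
    if memory_limited:
        return 2  # 100-200 KB blocks
    if file_size < 100 * kb:
        return 3  # 100-300 KB blocks for small files
    if file_size <= 10 * mb:
        return 6  # 500-700 KB blocks for medium files
    return 9      # 900 KB blocks (the default) for large files


print(suggest_bzip2_level(50_000_000))
```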
Testing Compression Ratio
Compare BZip2 with other methods on representative samples of your own data before settling on one.
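A quick version of this comparison can be run with Python's standard library: bz2 (BZip2), lzma (LZMA2 in the .xz container) and zlib (Deflate). PPMd has no standard-library binding and is omitted. These are not 7-Zip's encoders, so sizes will differ somewhat from 7z output; the sample data and the commented file name are placeholders.

```python
import bz2
import lzma
import zlib


def compare_methods(data: bytes) -> dict:
    """Return each codec's compressed size as a percentage of the input size."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = {
        "BZip2 (bz2, level 9)": bz2.compress(data, compresslevel=9),
        "LZMA2 (lzma, preset 6)": lzma.compress(data, preset=6),
        "Deflate (zlib, level 9)": zlib.compress(data, 9),
    }
    return {name: 100 * len(blob) / len(data) for name, blob in compressed.items()}


# Replace with e.g. open("sample.log", "rb").read() to test real data.
sample = "".join(f"user{i % 50} logged in from 10.0.0.{i % 255}\n" for i in range(20_000)).encode()
for name, percent in sorted(compare_methods(sample).items(), key=lambda item: item[1]):
    print(f"{name}: {percent:.1f}% of original")
```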
Solid vs Non-Solid Archives
BZip2 can be used in both solid and non-solid archives.
For BZip2, the difference between solid and non-solid is less dramatic than with LZMA, since BZip2 already uses block-based compression.
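The effect can be approximated outside 7-Zip with Python's bz2 and lzma modules: compress similar small files separately (roughly non-solid) versus as one concatenated stream (roughly solid), then check whether repetition farther apart than one block is found. This is a sketch, not a measurement of 7z archives; the sample data is made up, and it relies on lzma's default preset using an 8 MiB dictionary.

```python
import bz2
import lzma
import random


def solid_vs_separate(files: list[bytes]) -> tuple[int, int]:
    """Return (total size of per-file bz2 streams, size of one bz2 stream over all files)."""
    separate = sum(len(bz2.compress(f)) for f in files)
    solid = len(bz2.compress(b"".join(files)))
    return separate, solid


# Many small, similar files: a shared block lets BZip2 exploit cross-file redundancy.
configs = [f"[server{i}]\nport = {8000 + i}\nenabled = true\ntimeout = 30\n".encode() for i in range(200)]
print(solid_vs_separate(configs))

# Repetition farther apart than one 900 KB block is invisible to BZip2 but not to LZMA2.
chunk = random.Random(0).randbytes(1_000_000)  # incompressible 1 MB sample
doubled = chunk + chunk
print("bz2 :", len(bz2.compress(doubled)))   # roughly 2 MB: the repeat lands in other blocks
print("lzma:", len(lzma.compress(doubled)))  # roughly 1 MB: the dictionary reaches the repeat
```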
Error Handling
BZip2 is robust and provides good error detection:
- CRC-32 checksums: Each block has a CRC for integrity
- Magic numbers: Easy detection of format errors
- Independent blocks: Corruption limited to single block
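The integrity checks can be observed directly. The sketch below flips one byte in a stream made by Python's bz2 module and confirms that decompression fails instead of silently returning wrong data; the function name is illustrative, and Python reports libbzip2 data errors as OSError.

```python
import bz2


def detects_corruption(payload: bytes, position: int) -> bool:
    """Flip one byte of a bzip2 stream and report whether decompression rejects it."""
    damaged = bytearray(payload)
    damaged[position] ^= 0xFF
    try:
        bz2.decompress(bytes(damaged))
    except (OSError, ValueError):
        # OSError: invalid data or CRC mismatch; ValueError: stream ended before its end marker.
        return True
    return False


original = "".join(f"log entry {i}\n" for i in range(10_000)).encode()
payload = bz2.compress(original)
print(detects_corruption(payload, len(payload) // 2))
```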
See Also
- LZMA2 Compression - Better compression ratio
- Deflate Compression - Faster compression
- Compression Methods Overview - Compare all methods
- External: BZip2 homepage, libbz2 documentation
- Tools: bzip2, pbzip2, lbzip2