Format Overview
BZIP2 provides:
- High compression ratio - 10-15% better than GZIP
- Block-based compression - Independent 100-900 KB blocks
- Burrows-Wheeler algorithm - Advanced text transformation
- Error recovery - Block independence aids recovery
- No file size limit - Unlimited file sizes
- Open source - Free implementation (libbzip2)
BZIP2 excels at compressing text and source code, often producing output 10-15% smaller than GZIP, at the cost of slower operation and higher memory use.
Format Structure
From source/CPP/7zip/Archive/Bz2Handler.cpp:107-121, BZIP2 files have this structure:
File Header
Block Size
The fourth byte indicates block size:
| Byte | Block Size | Memory (Compress) | Memory (Decompress) |
|---|---|---|---|
| ‘1’ | 100 KB | ~1 MB | ~400 KB |
| ‘2’ | 200 KB | ~2 MB | ~800 KB |
| ‘3’ | 300 KB | ~3 MB | ~1.2 MB |
| … | … | … | … |
| ‘9’ | 900 KB | ~9 MB | ~3.6 MB |
Larger block sizes generally provide better compression but require more memory. The default is usually ‘9’ (900 KB blocks).
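As a minimal sketch of how the header can be read, assuming the usual `BZh` signature in the first three bytes followed by the block-size byte described above: the function name `parse_bz2_header` is hypothetical and is not taken from Bz2Handler.cpp. Python's standard `bz2` module is used only to produce sample input.

```python
import bz2


def parse_bz2_header(data: bytes) -> int:
    """Return the block size (in bytes) declared by a BZIP2 stream header."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes for a BZIP2 header")
    # Bytes 0-2 hold the signature 'B', 'Z', 'h'.
    if data[:3] != b"BZh":
        raise ValueError("not a BZIP2 stream: bad signature")
    # Byte 3 is an ASCII digit '1'..'9' selecting the block size.
    level = data[3] - ord("0")
    if not 1 <= level <= 9:
        raise ValueError("invalid block-size byte")
    return level * 100_000  # '1' -> 100,000 bytes ... '9' -> 900,000 bytes


if __name__ == "__main__":
    sample = bz2.compress(b"hello world", compresslevel=5)
    print(sample[:4])                 # b'BZh5'
    print(parse_bz2_header(sample))   # 500000
```

The check rejects anything that does not start with the signature, which is the same condition the IsNotArc error flag later in this page describes.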
Compression Algorithm
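BZIP2 uses a multi-stage compression pipeline:
- Run-Length Encoding (RLE) - First pass, collapses long runs of identical bytes
- Burrows-Wheeler Transform (BWT) - Reorders each block so that bytes from similar contexts end up adjacent
- Move-To-Front (MTF) - Turns that local repetition into small, frequently repeated values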
- Run-Length Encoding - Second pass
- Huffman Coding - Final entropy encoding
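The two middle stages can be illustrated on a toy input. This is a deliberately naive sketch, not the codec 7-Zip uses: it sorts full rotations (quadratic memory), applies MTF over all 256 byte values rather than only the symbols present in the block, and leaves out both RLE passes and Huffman coding. The function names are hypothetical.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the last column and the row index of the original block,
    which the inverse transform needs.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    # order[k] is the start offset of the k-th smallest rotation.
    order = sorted(range(n), key=lambda i: block[i:] + block[:i])
    last = bytes(block[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, index: int) -> bytes:
    """Rebuild the original block by repeatedly prepending and sorting."""
    n = len(last)
    if n == 0:
        return b""
    table = [b""] * n
    for _ in range(n):
        table = sorted(bytes([last[i]]) + table[i] for i in range(n))
    return table[index]


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's position in a recency list."""
    alphabet = list(range(256))
    out = []
    for b in data:
        pos = alphabet.index(b)
        out.append(pos)
        alphabet.insert(0, alphabet.pop(pos))  # move the symbol to the front
    return out


if __name__ == "__main__":
    last, index = bwt(b"banana")
    print(last, index)        # b'nnbaaa' 3
    print(mtf_encode(last))   # [110, 0, 99, 99, 0, 0]
    assert inverse_bwt(last, index) == b"banana"
```

After BWT the equal bytes sit next to each other, and MTF turns those repeats into zeros, which is what makes the second RLE pass and the Huffman stage effective.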
Usage Examples
Compress Single File
Compress with Specific Block Size
Decompress File
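For scripted workflows, single-file compression and decompression can also be sketched with Python's standard `bz2` module. This is an assumption of the example and is independent of the 7-Zip command line; the helper names `compress_file` and `decompress_file` are hypothetical. The `level` argument corresponds to the block-size digit ('1' to '9') in the file header.

```python
import bz2
import shutil
from pathlib import Path


def compress_file(src, dst=None, level: int = 9) -> Path:
    """Compress src into a .bz2 file, streaming so large files fit in memory."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".bz2")
    with open(src, "rb") as fin, bz2.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def decompress_file(src, dst) -> Path:
    """Decompress a .bz2 file to dst, again streaming in chunks."""
    dst = Path(dst)
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

A round trip would be `decompress_file(compress_file("notes.txt"), "notes.out")`, where `notes.txt` stands for any existing file.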
Create TAR.BZ2 Archive
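Because BZIP2 holds a single data stream, multi-file archives are built by compressing a TAR container. As a rough programmatic counterpart, assuming Python's standard `tarfile` module rather than 7-Zip itself, the sketch below writes and lists a `.tar.bz2`; the helper names are hypothetical.

```python
import tarfile
from pathlib import Path


def create_tar_bz2(archive_path, paths, level: int = 9) -> None:
    """Pack the given files or directories into a BZIP2-compressed TAR."""
    with tarfile.open(archive_path, "w:bz2", compresslevel=level) as tar:
        for p in paths:
            # Store each entry under its base name instead of its full path.
            tar.add(p, arcname=Path(p).name)


def list_tar_bz2(archive_path) -> list[str]:
    """Return the member names stored in a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return tar.getnames()
```

Storing base names keeps absolute paths out of the archive; pass a different `arcname` if the directory layout should be preserved.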
Test Archive Integrity
List Archive Information
Handler Implementation
From source/CPP/7zip/Archive/Bz2Handler.cpp:23-47:
Archive Properties
BZIP2 handler tracks several properties (Bz2Handler.cpp:49-82):
Archive-Level Properties
- kpidPhySize - Compressed size
- kpidUnpackSize - Uncompressed size
- kpidNumStreams - Number of BZIP2 streams
- kpidNumBlocks - Number of compression blocks
- kpidErrorFlags - Error indicators
Item Properties
- kpidPackSize - Compressed size
- kpidSize - Uncompressed size
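To make the size and stream-count properties concrete, the sketch below derives comparable values from a buffer using Python's standard `bz2` module. It is an illustration only, not the logic in Bz2Handler.cpp: the `Bz2Stats` type and `scan_bz2` function are hypothetical, the whole input is held in memory, and the block count is not computed because the Python API does not expose it.

```python
import bz2
from dataclasses import dataclass


@dataclass
class Bz2Stats:
    phy_size: int      # compressed bytes consumed (cf. kpidPhySize)
    unpack_size: int   # total decompressed bytes (cf. kpidUnpackSize)
    num_streams: int   # concatenated streams found (cf. kpidNumStreams)


def scan_bz2(data: bytes) -> Bz2Stats:
    """Walk every concatenated BZIP2 stream in data and total the sizes."""
    unpack_size = 0
    num_streams = 0
    remaining = data
    while remaining:
        dec = bz2.BZ2Decompressor()       # one decompressor per stream
        unpack_size += len(dec.decompress(remaining))
        if not dec.eof:
            raise EOFError("stream ended before its end-of-stream marker")
        num_streams += 1
        remaining = dec.unused_data       # bytes that follow this stream
    return Bz2Stats(len(data), unpack_size, num_streams)


if __name__ == "__main__":
    two = bz2.compress(b"a" * 1000) + bz2.compress(b"b" * 500)
    print(scan_bz2(two))
```

Concatenating two compressed buffers yields a valid file with two streams, so the example reports an unpacked size of 1500 and a stream count of 2.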
Block Independence
BZIP2’s block structure enables:
Parallel Decompression
Blocks can be decompressed independently:
Error Recovery
If one block is corrupted, other blocks remain intact:
Seeking
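Finding block boundaries is the first step for any of these uses. As a minimal sketch, assuming the standard 48-bit block-start marker 0x314159265359 and most-significant-bit-first packing, the function below scans a compressed buffer one bit at a time, because blocks are not byte-aligned. The name `find_block_offsets` is hypothetical and is not part of 7-Zip, and a chance match of the marker inside compressed data is possible, although very unlikely.

```python
BLOCK_MAGIC = 0x314159265359   # 48-bit marker at the start of each block
MASK_48 = (1 << 48) - 1


def find_block_offsets(data: bytes) -> list[int]:
    """Return the bit offsets at which a block-start marker begins."""
    offsets = []
    window = 0      # the most recent 48 bits seen
    bit_pos = 0     # number of bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):          # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


if __name__ == "__main__":
    import bz2
    text = b"".join(b"line %d\n" % i for i in range(40000))
    packed = bz2.compress(text, compresslevel=1)   # 100 KB blocks
    offsets = find_block_offsets(packed)
    print(len(offsets), offsets[0])
```

The first block always starts at bit 32, directly after the 4-byte file header; later offsets are usually not multiples of 8, which is why a byte-wise search is not enough.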
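Blocks allow seeking within compressed files without full decompression, although the format stores no block index, so block boundaries must first be located by scanning for block markers.
Compression Performance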
Compression Levels
| Level | Block Size | Ratio | Speed | Memory |
|---|---|---|---|---|
| -mx=1 | 100 KB | Good | Fast | Low |
| -mx=3 | 300 KB | Better | Medium | Medium |
| -mx=5 | 500 KB | Better | Medium | Medium |
| -mx=9 | 900 KB | Best | Slow | High |
Sample Performance (100 MB text file)
| Format | Time | Size | Ratio | Relative Speed |
|---|---|---|---|---|
| GZIP (-mx=9) | 12s | 28 MB | 28% | 1.0x (baseline) |
| BZIP2 (-mx=1) | 15s | 26 MB | 26% | 0.8x |
| BZIP2 (-mx=5) | 18s | 24 MB | 24% | 0.67x |
| BZIP2 (-mx=9) | 22s | 23 MB | 23% | 0.55x |
| XZ (-mx=9) | 35s | 20 MB | 20% | 0.34x |
BZIP2 provides a good balance between GZIP’s speed and XZ’s compression ratio.
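Figures like these depend heavily on the input and the hardware, so it is worth measuring on your own data. The harness below is a sketch that uses Python's standard `zlib`, `bz2`, and `lzma` modules as stand-ins for the GZIP, BZIP2, and XZ codecs; it does not call 7-Zip, and its timings will not match the table. The function name `benchmark` and the `xz_preset` parameter are hypothetical.

```python
import bz2
import lzma
import time
import zlib


def benchmark(data: bytes, xz_preset: int = 9) -> dict:
    """Compress data with each codec and report (seconds, size, ratio)."""
    if not data:
        raise ValueError("need non-empty input to compute a ratio")
    codecs = {
        "deflate-9": lambda d: zlib.compress(d, 9),   # GZIP's algorithm
        "bzip2-1": lambda d: bz2.compress(d, 1),
        "bzip2-5": lambda d: bz2.compress(d, 5),
        "bzip2-9": lambda d: bz2.compress(d, 9),
        "xz": lambda d: lzma.compress(d, preset=xz_preset),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        results[name] = (elapsed, len(out), len(out) / len(data))
    return results


if __name__ == "__main__":
    sample = b"".join(b"2024-01-01 INFO request %d served\n" % i
                      for i in range(200000))
    for name, (secs, size, ratio) in benchmark(sample).items():
        print(f"{name:10s} {secs:6.2f}s {size:9d} bytes {ratio:6.1%}")
```

High XZ presets need a large amount of memory while compressing, so lowering `xz_preset` is a reasonable choice on constrained machines.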
Advanced Usage
Compress from Standard Input
Decompress to Standard Output
Multiple Streams
BZIP2 supports concatenated streams:
Recovery Archives
Implementation Details
BZIP2 compression codec:
Comparison with Other Formats
BZIP2 vs GZIP
Advantages:
- 10-15% better compression
- Block structure enables recovery
- Better for text files
Disadvantages:
- Slower compression (2-3x)
- Slower decompression (2x)
- Higher memory usage
BZIP2 vs XZ
Advantages:
- Faster compression and decompression
- Lower memory requirements
- Block independence
Disadvantages:
- Lower compression ratio (15-20% larger)
- Less efficient for binary data
BZIP2 vs 7z/LZMA
Advantages:
- Simpler format
- Better tool support
- Faster decompression
Disadvantages:
- Lower compression ratio
- Single file only (needs TAR)
- No encryption
Best Practices
For Text Files
Use -mx=9 for excellent compression of source code and documents
For Log Archives
Combine with TAR for compressed log archives
For Balance
Use BZIP2 when GZIP is too weak and XZ is too slow
For Recovery
Use block structure for better error recovery
Common Use Cases
Source Code Distribution
Log Compression
Database Dumps
System Backup
Memory Requirements
Compression
-mx=9 (900KB blocks):
Decompression
BZIP2 uses less memory than LZMA/XZ but more than GZIP, making it suitable for resource-constrained environments.
Error Handling
From Bz2Handler.cpp:73-81:
- IsNotArc - Invalid BZIP2 signature
- UnexpectedEnd - Truncated file
- DataAfterEnd - Extra data after stream
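The three conditions can be approximated outside 7-Zip as well. The sketch below uses Python's standard `bz2` module and borrows the flag names from the list above purely as labels; the function `classify_bz2` is hypothetical and the handler's real checks may differ. Corrupt data inside a stream is not classified here and surfaces as an `OSError` from the decompressor.

```python
import bz2


def classify_bz2(data: bytes) -> set[str]:
    """Return a set of flag names describing structural problems in data."""
    # Signature check: 'B', 'Z', 'h', then a block-size digit '1'..'9'.
    if len(data) < 4 or data[:3] != b"BZh" or not b"1" <= data[3:4] <= b"9":
        return {"IsNotArc"}
    flags = set()
    remaining = data
    while remaining:
        if remaining[:3] != b"BZh":
            # A complete stream was followed by bytes that are not a stream.
            flags.add("DataAfterEnd")
            break
        dec = bz2.BZ2Decompressor()
        dec.decompress(remaining)        # raises OSError on corrupt data
        if not dec.eof:
            # Input ran out before the end-of-stream marker.
            flags.add("UnexpectedEnd")
            break
        remaining = dec.unused_data      # continue with the next stream
    return flags


if __name__ == "__main__":
    good = bz2.compress(b"hello" * 100)
    print(classify_bz2(good))             # set()
    print(classify_bz2(good[:-4]))        # {'UnexpectedEnd'}
    print(classify_bz2(good + b"junk"))   # {'DataAfterEnd'}
    print(classify_bz2(b"plain text"))    # {'IsNotArc'}
```

An empty result means every stream in the buffer was complete and nothing unexpected followed the last one.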
Limitations
Compatibility
Universal Support
BZIP2 is widely supported:
- Linux/Unix - bzip2, bunzip2 commands
- macOS - Built-in bzip2
- Windows - 7-Zip, WinZip, various tools
- Programming - libbzip2 library in many languages
Archive Tools
- 7-Zip - Full support
- bzip2 - Reference implementation
- pbzip2 - Parallel implementation
- lbzip2 - Multi-threaded implementation
- WinZip - Windows support
Performance Optimization
Parallel Compression
Use parallel implementations:
Choosing Block Size
- Small files (less than 1 MB) - Use -mx=1 or -mx=3
- Medium files (1-10 MB) - Use -mx=5
- Large files (greater than 10 MB) - Use -mx=9
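The size thresholds above, combined with the idea behind parallel compressors, can be sketched in a few lines. This example assumes the approach of splitting the input into chunks, compressing each chunk as its own stream, and concatenating the results, which relies on decoders accepting concatenated streams. It uses Python's standard `bz2` module and a thread pool, on the expectation that CPython releases the interpreter lock while compressing. The names `choose_level` and `parallel_compress` are hypothetical, and for small files the sketch picks the lower of the two suggested levels.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

MB = 1024 * 1024


def choose_level(size: int) -> int:
    """Map an input size in bytes to a compression level per the list above."""
    if size < 1 * MB:
        return 1      # small files: -mx=1 (or -mx=3)
    if size <= 10 * MB:
        return 5      # medium files
    return 9          # large files


def parallel_compress(data: bytes, workers: int = 4) -> bytes:
    """Compress chunks concurrently and join them as concatenated streams."""
    level = choose_level(len(data))
    chunk = level * 100_000            # roughly one block per chunk
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams stay in sequence.
        streams = pool.map(lambda p: bz2.compress(p, level), pieces)
        return b"".join(streams)


if __name__ == "__main__":
    payload = bytes(range(256)) * 2000
    packed = parallel_compress(payload)
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed))
```

Each chunk pays for its own stream header and trailer, so the output is slightly larger than a single-stream file; the trade is a small size increase for better use of multiple cores.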