TAR (Tape Archive) is the standard Unix archive format. It stores multiple files and directories in a single file while preserving metadata such as permissions, ownership, and timestamps.

Format Overview

TAR provides:
  • No compression - Pure archiving format
  • Unix metadata preservation - Permissions, ownership, timestamps
  • Large file support - ustar caps a single file at 8 GiB; GNU and PAX extensions remove the limit
  • Symbolic link support - Full Unix file system support
  • Device file support - Character and block devices
  • Sparse file support - Efficient storage of sparse files (GNU extension)
TAR itself does not provide compression. It’s typically combined with GZIP (.tar.gz), BZIP2 (.tar.bz2), or XZ (.tar.xz) for compressed archives.

Format Structure

From source/CPP/7zip/Archive/Tar/TarHeader.h:11-17, TAR uses fixed-size 512-byte records:
namespace NFileHeader {
  const unsigned kRecordSize = 512;
  const unsigned kNameSize = 100;
  const unsigned kUserNameSize = 32;
  const unsigned kGroupNameSize = 32;
  const unsigned kPrefixSize = 155;
}
Each file entry consists of:
  1. Header - 512 bytes containing metadata
  2. Data - File content, padded with zero bytes to a multiple of 512 bytes
The archive ends with two zero-filled 512-byte records.
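The padding rule is plain rounding arithmetic. As a minimal Python sketch (the helper names `padded_size` and `entry_size` are hypothetical, and `RECORD_SIZE` simply mirrors `kRecordSize` above):

```python
RECORD_SIZE = 512  # mirrors kRecordSize from TarHeader.h


def padded_size(data_size: int) -> int:
    """Return how many bytes the data area occupies inside the archive."""
    # Round up to the next multiple of the record size.
    return (data_size + RECORD_SIZE - 1) // RECORD_SIZE * RECORD_SIZE


def entry_size(data_size: int) -> int:
    """Total bytes for one entry: one header record plus the padded data."""
    return RECORD_SIZE + padded_size(data_size)
```

For example, a 700-byte file occupies one header record plus two data records, so `entry_size(700)` is 1536.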

Header Structure

Offset  Size  Field
0       100   File name
100     8     File mode (permissions, octal)
108     8     Owner user ID (octal)
116     8     Owner group ID (octal)
124     12    File size in bytes (octal)
136     12    Modification time (Unix timestamp, octal)
148     8     Checksum (octal sum of all header bytes, with this field read as spaces)
156     1     Type flag
157     100   Link name
257     6     Magic ("ustar")
263     2     Version
265     32    Owner user name
297     32    Owner group name
329     8     Device major number
337     8     Device minor number
345     155   Filename prefix
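To make the layout concrete, the table can be turned into a small header parser. This is an illustrative Python sketch, not 7-Zip's parser: the names `parse_octal`, `header_checksum`, and `parse_header` are hypothetical, and the sketch assumes plain octal numeric fields (it does not handle the base-256 encoding GNU tar uses for very large values). The sample header is generated with Python's standard `tarfile` module.

```python
import struct
import tarfile

# One "s" item per row of the table above; 12 pad bytes fill the 512-byte record.
HEADER_STRUCT = struct.Struct("100s8s8s8s12s12s8s1s100s6s2s32s32s8s8s155s12x")
FIELD_NAMES = (
    "name", "mode", "uid", "gid", "size", "mtime", "chksum", "typeflag",
    "linkname", "magic", "version", "uname", "gname", "devmajor", "devminor",
    "prefix",
)
NUMERIC_FIELDS = {"mode", "uid", "gid", "size", "mtime", "chksum",
                  "devmajor", "devminor"}


def parse_octal(field: bytes) -> int:
    """Decode an ASCII octal field padded with NULs or spaces (empty means 0)."""
    text = field.strip(b"\0 ").decode("ascii")
    return int(text, 8) if text else 0


def header_checksum(block: bytes) -> int:
    """Sum all 512 bytes, counting the checksum field (bytes 148..155) as spaces."""
    return sum(block[:148]) + 8 * ord(" ") + sum(block[156:512])


def parse_header(block: bytes) -> dict:
    """Split one 512-byte header record into named fields."""
    raw = dict(zip(FIELD_NAMES, HEADER_STRUCT.unpack(block[:512])))
    fields = {}
    for key, value in raw.items():
        if key in NUMERIC_FIELDS:
            fields[key] = parse_octal(value)
        else:
            # Text fields end at the first NUL unless they fill the whole field.
            fields[key] = value.split(b"\0", 1)[0].decode("utf-8", "replace")
    fields["checksum_ok"] = fields["chksum"] == header_checksum(block)
    return fields


# Build a sample ustar header with the standard library and parse it back.
sample = tarfile.TarInfo("hello.txt")
sample.size = 1234
sample_fields = parse_header(sample.tobuf(format=tarfile.USTAR_FORMAT))
```

Reading the numeric fields as text first, then converting from base 8, is what the "(octal)" notes in the table amount to in practice.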

File Types

TAR supports various file types (TarHeader.h:47-73):
namespace NLinkFlag {
  const char kNormal       = '0';  // Normal file
  const char kHardLink     = '1';  // Hard link
  const char kSymLink      = '2';  // Symbolic link
  const char kCharacter    = '3';  // Character device
  const char kBlock        = '4';  // Block device
  const char kDirectory    = '5';  // Directory
  const char kFIFO         = '6';  // FIFO (named pipe)
  const char kContiguous   = '7';  // Contiguous file
}

Extended Types

GNU TAR and PAX formats add:
  • kGnu_LongName (‘L’) - Long filename (>100 chars)
  • kGnu_LongLink (‘K’) - Long link name
  • kPax (‘x’) - POSIX.1-2001 extended header
  • kGlobal (‘g’) - Global extended header
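The type flags can be observed in a real archive. The following Python sketch builds a tiny in-memory archive with the standard `tarfile` module and reports each entry's flag. `TYPE_NAMES`, `build_sample`, and `list_types` are hypothetical names introduced for illustration; the flag values are the ones listed above.

```python
import io
import tarfile

# Type flag byte -> description, following the constants listed above.
TYPE_NAMES = {
    b"0": "normal file", b"1": "hard link", b"2": "symbolic link",
    b"3": "character device", b"4": "block device", b"5": "directory",
    b"6": "FIFO", b"7": "contiguous file",
    b"L": "GNU long name", b"K": "GNU long link",
    b"x": "PAX extended header", b"g": "PAX global header",
}


def build_sample() -> bytes:
    """Create an in-memory archive with a directory, a file, and a symlink."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        directory = tarfile.TarInfo("docs")
        directory.type = tarfile.DIRTYPE
        tar.addfile(directory)

        regular = tarfile.TarInfo("docs/a.txt")
        regular.size = 5
        tar.addfile(regular, io.BytesIO(b"hello"))

        link = tarfile.TarInfo("docs/link")
        link.type = tarfile.SYMTYPE
        link.linkname = "a.txt"
        tar.addfile(link)
    return buffer.getvalue()


def list_types(archive: bytes) -> list:
    """Return (name, type description) for every entry in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return [(m.name, TYPE_NAMES.get(m.type, "unknown")) for m in tar]


entries = list_types(build_sample())
```

Readers normally consume the `L`, `K`, `x`, and `g` entries internally and apply them to the entry that follows, so they rarely appear in a listing.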

TAR Variants

USTAR Format

The POSIX standard TAR format (TarHeader.h:83-84):
namespace NMagic {
  extern const char k_Posix_ustar_00[8];  // "ustar\0" + version "00"
  extern const char k_GNU_ustar[8];        // GNU TAR variant
}
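A reader can tell the two variants apart from the 8 bytes at offset 257 (magic plus version). In this Python sketch the byte values are those used by POSIX ustar and GNU tar; `detect_variant` and its return strings are hypothetical, and the sample headers come from Python's `tarfile` module rather than from 7-Zip.

```python
import tarfile

POSIX_MAGIC = b"ustar\x0000"  # "ustar", NUL, then version "00"
GNU_MAGIC = b"ustar  \x00"    # "ustar", two spaces, NUL


def detect_variant(header: bytes) -> str:
    """Classify a 512-byte header by its magic and version fields."""
    magic = header[257:265]  # 6-byte magic followed by 2-byte version
    if magic == POSIX_MAGIC:
        return "posix-ustar"
    if magic == GNU_MAGIC:
        return "gnu"
    return "unknown"  # for example, a pre-POSIX (v7) header


# Sample headers generated by the standard library in each format.
ustar_block = tarfile.TarInfo("a.txt").tobuf(format=tarfile.USTAR_FORMAT)
gnu_block = tarfile.TarInfo("a.txt").tobuf(format=tarfile.GNU_FORMAT)
```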

PAX Format

Modern POSIX.1-2001 format supporting:
  • Unlimited filename length
  • Unlimited file size
  • Extended attributes
  • High-precision timestamps
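PAX achieves this by placing key/value records in the data area of an extended header entry. Each record has the form `<length> <keyword>=<value>\n`, where the decimal length counts the whole record, including its own digits. A minimal Python sketch of that encoding follows; `pax_record` is a hypothetical helper name.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Encode one PAX extended-header record: b"<length> <keyword>=<value>\\n"."""
    body = f" {keyword}={value}\n".encode("utf-8")
    length = len(body)
    # The length field counts its own digits, so iterate until the value is stable.
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            break
        length = total
    return str(length).encode("ascii") + body


path_record = pax_record("path", "abc")
mtime_record = pax_record("mtime", "1700000000.123456789")
```

Because values are free-form text, a path of any length or a timestamp with a fractional part fits without changing the fixed 512-byte header.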

Usage Examples

Create TAR Archive

7z a -ttar archive.tar files/
Creates an uncompressed TAR archive.

Create TAR.GZ (GZIP Compressed)

7z a -ttar archive.tar files/
7z a -tgzip archive.tar.gz archive.tar
Or in one step:
tar -czf archive.tar.gz files/  # Using system tar

Create TAR.BZ2 (BZIP2 Compressed)

7z a -ttar archive.tar files/
7z a -tbzip2 archive.tar.bz2 archive.tar

Create TAR.XZ (XZ Compressed)

7z a -ttar archive.tar files/
7z a -txz archive.tar.xz archive.tar
XZ compression typically provides the best compression ratio for TAR archives, while GZIP is fastest. BZIP2 offers a middle ground.
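The two-step commands above reflect that compression is a separate layer applied to the finished TAR stream. The Python sketch below illustrates this with the standard library instead of 7-Zip: it builds one uncompressed archive in memory and compresses the same bytes three ways. `make_tar` and `compressed_sizes` are hypothetical helpers, and the sample data is made up, so the resulting sizes say nothing about real workloads.

```python
import bz2
import gzip
import io
import lzma
import tarfile


def make_tar(files: dict) -> bytes:
    """Pack {name: content} into an uncompressed, in-memory TAR archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def compressed_sizes(tar_bytes: bytes) -> dict:
    """Return the size in bytes of each compressed form of the same archive."""
    return {
        ".tar": len(tar_bytes),
        ".tar.gz": len(gzip.compress(tar_bytes)),
        ".tar.bz2": len(bz2.compress(tar_bytes)),
        ".tar.xz": len(lzma.compress(tar_bytes)),
    }


# Hypothetical sample data: highly repetitive text compresses very well.
sizes = compressed_sizes(make_tar({"log.txt": b"status=ok\n" * 5000}))
```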

Extract TAR Archive

7z x archive.tar
Extracts all files, preserving the directory structure.

Extract TAR.GZ Archive

7z x archive.tar.gz -so | 7z x -si -ttar
Or in two steps, writing the intermediate archive.tar to disk:
7z x archive.tar.gz
7z x archive.tar

List TAR Contents

7z l archive.tar
Lists the archive's entries with their attributes, sizes, and modification times.

Preserving Unix Metadata

File Permissions

TAR stores full Unix permissions (rwxrwxrwx):
7z a -ttar archive.tar files/
On Windows, 7-Zip maps Windows attributes to Unix permissions:
  • Read-only → r--
  • Normal → rw-
  • Executable → rwx
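The permission bits live in the header's mode field as ASCII octal text. As a small Python sketch of how that field maps to the familiar `rwx` notation (`permission_string` is a hypothetical helper, and it assumes the entry is a regular file):

```python
import stat


def permission_string(mode_field: bytes) -> str:
    """Render a TAR mode field (ASCII octal) as `ls -l` would for a regular file."""
    mode = int(mode_field.strip(b"\0 ").decode("ascii"), 8)
    # S_IFREG marks a regular file; the low 12 bits hold the permission
    # bits plus setuid, setgid, and sticky.
    return stat.filemode(stat.S_IFREG | (mode & 0o7777))
```

For instance, `permission_string(b"0000755\0")` returns `-rwxr-xr-x`.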

Symbolic Links

TAR preserves symbolic links:
7z a -ttar -snl archive.tar files/
The -snl option stores symbolic links as links rather than as copies of their targets.

Timestamps

TAR stores modification time with second precision. Extended formats (PAX) support nanosecond precision.

Advanced Features

Long Filenames

The name field holds 100 bytes. ustar extends this to about 255 bytes by splitting a path between the 155-byte prefix field and the name field. GNU TAR and PAX remove the limit entirely:
# Automatically handles long filenames
7z a -ttar archive.tar very/long/path/to/file/with/name/longer/than/100/characters.txt
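Before falling back to GNU or PAX extensions, a writer can try to fit a long path into the ustar header by splitting it at a slash between the 155-byte prefix field and the 100-byte name field. The Python sketch below shows one way to do that. `split_ustar_path` is a hypothetical helper, and how 7-Zip itself chooses the split point is not covered here.

```python
from typing import Tuple

NAME_SIZE = 100    # mirrors kNameSize
PREFIX_SIZE = 155  # mirrors kPrefixSize


def split_ustar_path(path: str) -> Tuple[str, str]:
    """Return (prefix, name) for a ustar header, or raise ValueError.

    When prefix is non-empty, prefix + "/" + name equals the original path.
    """
    raw = path.encode("utf-8")
    if len(raw) <= NAME_SIZE:
        return "", path  # fits in the name field alone
    # Try every slash as a split point; the byte for "/" never occurs
    # inside a multi-byte UTF-8 character, so slicing here is safe.
    for index, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:index], raw[index + 1:]
        if 0 < len(prefix) <= PREFIX_SIZE and 0 < len(name) <= NAME_SIZE:
            return prefix.decode("utf-8"), name.decode("utf-8")
    raise ValueError("path does not fit in the ustar prefix and name fields")
```

A path that raises `ValueError` here is the case that needs a GNU long-name entry or a PAX `path` record.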

Sparse Files

GNU TAR marks sparse files with a dedicated type flag:
const char kSparse = 'S';  // Sparse file flag
Only the regions that contain data are stored, together with a map of where they belong, so large runs of zeros take no space in the archive.

Incremental Backups

GNU TAR supports incremental backups:
const char kDumpDir = 'D';  // GNU dump directory

Hard Links

TAR stores hard links efficiently:
7z a -ttar archive.tar files/
A hard-link entry records only the name of an earlier entry that holds the data, so the content is stored once.
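The space saving is easy to see by archiving the same payload twice, once as two independent copies and once as a file plus a hard-link entry. This Python sketch uses the standard `tarfile` module rather than 7-Zip; `build_archive` and the file names are hypothetical.

```python
import io
import tarfile


def build_archive(second_is_link: bool) -> bytes:
    """Archive a 20,000-byte payload twice: as two copies, or as file plus hard link."""
    payload = b"x" * 20000
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        first = tarfile.TarInfo("data.bin")
        first.size = len(payload)
        tar.addfile(first, io.BytesIO(payload))

        second = tarfile.TarInfo("data-link.bin")
        if second_is_link:
            second.type = tarfile.LNKTYPE   # type flag '1'
            second.linkname = "data.bin"    # names the earlier entry
            tar.addfile(second)             # header only, no data records
        else:
            second.size = len(payload)
            tar.addfile(second, io.BytesIO(payload))
    return buffer.getvalue()


two_copies = build_archive(second_is_link=False)
with_link = build_archive(second_is_link=True)
```

The archive containing the link is smaller because its second entry consists of a single 512-byte header.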

Compression Comparison

Same Dataset (1 GB source files)

Format     Archive Size  Compression Time  Extraction Time
.tar       1000 MB       10 s              8 s
.tar.gz    350 MB        60 s              20 s
.tar.bz2   320 MB        120 s             40 s
.tar.xz    280 MB        180 s             30 s
Compression ratios and times vary significantly based on data type and system performance.

Implementation Details

TAR handler implementation:
source/CPP/7zip/Archive/Tar/
├── TarHandler.cpp      # Archive reading
├── TarHandlerOut.cpp   # Archive creation
├── TarIn.cpp          # TAR parsing
├── TarOut.cpp         # TAR writing
├── TarHeader.h        # Format constants
├── TarItem.h          # Item structures
└── TarUpdate.cpp      # Archive updating

Best Practices

For Linux/Unix Systems

Use TAR for maximum compatibility:
7z a -ttar backup.tar /home/user/
7z a -txz backup.tar.xz backup.tar

For Cross-Platform

TAR.GZ offers the broadest compatibility:
7z a -ttar archive.tar files/
7z a -tgzip archive.tar.gz archive.tar

For Large Archives

Use XZ compression for best ratio:
7z a -ttar data.tar large_dataset/
7z a -txz -mx=9 data.tar.xz data.tar

For Fast Backup

Use GZIP for speed:
7z a -ttar backup.tar files/
7z a -tgzip -mx=1 backup.tar.gz backup.tar

Common Patterns

Backup with Timestamp

BACKUP_DATE=$(date +%Y%m%d)
7z a -ttar "backup-$BACKUP_DATE.tar" /data/
# Remove the intermediate .tar only if compression succeeded
7z a -txz "backup-$BACKUP_DATE.tar.xz" "backup-$BACKUP_DATE.tar" && rm "backup-$BACKUP_DATE.tar"

Extract Specific Files

# Quote the switch so the shell does not treat "!" as history expansion
7z x archive.tar '-i!specific/file.txt'

Exclude Patterns

7z a -ttar archive.tar /source/ '-x!*.tmp' '-x!*.log' '-x!.git'

Limitations

TAR format limitations:
  • No built-in compression (must use external compression)
  • No encryption support
  • No error recovery or redundancy
  • Limited metadata on non-Unix systems
  • The ustar size field limits a single file to 8 GiB (lifted by the PAX and GNU extensions)
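The per-file limit of about 8 GB follows directly from the 12-byte size field in the header: one byte is a terminator, leaving 11 octal digits. A short Python calculation (the variable names are for illustration only):

```python
SIZE_FIELD_BYTES = 12                # width of the size field in the header
OCTAL_DIGITS = SIZE_FIELD_BYTES - 1  # one byte is reserved for the terminator

# Largest value that 11 octal digits can express.
max_file_size = 8 ** OCTAL_DIGITS - 1
```

Since 8^11 equals 2^33, the largest representable size is one byte short of 8 GiB.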

Compatibility Notes

Windows

7-Zip handles TAR on Windows, with these caveats:
  • File permissions are approximated
  • Symbolic links require admin privileges or Developer Mode
  • Device files are ignored

Linux/macOS

Full TAR support including:
  • All file types
  • Complete permission preservation
  • Extended attributes (with PAX format)

Archive Tools

TAR compatibility:
  • 7-Zip - Full support for GNU TAR and PAX
  • GNU tar - Reference implementation
  • BSD tar - Full POSIX TAR and PAX support
  • Windows tar - Basic support (Windows 10+)
