Overview
The blob module provides types and abstractions for working with blob data:
- BlobType: Distinguishes between tree and data blobs
- BlobId: Content-addressed identifier (SHA-256 hash)
- BlobLocation: Position within a pack file
- Tree: Directory structure blob
BlobType Enum
Every blob has one of two types, tree or data.
Tree blobs:
- Directory structure
- File metadata (permissions, timestamps)
- References to other trees and data blobs
Data blobs:
- Chunked file content
- Deduplicated across all snapshots
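The two blob types map naturally onto a two-variant enum. The following is a minimal sketch rather than the module's actual definition: the variant names follow the text above, while `ALL` and `is_tree` are hypothetical helpers added for illustration.

```rust
/// The two kinds of blob described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum BlobType {
    /// Directory structure, file metadata, and references to other blobs.
    Tree,
    /// A chunk of file content.
    Data,
}

impl BlobType {
    /// Both variants, handy for iterating over every blob type.
    /// (Hypothetical constant, not taken from the module.)
    pub const ALL: [BlobType; 2] = [BlobType::Tree, BlobType::Data];

    /// Hypothetical helper: true when the blob describes directory structure.
    pub fn is_tree(self) -> bool {
        matches!(self, BlobType::Tree)
    }
}
```

Because the enum derives `Copy`, `Eq`, and `Hash`, values can be passed by value and used as map keys without extra ceremony.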
Blob Identifiers
BlobId
Generic blob identifier (32-byte SHA-256 hash).
TreeId and DataId
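One common way to make identifiers type-safe is to wrap a generic 32-byte ID in distinct newtypes, so that a data ID cannot be passed where a tree ID is expected. The sketch below illustrates that pattern under assumptions: the struct layouts, the `Display` implementation, and the `describe_tree` function are hypothetical and may differ from the module's real types.

```rust
use std::fmt;

/// Generic blob identifier: the 32 raw bytes of a SHA-256 digest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct BlobId(pub [u8; 32]);

impl fmt::Display for BlobId {
    /// Formats the ID as 64 lowercase hex characters.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

/// Identifier that may only refer to a tree blob.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct TreeId(pub BlobId);

/// Identifier that may only refer to a data blob.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DataId(pub BlobId);

/// Hypothetical function that accepts only tree IDs; passing a `DataId`
/// here is rejected at compile time.
pub fn describe_tree(id: TreeId) -> String {
    format!("tree {}", id.0)
}
```

The wrappers add no runtime cost; they only let the compiler distinguish the two kinds of ID.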
Type-safe blob identifiers.
BlobLocation
Describes where a blob lives within a pack file.
Tree Blobs
Trees represent directory structures as a list of nodes.
Tree Structure
Node contains:
- Name (filename)
- Type (file, dir, symlink, etc.)
- Metadata (size, permissions, timestamps)
- subtree field (for directories)
- content field (blob IDs for file data)
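The node layout listed above can be sketched as plain Rust structs. This is an illustration under assumptions, not the module's definition: the field names `subtree` and `content` come from the text, while `Id`, `Metadata`, its fields, and the `file`/`dir` constructors are hypothetical.

```rust
/// Stand-in for a 32-byte blob ID (an assumption of this sketch).
pub type Id = [u8; 32];

/// Kind of filesystem entry a node describes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NodeType {
    File,
    Dir,
    Symlink { target: String },
}

/// A small, hypothetical subset of per-entry metadata.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Metadata {
    pub size: u64,
    pub mode: Option<u32>,
    pub mtime_unix: Option<i64>,
}

/// One entry in a directory.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub name: String,
    pub node_type: NodeType,
    pub meta: Metadata,
    /// Set for directories: ID of the tree blob describing the children.
    pub subtree: Option<Id>,
    /// Set for files: IDs of the data blobs holding the content, in order.
    pub content: Option<Vec<Id>>,
}

impl Node {
    /// Builds a file node that points at its data blobs.
    pub fn file(name: &str, size: u64, content: Vec<Id>) -> Self {
        Node {
            name: name.to_string(),
            node_type: NodeType::File,
            meta: Metadata { size, ..Metadata::default() },
            subtree: None,
            content: Some(content),
        }
    }

    /// Builds a directory node that points at its subtree.
    pub fn dir(name: &str, subtree: Id) -> Self {
        Node {
            name: name.to_string(),
            node_type: NodeType::Dir,
            meta: Metadata::default(),
            subtree: Some(subtree),
            content: None,
        }
    }
}

/// A tree blob: the list of nodes in one directory.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Tree {
    pub nodes: Vec<Node>,
}
```

Keeping `subtree` and `content` as separate optional fields mirrors the split in the list above: a directory references another tree, a file references data blobs.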
Creating Trees
Blob Storage
Blobs are stored in pack files for efficiency:
- Multiple blobs per pack file
- Same blob type per pack (all trees or all data)
- Index maps blob IDs to pack locations
- Compression applied per-blob (repository v2)
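To make the relationship between the index, pack files, and blob locations concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the field names of `BlobLocation`, the `IndexEntry` and `Index` types, and the `slice_blob` helper are hypothetical, and the returned bytes are the stored form, which may still be compressed.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte ID (an assumption of this sketch).
pub type Id = [u8; 32];

/// Position of one blob inside a pack file (hypothetical field names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlobLocation {
    pub offset: u32,
    pub length: u32,
}

/// Which pack holds a blob, and where inside that pack it sits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndexEntry {
    pub pack: Id,
    pub location: BlobLocation,
}

/// Maps blob IDs to their pack locations.
#[derive(Debug, Default)]
pub struct Index {
    entries: HashMap<Id, IndexEntry>,
}

impl Index {
    pub fn insert(&mut self, blob: Id, entry: IndexEntry) {
        self.entries.insert(blob, entry);
    }

    pub fn get(&self, blob: &Id) -> Option<&IndexEntry> {
        self.entries.get(blob)
    }
}

/// Returns the stored bytes of one blob from an in-memory pack,
/// or `None` if the location falls outside the pack.
pub fn slice_blob(pack: &[u8], loc: BlobLocation) -> Option<&[u8]> {
    let start = loc.offset as usize;
    let end = start.checked_add(loc.length as usize)?;
    pack.get(start..end)
}
```

A lookup is then two steps: ask the index for the entry, then cut the blob out of the named pack using its offset and length.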
BlobTypeMap
Utility for mapping blob types to values.
Working with Blobs
Reading Blobs
Writing Blobs
Blob Deduplication
Blobs are content-addressed by SHA-256 hash:
- Compute hash of blob data
- Use hash as blob ID
- Check index for existing blob
- Skip writing if already exists
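The steps above can be sketched as a small store that writes a blob only the first time its ID is seen. To keep the example dependency-free, `stand_in_hash` below is NOT SHA-256: it is a placeholder built on the standard library's `DefaultHasher`, and a real implementation would substitute a SHA-256 digest. `BlobStore` and `save` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for a 32-byte blob ID (an assumption of this sketch).
pub type Id = [u8; 32];

/// PLACEHOLDER for SHA-256: fills 32 bytes from four 64-bit hashes.
/// Not cryptographically secure; used only to avoid external crates.
pub fn stand_in_hash(data: &[u8]) -> Id {
    let mut id = [0u8; 32];
    for (i, chunk) in id.chunks_mut(8).enumerate() {
        let mut hasher = DefaultHasher::new();
        (i as u64).hash(&mut hasher);
        data.hash(&mut hasher);
        chunk.copy_from_slice(&hasher.finish().to_be_bytes());
    }
    id
}

/// Hypothetical store that tracks which blob IDs already exist.
#[derive(Debug, Default)]
pub struct BlobStore {
    known: HashSet<Id>,
    pub bytes_written: usize,
}

impl BlobStore {
    /// Returns the blob ID and whether the blob was actually written.
    pub fn save(&mut self, data: &[u8]) -> (Id, bool) {
        // Steps 1 and 2: hash the data and use the hash as the ID.
        let id = stand_in_hash(data);
        // Step 3: check the index; `insert` returns false for a known ID.
        let is_new = self.known.insert(id);
        // Step 4: only new blobs cost any storage.
        if is_new {
            self.bytes_written += data.len();
        }
        (id, is_new)
    }
}
```

Saving the same bytes twice yields the same ID and writes the data once, which is what lets identical chunks be shared across snapshots.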
Performance Considerations
Caching:
- Tree blobs are cached (small, frequently accessed)
- Data blobs are not cached (large, accessed once)
Pack size:
- Trees: 4 MiB default (grows with repo size)
- Data: 32 MiB default (grows with repo size)
Pack reads and repacking:
- Limit pack reads to 40 MiB chunks
- Fill small holes (< 256 KiB) when repacking
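As a rough illustration of how the 40 MiB read limit and the 256 KiB hole threshold might interact, the sketch below merges neighbouring blob ranges of one pack into larger reads. It is an assumption about the general technique, not the module's code: `Range`, `coalesce`, and the constant names are hypothetical, and the input is assumed to be sorted by offset.

```rust
/// Assumed limits, taken from the figures above.
pub const MAX_READ: u64 = 40 * 1024 * 1024;
pub const MAX_HOLE: u64 = 256 * 1024;

/// A byte range inside a pack file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub offset: u64,
    pub length: u64,
}

impl Range {
    fn end(&self) -> u64 {
        self.offset + self.length
    }
}

/// Merges blob ranges (sorted by offset) into larger reads.
/// A gap smaller than `MAX_HOLE` is read through instead of starting a
/// new read, as long as the merged read stays within `MAX_READ`.
pub fn coalesce(blobs: &[Range]) -> Vec<Range> {
    let mut reads: Vec<Range> = Vec::new();
    for blob in blobs {
        if let Some(last) = reads.last_mut() {
            let hole = blob.offset.saturating_sub(last.end());
            let merged_len = blob.end().max(last.end()) - last.offset;
            if hole < MAX_HOLE && merged_len <= MAX_READ {
                // Extend the current read to cover the hole and this blob.
                last.length = merged_len;
                continue;
            }
        }
        // Hole too large or read too long: start a new read.
        reads.push(*blob);
    }
    reads
}
```

Reading through a small hole trades a little wasted bandwidth for fewer separate requests, which tends to matter most on high-latency backends.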
See Also
- Repository Files - Pack and index file formats
- Index - Blob indexing and lookup
- Chunker - Content-defined chunking