Overview
Advanced filtering uses Python callbacks to give you complete control over the filtering process. This enables complex operations that can’t be achieved with simple command-line options.
Understanding Callbacks
Callbacks are Python functions that filter-repo calls for each git object. You provide the function body as a string.
Basic Callback Structure
For a callback like --name-callback, filter-repo creates:
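def name_callback(name):
    YOUR_CODE_HERE
You only provide the YOUR_CODE_HERE part. For the simple callbacks below, your code must itself return the new value, as every example does; filter-repo does not add a return statement for you.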
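As a rough illustration of this wrapping, the sketch below builds a function from a body string with exec. The helper name make_callback and its details are assumptions for demonstration only; the tool's real internals may differ.

```python
def make_callback(argname, body):
    """Wrap a callback body string in a function definition (simplified sketch)."""
    # Indent every body line so it sits inside the generated function.
    indented = "\n".join("    " + line for line in body.splitlines())
    source = "def callback({}):\n{}".format(argname, indented)
    namespace = {}
    exec(source, namespace)
    return namespace["callback"]


# The body is exactly what you would pass on the command line.
name_callback = make_callback("name", 'return name.replace(b"Wiliam", b"William")')
print(name_callback(b"Wiliam Smith"))  # b'William Smith'
```

Because the body becomes the inside of a function, statements such as return are valid in it, and nested blocks need their own consistent indentation.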
Bytestrings Required
git-filter-repo uses bytestrings (bytes), not strings:
- Use b"text" instead of "text"
- Compare with b"value", not "value"
- Use .replace(b"old", b"new")
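The rules above follow from how Python treats bytes and str as unrelated types. A small standalone sketch of the two common failure modes (the refname value is only an example):

```python
refname = b"refs/heads/main"

# A bytes value never equals a str value, so this check silently fails.
print(refname == "refs/heads/main")   # False
print(refname == b"refs/heads/main")  # True

# Mixing types in a method call raises instead of failing silently.
try:
    refname.replace("main", "trunk")
except TypeError as exc:
    print("TypeError:", exc)

# Decode only when you need str operations, then encode the result again.
as_text = refname.decode("utf-8")
back_to_bytes = as_text.upper().encode("utf-8")
```

The silent comparison failure is the more dangerous of the two, because the callback runs without error and simply never matches.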
Simple Callbacks
Name Callback
Modify author, committer, and tagger names:
git filter-repo --name-callback '
return name.replace(b"Wiliam", b"William")
'
Email Callback
Fix email addresses:
git filter-repo --email-callback '
# Fix common typos
email = email.replace(b".cm", b".com")
email = email.replace(b"gmial.com", b"gmail.com")
return email
'
Refname Callback
Modify branch and tag names:
git filter-repo --refname-callback '
# Add prefix to all branches (refs/heads/main -> refs/heads/v2-main)
if refname.startswith(b"refs/heads/"):
    branch = refname[11:]  # Strip the 11-byte "refs/heads/" prefix
    return b"refs/heads/v2-" + branch
return refname
'
Refnames must be fully qualified:
- Use b"refs/heads/main", not b"main"
- Use b"refs/tags/v1.0", not b"v1.0"
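One way to keep refname logic readable is to name the prefix and slice by its length instead of a hard-coded number. The function below is a hypothetical standalone helper, not part of filter-repo; its body could be adapted into a --refname-callback.

```python
HEADS = b"refs/heads/"


def add_branch_prefix(refname, prefix=b"v2-"):
    """Return refname with prefix added to branch names; other refs unchanged."""
    if refname.startswith(HEADS):
        return HEADS + prefix + refname[len(HEADS):]
    return refname


print(add_branch_prefix(b"refs/heads/main"))  # b'refs/heads/v2-main'
print(add_branch_prefix(b"refs/tags/v1.0"))   # b'refs/tags/v1.0'
```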
Filename Callback
Rename or remove files:
git filter-repo --filename-callback '
# Remove all files in src/ subdirectories (except toplevel src/)
if b"/src/" in filename:
    return None  # Delete file
# Rename tools/ -> scripts/misc/
if filename.startswith(b"tools/"):
    return b"scripts/misc/" + filename[6:]
# Keep all other files unchanged
return filename
'
Return values:
- filename - Keep file unchanged
- Modified filename - Rename file
- None - Remove file from history
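Since the three return values decide the fate of every path, it can help to dry-run the rules on a list of sample paths before rewriting history. The sketch below reuses the rules from the example above; preview is a hypothetical helper and the sample paths are made up.

```python
def filename_callback(filename):
    # Same rules as the --filename-callback example above.
    if b"/src/" in filename:
        return None
    if filename.startswith(b"tools/"):
        return b"scripts/misc/" + filename[6:]
    return filename


def preview(callback, paths):
    """Map each path to 'removed', 'kept', or the new name it would get."""
    report = {}
    for path in paths:
        result = callback(path)
        if result is None:
            report[path] = "removed"
        elif result == path:
            report[path] = "kept"
        else:
            report[path] = result
    return report


report = preview(filename_callback, [b"src/main.c", b"lib/src/util.c", b"tools/build.sh"])
for path, outcome in report.items():
    print(path, "->", outcome)
```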
Message Callback
Modify commit and tag messages:
git filter-repo --message-callback '
# Add Signed-off-by if missing
if b"Signed-off-by:" not in message:
    message += b"\nSigned-off-by: Me Myself <me@example.com>"
# Fix typos
message = re.sub(b"[Ee]-?[Mm][Aa][Ii][Ll]", b"email", message)
return message
'
Object Callbacks
More powerful callbacks that operate on complete git objects.
Blob Callback
Modify file contents:
git filter-repo --blob-callback '
# Skip blobs over 25 bytes
if len(blob.data) > 25:
    blob.skip()
else:
    blob.data = blob.data.replace(b"Hello", b"Goodbye")
'
Blob properties:
blob.data - File contents (bytes)
blob.original_id - Original git hash
blob.id - Numeric ID (fast-import mark) assigned to the blob in the rewritten stream
blob.skip() - Remove this blob
Commit Callback
Modify commits:
git filter-repo --commit-callback '
# Remove executable files with "666" in their name
commit.file_changes = [
    change for change in commit.file_changes
    if not (change.mode == b"100755" and b"666" in change.filename)
]
# Prevent deletion of specific file
commit.file_changes = [
    change for change in commit.file_changes
    if not (change.type == b"D" and change.filename == b"important.txt")
]
# Make all .sh files executable
for change in commit.file_changes:
    if change.filename.endswith(b".sh"):
        change.mode = b"100755"
'
Commit properties:
commit.branch - Branch name (bytes)
commit.original_id - Original commit hash
commit.author_name, commit.author_email, commit.author_date
commit.committer_name, commit.committer_email, commit.committer_date
commit.message - Commit message (bytes)
commit.parents - List of parent commit IDs
commit.file_changes - List of FileChange objects
commit.skip(new_id) - Skip this commit
FileChange properties:
change.type - b"M" (modify), b"D" (delete), b"DELETEALL"
change.filename - Path (bytes)
change.mode - File mode: b"100644", b"100755", b"120000", b"160000"
change.blob_id - Git blob ID
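To experiment with file-change filtering outside of a real rewrite, a small stand-in object is enough. FakeChange below is a hypothetical test double that only mirrors the attribute names listed above; it is not the real FileChange class.

```python
from dataclasses import dataclass


@dataclass
class FakeChange:
    """Hypothetical stand-in exposing the FileChange attributes listed above."""
    type: bytes
    filename: bytes
    mode: bytes = b"100644"
    blob_id: bytes = b""


def apply_rules(file_changes):
    """Drop executables with '666' in the name, then mark .sh files executable."""
    kept = [
        change for change in file_changes
        if not (change.mode == b"100755" and b"666" in change.filename)
    ]
    for change in kept:
        if change.type == b"M" and change.filename.endswith(b".sh"):
            change.mode = b"100755"
    return kept


changes = [
    FakeChange(b"M", b"bin/run666", b"100755"),
    FakeChange(b"M", b"build.sh"),
    FakeChange(b"M", b"README"),
]
result = apply_rules(changes)
print([(c.filename, c.mode) for c in result])
```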
Tag Callback
Modify annotated tags:
git filter-repo --tag-callback '
# Skip tags by specific author
if tag.tagger_name == b"Jim Williams":
    tag.skip()
else:
    # Add extra info to tag message
    tag.message += b"\n\nTag of %s by %s on %s" % (
        tag.ref, tag.tagger_email, tag.tagger_date
    )
'
Tag properties:
tag.ref - Tag name (without refs/tags/ prefix)
tag.from_ref - Commit being tagged
tag.original_id - Original tag hash
tag.tagger_name, tag.tagger_email, tag.tagger_date
tag.message - Tag message
tag.skip() - Remove this tag
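Because these properties are bytes, message templates must be bytes as well. The sketch below shows bytes %-formatting in isolation; annotate is a hypothetical helper and the sample values are invented.

```python
def annotate(message, ref, tagger_email, tagger_date):
    """Append a provenance line to a tag message using bytes formatting."""
    return message + b"\n\nTag of %s by %s on %s" % (ref, tagger_email, tagger_date)


print(annotate(b"Release", b"v1.0", b"jim@example.com", b"1700000000 +0000"))

# Passing a str where bytes are expected fails loudly.
try:
    b"Tag of %s" % ("v1.0",)
except TypeError as exc:
    print("TypeError:", exc)
```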
Reset Callback
Modify reset (branch creation) events:
git filter-repo --reset-callback '
# Rename master branch to main
reset.ref = reset.ref.replace(b"master", b"main")
'
Reset properties:
reset.ref - Reference name
reset.from_ref - Commit hash or mark
Advanced Use Cases
Multi-Line Callbacks
Use multi-line Python code:
git filter-repo --filename-callback '
# Define a mapping
renames = {
    b"README": b"README.md",
    b"COPYING": b"LICENSE",
    b"AUTHORS": b"CONTRIBUTORS.md",
}
# Apply renames
if filename in renames:
    return renames[filename]
# Remove backup files
if filename.endswith(b".bak") or filename.endswith(b"~"):
    return None
return filename
'
Using Regular Expressions
The re module is available:
git filter-repo --message-callback '
# Convert issue references: #123 -> JIRA-123
message = re.sub(b"#(\\d+)", b"JIRA-\\1", message)
# Remove trailing whitespace from each line
# (split and join on a real newline: b"\n", not the two-byte b"\\n")
lines = message.split(b"\n")
lines = [re.sub(b"\\s+$", b"", line) for line in lines]
message = b"\n".join(lines)
return message
'
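Escaping is the usual source of bugs in bytes regexes. A minimal standalone sketch using raw bytes literals (rb"..."), which pass backslashes to the regex engine unchanged; rewrite_message is a hypothetical name and the sample message is invented.

```python
import re


def rewrite_message(message):
    # Raw bytes literals keep backslashes intact for the regex engine.
    message = re.sub(rb"#(\d+)", rb"JIRA-\1", message)
    # Strip trailing spaces and tabs from each line.
    lines = [re.sub(rb"[ \t]+$", b"", line) for line in message.split(b"\n")]
    return b"\n".join(lines)


print(rewrite_message(b"Fix #12  \nsee #3\t"))  # b'Fix JIRA-12\nsee JIRA-3'

# b"\n" is one newline byte; b"\\n" is a backslash followed by "n".
print(len(b"\n"), len(b"\\n"))  # 1 2
```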
Commit callback receives additional metadata:
git filter-repo --commit-callback '
# aux_info contains:
# - orig_parents: original parent commit IDs
# - had_file_changes: whether commit had file changes
# Example: Mark commits that lost all files
if not commit.file_changes and aux_info["had_file_changes"]:
    commit.message += b"\n\n[Note: All file changes filtered out]"
'
Conditional Processing
git filter-repo --blob-callback '
# Only process small text files
if len(blob.data) > 1024 * 1024:  # > 1MB
    return
if b"\0" in blob.data[0:8192]:  # NUL byte in the first 8 KiB: treat as binary
    return
# Safe to process as text
blob.data = blob.data.upper()
'
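The binary check above is a heuristic: a NUL byte near the start of the data suggests non-text content. A standalone sketch of the same guard, with hypothetical helper names, which can be tested without a repository:

```python
def looks_binary(data, probe_size=8192):
    """Heuristic: treat data as binary if a NUL byte appears near the start."""
    return b"\0" in data[:probe_size]


def should_process(data, max_size=1024 * 1024):
    """Only small, text-like blobs are safe to transform as text."""
    return len(data) <= max_size and not looks_binary(data)


print(should_process(b"hello world\n"))      # True
print(should_process(b"\x89PNG\r\n\0\0\0"))  # False

# The escape matters: b"\0" is a single NUL byte, b"\\0" is two bytes.
print(len(b"\0"), len(b"\\0"))  # 1 2
```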
Combining Callbacks
Use multiple callbacks together:
git filter-repo \
--name-callback 'return name.title()' \
--email-callback 'return email.lower()' \
--filename-callback '
if filename.endswith(b".tmp"):
    return None
return filename
' \
--message-callback '
return message.replace(b"TODO", b"DONE")
'
Complex Examples
Enforce File Naming Convention
git filter-repo --filename-callback '
# Convert the basename to lowercase (directory names keep their case)
parts = filename.split(b"/")
parts[-1] = parts[-1].lower()
filename = b"/".join(parts)
# Replace spaces with hyphens
filename = filename.replace(b" ", b"-")
# Remove special characters (uppercase is allowed so directory names are not mangled)
filename = re.sub(b"[^A-Za-z0-9/_.-]", b"", filename)
return filename
'
Add Copyright Headers
git filter-repo --blob-callback '
# Skip binary files
if b"\0" in blob.data[0:8192]:
    return
# Add copyright header to text blobs (the blob callback does not see filenames)
header = b"# Copyright (C) 2024 Example Corp\n# Licensed under MIT License\n"
if not blob.data.startswith(b"# Copyright"):
    blob.data = header + blob.data
'
Squash Small Commits
This requires more complex logic:
git filter-repo --commit-callback '
# Skip commits with tiny messages
if len(commit.message) < 10:
    commit.skip(commit.first_parent())
'
commit.skip(new_id) marks the commit as skipped and maps its ID to new_id. Children of this commit will use new_id as their parent.
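The parent remapping described above can be pictured as a lookup table from skipped IDs to their replacements. The following is a conceptual model only, with made-up commit IDs; it is not how filter-repo implements skipping internally.

```python
def resolve(commit_id, skipped):
    """Follow skip mappings until reaching a commit that was kept."""
    while commit_id in skipped:
        commit_id = skipped[commit_id]
    return commit_id


# c2 and c3 were skipped in favor of their first parents.
skipped = {b"c2": b"c1", b"c3": b"c2"}

# c4 originally had c3 as its parent; after remapping it hangs off c1.
parents_of_c4 = [b"c3"]
new_parents = [resolve(parent, skipped) for parent in parents_of_c4]
print(new_parents)  # [b'c1']
```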
Rewrite Dates
git filter-repo --commit-callback '
# Make all commits appear to be from 2024
from datetime import datetime
# Parse existing date
timestamp, timezone = commit.author_date.split()
dt = datetime.fromtimestamp(int(timestamp))
# Update year
new_dt = dt.replace(year=2024)
new_timestamp = int(new_dt.timestamp())
# Update both author and committer dates
commit.author_date = b"%d %s" % (new_timestamp, timezone)
commit.committer_date = commit.author_date
'
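The date logic is easier to verify as a standalone function. The sketch below assumes dates have the form b"<epoch seconds> <utc offset>", as the split in the example implies. Unlike the example, it converts in UTC rather than local time so the result does not depend on the machine's timezone, and it handles Feb 29 for non-leap target years. set_year is a hypothetical helper name.

```python
from datetime import datetime, timezone


def set_year(date, year):
    """Rewrite the year of a b"<epoch seconds> <utc offset>" date value."""
    timestamp, offset = date.split()
    dt = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    try:
        dt = dt.replace(year=year)
    except ValueError:
        # Feb 29 does not exist in the target year; fall back to Feb 28.
        dt = dt.replace(year=year, day=28)
    return b"%d %s" % (int(dt.timestamp()), offset)


# 2020-01-01T00:00:00Z moved to 2024-01-01T00:00:00Z
print(set_year(b"1577836800 +0000", 2024))  # b'1704067200 +0000'
```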
Remove Merge Commits
git filter-repo --commit-callback '
# Skip merge commits (commits with multiple parents)
if len(commit.parents) > 1:
    commit.skip(commit.first_parent())
'
Using External Scripts
For very complex logic, use external Python scripts:
git filter-repo --commit-callback "$(cat my_callback.py)"
my_callback.py:
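import json
# Load configuration (this runs for every commit, so keep the file small)
with open('filter-config.json', 'rb') as f:
    config = json.load(f)
# Complex filtering logic
# JSON yields str values but commit.branch is bytes, so decode before comparing
if commit.branch.decode() in config['protected_branches']:
    return
# ... more logic ...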
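JSON has no bytes type, so values loaded from a config file are str and will not match bytes fields directly. One option is to encode the configured names once. The sketch assumes the protected_branches key holds fully qualified refnames; load_protected is a hypothetical helper.

```python
import json


def load_protected(config_text):
    """Return the configured branch names as a set of bytes."""
    config = json.loads(config_text)
    return {name.encode("utf-8") for name in config["protected_branches"]}


protected = load_protected('{"protected_branches": ["refs/heads/main"]}')
print(b"refs/heads/main" in protected)  # True
print("refs/heads/main" in protected)   # False
```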
Optimize Callbacks
- Avoid expensive operations in hot paths
- Cache results when possible
- Short-circuit early if possible
- Use bytestring operations (faster than string)
# Good: Short-circuit early
if not filename.endswith(b".py"):
    return filename
# ... expensive processing ...

# Bad: Always processes
# ... expensive processing ...
if filename.endswith(b".py"):
    return modified_filename
return filename
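For the caching advice, functools.lru_cache is one standard-library option: the same directory appears in many paths, so a per-directory computation can be done once. This standalone sketch shows only the technique; where such a cache can live depends on how you run your callbacks, and normalize_dir and rename are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def normalize_dir(dirname):
    """Stand-in for an expensive per-directory computation."""
    return dirname.lower().replace(b" ", b"-")


def rename(filename):
    # Short-circuit: top-level files need no directory processing.
    if b"/" not in filename:
        return filename
    dirname, basename = filename.rsplit(b"/", 1)
    return normalize_dir(dirname) + b"/" + basename


for path in (b"My Docs/a.txt", b"My Docs/b.txt", b"My Docs/c.txt"):
    print(rename(path))

# Three lookups, but the directory was only computed once.
print(normalize_dir.cache_info().misses)  # 1
```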
Callback Errors
If a callback raises an exception, filter-repo will abort. Test thoroughly:
# Test on a small branch first
git filter-repo --refs test-branch --commit-callback '...'
Available Modules
These Python modules are available in callbacks:
argparse - Argument parsing
collections - Container datatypes
fnmatch - Filename pattern matching
io - I/O operations
os - Operating system interface
platform - Platform identification
re - Regular expressions
shutil - High-level file operations
subprocess - Subprocess management
sys - System-specific parameters
time - Time access
textwrap - Text wrapping
datetime - Date/time handling
Plus all filter-repo classes:
Blob, Commit, Tag, Reset, FileChange
FilteringOptions, RepoFilter
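As an example of putting one of the listed modules to work, fnmatch accepts bytes patterns as long as the name is bytes too. The pattern list and is_unwanted helper below are illustrative assumptions, not part of filter-repo.

```python
import fnmatch

# Hypothetical patterns for files to drop; note that "*" also matches "/".
PATTERNS = [b"*.tmp", b"*.bak", b"build/*"]


def is_unwanted(filename):
    """Return True if filename matches any pattern (case-sensitive)."""
    return any(fnmatch.fnmatchcase(filename, pattern) for pattern in PATTERNS)


print(is_unwanted(b"src/cache.tmp"))  # True
print(is_unwanted(b"build/out/a.o"))  # True
print(is_unwanted(b"src/main.py"))    # False
```

fnmatchcase is used instead of fnmatch so matching behaves the same on every platform.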
API Compatibility Warning
API May Change
The callback API is NOT guaranteed to be stable. If you write scripts that use callbacks:
- Pin to a specific git-filter-repo version
- Test after any upgrades
- Contribute test cases for APIs you rely on
See Library Usage for more stable APIs.
Next Steps