Overview
Callbacks are functions you provide to RepoFilter that get called when processing different Git objects. They allow you to inspect and modify repository history programmatically.
Callback Types
git-filter-repo supports several types of callbacks, each called at different points during filtering.
Object Callbacks
Called for each Git object (with full object access):
- blob_callback: called for each blob (file contents)
- commit_callback: called for each commit
- tag_callback: called for each annotated tag
- reset_callback: called for each branch reset
Field Callbacks
Called for specific fields (simpler; each receives a single bytes value):
- filename_callback: called for each filename
- message_callback: called for commit and tag messages
- name_callback: called for author, committer, and tagger names
- email_callback: called for email addresses
- refname_callback: called for branch and tag names
Special Callbacks
- file_info_callback: advanced callback with access to file contents and metadata
- done_callback: called once when filtering completes
Callback Signatures
blob_callback
```python
def blob_callback(blob: Blob, metadata: dict) -> None:
    """Called for each blob."""
    pass
```
blob: The blob object to process. Modify blob.data to change its contents.
metadata: Contains commit_rename_func, ancestry_graph, and original_ancestry_graph.
Example:
```python
def blob_callback(blob, metadata):
    # Skip large blobs first, so big binary files are caught too
    if len(blob.data) > 5_000_000:
        blob.skip()
        return
    # Leave binary blobs (NUL byte in the first 8 KiB) untouched
    if b"\0" in blob.data[:8192]:
        return
    # Replace text in all text blobs
    blob.data = blob.data.replace(b'TODO', b'DONE')
```
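Because blob_callback only touches blob.data and blob.skip(), it can be exercised without a repository. Below is a sketch of a redaction variant checked against a stand-in; FakeBlob and the *_KEY/*_TOKEN pattern are hypothetical and only mimic the two Blob members the callback uses.

```python
import re

# Hypothetical stand-in for git_filter_repo.Blob: only the members this
# callback touches (data and skip()) are modeled.
class FakeBlob:
    def __init__(self, data):
        self.data = data
        self.skipped = False

    def skip(self):
        self.skipped = True

# Hypothetical rule: redact values assigned to names like API_KEY or GH_TOKEN
SECRET_RE = re.compile(br'([A-Z_]+_(?:KEY|TOKEN)\s*=\s*)\S+')

def blob_callback(blob, metadata):
    # Drop very large blobs, binary or not
    if len(blob.data) > 5_000_000:
        blob.skip()
        return
    # Leave binary blobs (NUL byte in the first 8 KiB) untouched
    if b"\0" in blob.data[:8192]:
        return
    # Keep the assignment prefix (group 1), replace only the value
    blob.data = SECRET_RE.sub(br'\1***REDACTED***', blob.data)
```

The same function object can then be passed to RepoFilter as blob_callback.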
commit_callback
```python
def commit_callback(commit: Commit, metadata: dict) -> None:
    """Called for each commit."""
    pass
```
commit: The commit object to process. Any attribute may be modified.
metadata: The same keys as for blob_callback, plus orig_parents and had_file_changes.
Example:
```python
def commit_callback(commit, metadata):
    # Add an author sign-off to every commit (built as bytes, no decoding)
    sign_off = (b"Signed-off-by: " + commit.author_name
                + b" <" + commit.author_email + b">")
    if sign_off not in commit.message:
        commit.message = commit.message.rstrip() + b"\n\n" + sign_off + b"\n"
    # Keep only file changes under src/
    commit.file_changes = [
        c for c in commit.file_changes
        if c.filename.startswith(b'src/')
    ]
    # Skip non-merge commits that became empty (merge commits may
    # legitimately list no file changes)
    if not commit.file_changes and len(commit.parents) == 1:
        commit.skip(commit.first_parent())
```
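The file_changes filter above uses a fixed prefix. A glob-driven variant can be sketched with Python's fnmatch, which accepts bytes patterns and names. DROP_PATTERNS, FakeChange, and FakeCommit are hypothetical; the stand-ins model only the attributes the callback reads, so the logic can be checked without a repository.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for paths to drop from every commit
DROP_PATTERNS = [b'*.log', b'build/*', b'secrets/*']

# Hypothetical stand-ins modeling only the attributes used below
class FakeChange:
    def __init__(self, type_, filename):
        self.type = type_
        self.filename = filename

class FakeCommit:
    def __init__(self, file_changes):
        self.file_changes = file_changes

def should_drop(filename):
    # Assumption: some change types may carry no filename; keep those
    if filename is None:
        return False
    return any(fnmatchcase(filename, pat) for pat in DROP_PATTERNS)

def commit_callback(commit, metadata):
    # Drop modifications and deletions alike for matching paths, so the
    # path never appears anywhere in the rewritten history
    commit.file_changes = [
        c for c in commit.file_changes if not should_drop(c.filename)
    ]
```

fnmatch's `*` also matches `/`, so `*.log` catches log files at any depth.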
tag_callback
```python
def tag_callback(tag: Tag, metadata: dict) -> None:
    """Called for each annotated tag."""
    pass
```
tag: The tag object to process.
Example:
```python
def tag_callback(tag, metadata):
    # Rename version tags
    if tag.ref.startswith(b'v'):
        tag.ref = b'version-' + tag.ref[1:]
    # Update tagger email
    if tag.tagger_email == b'old@example.com':
        tag.tagger_email = b'new@example.com'
```
reset_callback
```python
def reset_callback(reset: Reset, metadata: dict) -> None:
    """Called for each branch reset."""
    pass
```
Example:
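```python
def reset_callback(reset, metadata):
    # Rename master to main in reset commands only; commits carry their
    # own ref in commit.branch, so refname_callback is usually the
    # better tool for renaming a branch everywhere
    if reset.ref == b'refs/heads/master':
        reset.ref = b'refs/heads/main'
```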
filename_callback
```python
def filename_callback(filename: bytes) -> bytes | None:
    """Called for each filename. Return None to exclude file."""
    pass
```
Returns: The modified filename (bytes), or None to exclude the file.
Example:
```python
def filename_callback(filename):
    # Exclude build artifacts
    if filename.endswith(b'.pyc') or filename.endswith(b'.o'):
        return None
    # Rename directory (len(b'old_src/') == 8)
    if filename.startswith(b'old_src/'):
        return b'src/' + filename[8:]
    return filename
```
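When several directories move at once, a prefix table avoids hard-coded slice offsets like filename[8:]. Here is a sketch with a hypothetical RENAMES mapping; since filename_callback is a pure function of bytes, it can be checked directly.

```python
# Hypothetical old-prefix -> new-prefix mapping
RENAMES = {
    b'old_src/': b'src/',
    b'lib/': b'pkg/lib/',
    b'lib/vendor/': b'third_party/',
}
EXCLUDED_SUFFIXES = (b'.pyc', b'.o')

def filename_callback(filename):
    # Returning None drops the file from every commit
    if filename.endswith(EXCLUDED_SUFFIXES):
        return None
    # Try longer prefixes first so b'lib/vendor/' wins over b'lib/'
    for old in sorted(RENAMES, key=len, reverse=True):
        if filename.startswith(old):
            return RENAMES[old] + filename[len(old):]
    return filename
```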
message_callback
```python
def message_callback(message: bytes) -> bytes:
    """Called for commit and tag messages."""
    pass
```
Example:
```python
import re

def message_callback(message):
    # Remove JIRA ticket references
    message = re.sub(br'\[?PROJ-\d+\]?:?\s*', b'', message)
    # Normalize line endings
    message = message.replace(b'\r\n', b'\n')
    return message
```
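A message_callback can also enforce general message hygiene. The rules in this sketch (LF line endings, no trailing spaces or tabs, exactly one final newline) are a hypothetical policy, not something git-filter-repo applies by default.

```python
import re

def message_callback(message):
    # Normalize Windows line endings first
    message = message.replace(b'\r\n', b'\n')
    # Strip trailing spaces and tabs on every line
    message = re.sub(br'[ \t]+$', b'', message, flags=re.MULTILINE)
    # End with exactly one newline; an empty message stays empty
    stripped = message.rstrip(b'\n')
    return stripped + b'\n' if stripped else b''
```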
name_callback
```python
def name_callback(name: bytes) -> bytes:
    """Called for author, committer, and tagger names."""
    pass
```
Example:
```python
def name_callback(name):
    # Normalize one exact spelling; a plain replace() would also
    # turn b'Jonathan' into b'Johnathan'
    if name == b'Jon':
        return b'John'
    return name
```
email_callback
```python
def email_callback(email: bytes) -> bytes:
    """Called for all email addresses."""
    pass
```
Example:
```python
def email_callback(email):
    # Update company domain
    if email.endswith(b'@oldcompany.com'):
        return email.replace(b'@oldcompany.com', b'@newcompany.com')
    return email
```
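name_callback and email_callback each see a single field, so neither can tell which name belongs to which email. To remap a whole identity, one option is a commit_callback that reads both fields together. Below is a sketch; the IDENTITIES table and FakeCommit stand-in are hypothetical, and tags would need the same treatment in a tag_callback. If a plain mailmap file is enough, git-filter-repo's --mailmap option may be the simpler route.

```python
# Hypothetical (old name, old email) -> (new name, new email) table
IDENTITIES = {
    (b'jdoe', b'jdoe@localhost'): (b'Jane Doe', b'jane@example.com'),
}

# Hypothetical stand-in modeling only the identity fields of a Commit
class FakeCommit:
    def __init__(self, author, committer):
        self.author_name, self.author_email = author
        self.committer_name, self.committer_email = committer

def remap(name, email):
    # Unknown identities pass through unchanged
    return IDENTITIES.get((name, email), (name, email))

def commit_callback(commit, metadata):
    commit.author_name, commit.author_email = remap(
        commit.author_name, commit.author_email)
    commit.committer_name, commit.committer_email = remap(
        commit.committer_name, commit.committer_email)
```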
refname_callback
```python
def refname_callback(refname: bytes) -> bytes:
    """Called for branch and tag references."""
    pass
```
Example:
```python
def refname_callback(refname):
    # Add prefix to all branches
    if refname.startswith(b'refs/heads/'):
        branch = refname[11:]  # Remove 'refs/heads/'
        return b'refs/heads/team1-' + branch
    return refname
```
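Because refname_callback is called for branch and tag references rather than for reset commands only, it is a natural place for branch renames such as master to main. Here is a sketch with hypothetical EXACT and PREFIXES tables; tags and unlisted refs pass through unchanged.

```python
# Hypothetical exact renames, checked before prefix rules
EXACT = {b'refs/heads/master': b'refs/heads/main'}
# Hypothetical prefix rules: (old prefix, new prefix)
PREFIXES = [(b'refs/heads/legacy/', b'refs/heads/archive/')]

def refname_callback(refname):
    if refname in EXACT:
        return EXACT[refname]
    for old, new in PREFIXES:
        if refname.startswith(old):
            return new + refname[len(old):]
    # Tags and any other refs pass through unchanged
    return refname
```

Matching full refnames exactly avoids accidentally renaming branches such as refs/heads/mastery.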
file_info_callback
Advanced callback with access to file contents and utilities.
```python
def file_info_callback(
    filename: bytes,
    mode: bytes,
    blob_id: int | bytes,
    value: FileInfoValueHelper
) -> tuple[bytes, bytes, int | bytes]:
    """Process file with access to contents."""
    pass
```
mode: File mode (b'100644', b'100755', b'120000', or b'160000').
value (FileInfoValueHelper, required): Helper object with the utility methods listed below.
Returns: A (filename, mode, blob_id) tuple; (filename, None, None) to delete the file; or a tuple whose filename is None to exclude it.
FileInfoValueHelper Methods
- get_contents_by_identifier(blob_id): Retrieve blob contents by mark or hash. Returns bytes or None.
- get_size_by_identifier(blob_id): Get the blob size without reading its contents. Returns int.
- insert_file_with_contents(contents): Create a new blob with the given contents. Returns the new blob_id.
- is_binary(contents): Check whether the contents appear to be binary. Returns bool.
- apply_replace_text(contents): Apply the text replacements from --replace-text. Returns modified bytes.
- Custom data storage for passing state between callbacks.
Example:
```python
import os
import subprocess
import tempfile

def file_info_callback(filename, mode, blob_id, value):
    # Only process Python files
    if not filename.endswith(b'.py'):
        return (filename, mode, blob_id)
    # Get file contents; skip if unavailable or binary
    contents = value.get_contents_by_identifier(blob_id)
    if contents is None or value.is_binary(contents):
        return (filename, mode, blob_id)
    # Format with black via a temporary file (example)
    with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
        f.write(contents)
        temp_path = f.name
    try:
        result = subprocess.run(['black', '--quiet', temp_path])
        if result.returncode != 0:
            # Leave files black cannot format unchanged
            return (filename, mode, blob_id)
        with open(temp_path, 'rb') as f:
            new_contents = f.read()
    finally:
        os.unlink(temp_path)
    # Insert a new blob only if formatting changed something
    if new_contents != contents:
        new_blob_id = value.insert_file_with_contents(new_contents)
        return (filename, mode, new_blob_id)
    return (filename, mode, blob_id)
```
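Since file_info_callback reaches the repository only through its value helper, it can be tried against a small stand-in. The sketch below converts CRLF to LF in selected text files. FakeValueHelper is hypothetical: it imitates just the four helper methods the callback calls, using an in-memory dict and string ids instead of real marks or hashes, and TEXT_SUFFIXES and MAX_SIZE are illustrative choices.

```python
# Hypothetical stand-in for FileInfoValueHelper: blobs live in a dict and
# new ids are generated strings rather than real marks or hashes.
class FakeValueHelper:
    def __init__(self, blobs):
        self.blobs = dict(blobs)
        self.inserted = 0

    def get_contents_by_identifier(self, blob_id):
        return self.blobs.get(blob_id)

    def get_size_by_identifier(self, blob_id):
        return len(self.blobs[blob_id])

    def is_binary(self, contents):
        return b"\0" in contents[:8192]

    def insert_file_with_contents(self, contents):
        self.inserted += 1
        new_id = "new-%d" % self.inserted
        self.blobs[new_id] = contents
        return new_id

TEXT_SUFFIXES = (b'.txt', b'.md', b'.py')  # hypothetical selection
MAX_SIZE = 10_000_000                     # hypothetical size cap

def file_info_callback(filename, mode, blob_id, value):
    # Only regular or executable files with a chosen suffix
    if mode not in (b'100644', b'100755') or not filename.endswith(TEXT_SUFFIXES):
        return (filename, mode, blob_id)
    # Check the size before reading the whole blob
    if value.get_size_by_identifier(blob_id) > MAX_SIZE:
        return (filename, mode, blob_id)
    contents = value.get_contents_by_identifier(blob_id)
    if contents is None or value.is_binary(contents):
        return (filename, mode, blob_id)
    new_contents = contents.replace(b'\r\n', b'\n')
    if new_contents == contents:
        return (filename, mode, blob_id)
    return (filename, mode, value.insert_file_with_contents(new_contents))
```

Returning the original blob_id when nothing changed avoids inserting duplicate blobs.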
done_callback
```python
def done_callback() -> None:
    """Called once when filtering completes."""
    pass
```
Example:
```python
import git_filter_repo as fr

stats = {'count': 0}

def commit_callback(commit, metadata):
    stats['count'] += 1

def done_callback():
    print(f"Processed {stats['count']} commits")

args = fr.FilteringOptions.parse_args(['--force'])
repo_filter = fr.RepoFilter(
    args,
    commit_callback=commit_callback,
    done_callback=done_callback
)
repo_filter.run()
```
The object callbacks (blob_callback, commit_callback, tag_callback, and reset_callback) receive a metadata dictionary with these entries:
commit_rename_func
Function to translate old commit hashes to new ones.
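```python
import re

# Trailer added by `git cherry-pick -x` (SHA-1 or SHA-256 hashes)
CHERRY_PICK_RE = re.compile(br'\(cherry picked from commit ([0-9a-f]{40,64})\)')

def commit_callback(commit, metadata):
    # Get translation function (old hash -> new hash). Note that
    # filter-repo already rewrites hashes in messages by default unless
    # --preserve-commit-hashes is given; this just shows the function.
    translate = metadata['commit_rename_func']
    match = CHERRY_PICK_RE.search(commit.message)
    if match:
        old_hash = match.group(1)
        new_hash = translate(old_hash)
        commit.message = commit.message.replace(old_hash, new_hash)
```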
ancestry_graph
Graph of commit ancestry in the filtered repository.
```python
def commit_callback(commit, metadata):
    graph = metadata['ancestry_graph']
    # Check ancestry relationships
    if commit.parents:
        parent_id = commit.parents[0]
        # graph has methods like is_ancestor(possible_ancestor, commit)
```
original_ancestry_graph
Graph of commit ancestry in the original repository.
```python
def commit_callback(commit, metadata):
    orig_graph = metadata['original_ancestry_graph']
    # Get original parents
    orig_parents = metadata['orig_parents']
    # Check whether it was originally a merge
    if len(orig_parents) >= 2:
        print(f"Commit {commit.original_id} was a merge")
```
orig_parents
Original parent commits before filtering (commit_callback only).
```python
def commit_callback(commit, metadata):
    orig_parents = metadata['orig_parents']
    current_parents = commit.parents
    if len(orig_parents) != len(current_parents):
        print("Parents were pruned")
```
had_file_changes
Whether commit originally had file changes (commit_callback only).
```python
def commit_callback(commit, metadata):
    if metadata['had_file_changes'] and not commit.file_changes:
        print(f"Commit {commit.original_id} became empty")
```
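The two commit-only keys can be combined into a single report collected during the run and printed at the end. This is a sketch; the report list and its tuple layout are hypothetical choices. The callback only reads attributes and metadata keys, so simple stand-in objects are enough to try it.

```python
# Collected (original_id, lost_all_changes, parent_count_changed) tuples
report = []

def commit_callback(commit, metadata):
    # had_file_changes and orig_parents are only passed to commit_callback
    lost_changes = metadata['had_file_changes'] and not commit.file_changes
    parents_changed = len(metadata['orig_parents']) != len(commit.parents)
    if lost_changes or parents_changed:
        report.append((commit.original_id, lost_changes, parents_changed))

def done_callback():
    for original_id, lost_changes, parents_changed in report:
        flags = []
        if lost_changes:
            flags.append("emptied")
        if parents_changed:
            flags.append("reparented")
        print(original_id, ", ".join(flags))
```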
Common Patterns
Lint History
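Run a linter (here, black) on every Python file in history:
```python
import os
import subprocess
import tempfile

import git_filter_repo as fr

# Maps original blob id -> linted blob id, so each blob is linted once
blobs_handled = {}

def commit_callback(commit, metadata):
    for change in commit.file_changes:
        # Reuse the result if this blob was already processed
        if change.blob_id in blobs_handled:
            change.blob_id = blobs_handled[change.blob_id]
            continue
        # Deletions carry no blob
        if change.type == b'D':
            continue
        # Only process Python files
        if not change.filename.endswith(b'.py'):
            continue
        # Get contents via git cat-file (assumes blob_id is an object hash)
        cmd = ['git', 'cat-file', 'blob', change.blob_id]
        contents = subprocess.check_output(cmd)
        # Write to a temp file so the linter can run on it
        with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as f:
            f.write(contents)
            temp_path = f.name
        try:
            # Run the linter in place
            subprocess.run(['black', temp_path], check=True)
            # Read modified contents
            with open(temp_path, 'rb') as f:
                new_contents = f.read()
            # Insert a new blob into the stream only if something changed
            if new_contents != contents:
                blob = fr.Blob(new_contents)
                repo_filter.insert(blob)
                new_id = blob.id
            else:
                new_id = change.blob_id
            blobs_handled[change.blob_id] = new_id
            change.blob_id = new_id
        finally:
            os.unlink(temp_path)

args = fr.FilteringOptions.parse_args(['--force'])
repo_filter = fr.RepoFilter(args, commit_callback=commit_callback)
repo_filter.run()
```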
Add File to Beginning
Insert a file into all root commits:
```python
import subprocess

import git_filter_repo as fr

# Hash the file into git's object database (run inside the repository)
file_hash = subprocess.check_output(
    ['git', 'hash-object', '-w', 'LICENSE']
).strip()

def commit_callback(commit, metadata):
    if len(commit.parents) == 0:  # Root commit
        commit.file_changes.append(
            fr.FileChange(b'M', b'LICENSE', file_hash, b'100644')
        )
```
Remove Signed-off-by Lines
```python
import re

def message_callback(message):
    # Remove all Signed-off-by lines
    message = re.sub(
        br'^\s*Signed-off-by:.*$',
        b'',
        message,
        flags=re.MULTILINE
    )
    # Collapse runs of blank lines
    message = re.sub(br'\n\n+', b'\n\n', message)
    return message.strip() + b'\n'
```
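If the goal is tidying rather than removal, a gentler message_callback can keep sign-offs and drop only exact duplicates. This is a sketch of that hypothetical policy; every other line is kept as-is.

```python
def message_callback(message):
    # Keep each Signed-off-by line only the first time it appears
    seen = set()
    kept = []
    for line in message.split(b'\n'):
        if line.startswith(b'Signed-off-by:'):
            key = line.strip()
            if key in seen:
                continue
            seen.add(key)
        kept.append(line)
    return b'\n'.join(kept)
```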
Track Statistics
```python
stats = {
    'commits': 0,
    'empty_commits_removed': 0,
    'blobs_modified': 0,
    'total_size_removed': 0
}

def blob_callback(blob, metadata):
    original = blob.data
    # Replace sensitive data
    blob.data = blob.data.replace(b'SECRET_KEY', b'***')
    # Count any change, even one that keeps the size the same
    if blob.data != original:
        stats['blobs_modified'] += 1
        stats['total_size_removed'] += len(original) - len(blob.data)

def commit_callback(commit, metadata):
    stats['commits'] += 1
    # Skip non-merge commits with no changes (merges may list none)
    if not commit.file_changes and len(commit.parents) == 1:
        stats['empty_commits_removed'] += 1
        commit.skip(commit.first_parent())

def done_callback():
    print("\n=== Statistics ===")
    print(f"Commits processed: {stats['commits']}")
    print(f"Empty commits removed: {stats['empty_commits_removed']}")
    print(f"Blobs modified: {stats['blobs_modified']}")
    print(f"Total size removed: {stats['total_size_removed']} bytes")
```
Combining Multiple Callbacks
You can use multiple callbacks together:
```python
import git_filter_repo as fr

def my_filename_callback(filename):
    # Rename directories (len(b'old_name/') == 9)
    if filename.startswith(b'old_name/'):
        return b'new_name/' + filename[9:]
    return filename

def my_message_callback(message):
    # Add prefix to all messages
    return b'[Migrated] ' + message

def my_commit_callback(commit, metadata):
    # Update author emails
    if commit.author_email.endswith(b'@old.com'):
        commit.author_email = commit.author_email.replace(
            b'@old.com', b'@new.com'
        )

def my_done_callback():
    print("Filtering complete!")

args = fr.FilteringOptions.parse_args(['--force'])
repo_filter = fr.RepoFilter(
    args,
    filename_callback=my_filename_callback,
    message_callback=my_message_callback,
    commit_callback=my_commit_callback,
    done_callback=my_done_callback
)
repo_filter.run()
```
Best Practices
- Start Simple: Begin with field callbacks (filename, message, name, email) before moving to object callbacks.
- Test on Small Repos: Test your callbacks on a small test repository first.
- Handle Encoding: All strings in git-filter-repo are bytes, not str.
- Be Careful with skip(): Skipping a commit changes its children’s parents.
- Use file_info_callback for Content: When you need both the filename and the contents, use file_info_callback instead of blob_callback.
- Track State: Use module-level or closure variables to track state across callbacks.
- Check for None: File operations can return None (e.g., when blobs are stripped).
- Preserve Metadata: Don’t forget to update commit messages, dates, etc. as needed.
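The Track State and Handle Encoding advice can be combined in a closure, which keeps separate filtering runs from sharing counters the way module-level globals would. This is a sketch; make_counting_callbacks is a hypothetical helper name, and it counts commits per author email.

```python
from collections import Counter

def make_counting_callbacks():
    # State lives in the closure, not in module-level globals
    per_email = Counter()

    def commit_callback(commit, metadata):
        per_email[commit.author_email] += 1

    def done_callback():
        # Emails are bytes; decode leniently only for display
        for email, count in per_email.most_common():
            print(f"{email.decode('utf-8', 'replace')}: {count}")

    return commit_callback, done_callback, per_email
```

Each call returns a fresh set, for example `commit_cb, done_cb, _ = make_counting_callbacks()`, which can then be passed to fr.RepoFilter as commit_callback=commit_cb and done_callback=done_cb.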