Common questions and answers about using git-filter-repo.
Why did git-filter-repo rewrite commit hashes?
Why did git-filter-repo rewrite more commit hashes than I expected?
- Why did commits newer than the ones I expected have their hash change?
- Why did commits older than the ones I expected have their hash change?
- If you have signed commits, the signatures will be stripped
- If you have commits with extended headers, the extended headers will be stripped (signed commits are actually a special case of this)
- If you have commits in an encoding other than UTF-8, they will by default be re-encoded into UTF-8
- If you have a commit without an author, one will be added that matches the committer
- If you have trees that are not canonical (e.g. incorrect sorting order), they will be canonicalized
--refs argument to git-filter-repo to specify a range of history that you want rewritten.
Why did git-filter-repo rewrite other branches too?
git-filter-repo can restrict its rewriting to a subset of history, such as a single branch, using the --refs option. However, using that comes with the risk that one branch now has a different version of some commits than other branches do; usually, when you rewrite history, you want all branches that depend on what you are rewriting to be updated.
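Before limiting a rewrite with --refs, it can help to check which branches share the commits you are about to change. The following is a minimal sketch using only stock git; the helper name `branches_sharing_commit` is hypothetical and is not part of git-filter-repo.

```shell
# Hypothetical helper (not part of git-filter-repo): list the local branches
# whose history contains a given commit.
# $1 = path to the repository, $2 = commit to look for
branches_sharing_commit() {
    git -C "$1" for-each-ref --contains "$2" \
        --format='%(refname:short)' refs/heads
}

# Example use (repository path and revision are placeholders):
#   branches_sharing_commit . HEAD~3
# If more than one branch is listed, a rewrite limited with --refs to just
# one of them would leave the others pointing at the old commits.
```

If the helper prints a single branch, restricting the rewrite to that branch does not leave other local branches behind on the old versions of those commits.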
How should paths be specified?
Arguments to --path should be paths as Git would report them, when run from the toplevel of the git repository.
Good path examples:
- README.md
- Documentation/README.md
- src/modules/flux/capacitor.rs
To see paths as Git reports them, run git diff --no-relative --name-only or git log --no-relative --name-only --format="".
The following are basic rules about paths the way that Git reports and uses them:
- do not use absolute paths
- always treat paths as relative to the toplevel of the repository (do not add a leading slash, and do not specify paths relative to some subdirectory of the repository even if that is your current working directory)
- do not use the special directories . or .. anywhere in your path
- do not use \, the Windows path separator, between directories and files; always use / regardless of platform
Bad path examples (each breaks one of the rules above):
- /absolute/path/to/src/modules/program.c
- /src/modules/program.c
- src/docs/../modules/main.java
- scripts/config/./update.sh
- ./tests/fixtures/image.jpg
- ../src/main.rs
- C:\absolute\path\to\src\modules\program.c
- src\modules\program.c
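The path rules above can be sketched as a small shell check. This is an illustrative helper only; the name `is_repo_relative_path` is hypothetical, git-filter-repo does not ship it, and it checks only the form of the path, not whether the path exists in your history.

```shell
# Hypothetical helper (not part of git-filter-repo): returns 0 when a path
# follows the rules listed above, 1 otherwise.
is_repo_relative_path() {
    case "$1" in
        "")    return 1 ;;   # nothing to check
        /*)    return 1 ;;   # leading slash (absolute path)
        *\\*)  return 1 ;;   # contains a backslash separator
    esac
    # Wrap the path in slashes so that "." and ".." are caught at the start,
    # in the middle, and at the end of the path alike.
    case "/$1/" in
        */./*|*/../*) return 1 ;;
    esac
    return 0
}

# Try it on a few of the examples from this section.
for p in README.md src/modules/flux/capacitor.rs \
         ./tests/fixtures/image.jpg 'src\modules\program.c'; do
    if is_repo_relative_path "$p"; then
        printf 'ok:  %s\n' "$p"
    else
        printf 'bad: %s\n' "$p"
    fi
done
```

The check is deliberately conservative: it rejects any backslash, even though a backslash can be a legal filename character on some platforms.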
Help! Can I recover or undo the filtering?
--force, you would have seen the following warning:
--force, well, you were warned.
If you didn’t make a fresh clone, and you started with --force, and you didn’t think to read the description of the --force option:
--force in it on something you don’t have a backup of, then now is the time to reassess your life choices. --force should be a pretty clear warning sign.
Can you change git-filter-repo to allow future folks to recover from --force'd rewrites?
- Providing an alternate method to restore would require storing both the original history and the new history, meaning that those who are trying to shrink their repository size instead see it grow and have to figure out extra steps to expunge the old history to see the actual size savings. Experience with other tools showed that this was frustrating and difficult to figure out for many users.
- Providing an alternate method to restore would mean that users who are trying to purge sensitive data from their repository still find the sensitive data after the rewrite because it hasn’t actually been purged. In order to actually purge it, they have to take extra steps. Same as with the last bullet point, experience has shown that extra steps to purge the extra information are difficult and error-prone. This extra difficulty is particularly problematic when you’re trying to expunge sensitive data.
- Providing an alternate method to restore would also mean trying to figure out what should be backed up and how. The obvious choices used by previous tools only actually provided partial backups (reflogs would be ignored for example, as would uncommitted changes whether staged or not). The more you try to carefully back up everything, the more difficult the restoration from backup will be. The only backup mechanism I’ve found that seems reasonable is making a separate clone. That’s expensive to do automatically for the user (especially if the filtering is done via multiple invocations of the tool). Plus, it’s not clear where the clone should be stored, especially to avoid the previous problems for size-reduction and sensitive-data-removal folks.
- Providing an alternate method to restore would also mean providing documentation on how to restore. Past methods by other tools in the history rewriting space suggested that it was rather difficult for users to figure out. Difficult enough, in fact, that users simply didn’t ever use them. They instead made a separate clone before rewriting history and if they didn’t like the rewrite, then they just blew it away and made a new clone to work with. Since that was observed to be the easy restoration method, I simply enforced it with this tool, requiring users who look like they might not be operating on a fresh clone to use the --force flag.
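The separate-clone approach described above might look like the following sketch. The helper name `fresh_clone_for_rewrite` and all paths are hypothetical. The use of `--no-local` is an assumption on my part: it makes git copy objects through the normal transport for a local source instead of hardlinking them, which I understand helps the clone look fresh to the tool.

```shell
# Hypothetical helper: make a throwaway clone to rewrite, leaving the
# original repository untouched as the backup.
# $1 = existing repository (path or URL), $2 = directory for the new clone
fresh_clone_for_rewrite() {
    # --no-local: copy objects via the regular transport even when the
    # source is a local path, rather than hardlinking object files.
    git clone --quiet --no-local "$1" "$2"
}

# Typical flow (paths and filter arguments are placeholders):
#   fresh_clone_for_rewrite ~/src/project /tmp/project-rewrite
#   cd /tmp/project-rewrite && git filter-repo --path src/
#   # Unhappy with the result? Delete /tmp/project-rewrite and clone again.
```

Because the original repository is never modified, undoing a bad rewrite is just a matter of deleting the clone.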
Can I use git-filter-repo to fix a repository with corruption?
git replace. If git fsck reports warnings/errors for certain objects, you can often replace them and rewrite history.
Can I filter history but keep the same commit IDs?
Can I do bidirectional development between a filtered and unfiltered repository?
git-filter-repo did.
Such a tool exists; it’s called Josh. Use it if this is your use case.
Can I remove specific commits, or filter based on the difference between commits?
git rebase. git rebase operates on the difference between commits (“diff”), allowing you to e.g. drop or modify the diff, but then runs the risk of conflicts as it attempts to apply future diffs. If you tweak one diff in the middle, since it just applies more diffs for the remaining patches, you’ll still see your changes at the end.
filter-repo, by contrast, uses fast-export and fast-import. Those tools treat every commit not as a diff but as a “use the same versions of most files from the parent commit, but make these five files have these exact contents”. Since you don’t have either the diff or ready access to the version of files from the parent commit, that makes it hard to “undo” part of the changes to some file.
In short, git rebase is the tool you want for removing specific commits or otherwise operating on the diff between commits.
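To see the "exact contents" view for yourself, you can print a single commit the way git fast-export emits it. This is a small sketch with a hypothetical helper name, `commit_as_fast_export`, and it assumes the commit you pass has a parent.

```shell
# Hypothetical helper: print one commit in the stream format that
# git fast-export produces.
# $1 = repository path, $2 = a commit that has a parent
commit_as_fast_export() {
    git -C "$1" fast-export "$2~1..$2"
}

# Compare the two views of the same commit (arguments are placeholders):
#   commit_as_fast_export . HEAD   # whole file contents for each changed file
#   git show HEAD                  # a diff against the parent
```

In the fast-export stream, each changed file appears as a complete blob followed by a line of the form `M <mode> <mark> <path>`, so unchanged lines of a modified file are present too.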
Will filtering two different clones of the same repository give the same new commit IDs?
git-filter-repo command, and they expect to get the same new commit IDs. Often they do get the same new commit IDs, but sometimes they don’t.
When people get the same commit IDs, it is only by luck, not by design. There are three reasons this is unsupported and will never be reliable:
- Different Git versions used could cause differences in filtering. Since git fast-export and git fast-import do various canonicalizations of history, and these could change over time, having different versions of Git installed can result in differences in filtering.
- Different git-filter-repo versions used could cause differences in filtering. Over time, git-filter-repo may include new filterings by default, or fix existing filterings, or make any other number of changes. As such, having different versions of git-filter-repo installed can result in differences in filtering.
- Different amounts of the repository cloned or differences in local-only commits can cause differences in filtering. If the clones weren’t made at the same time, one clone may have more commits than the other. Also, both may have made local commits the other doesn’t have. These additional commits could cause history to be traversed in a different order, and filtering rules are allowed to have order-dependent rules for how they filter.
git-filter-repo is designed as a one-shot history rewriting tool. Once you have filtered one clone of the repository, you should not be using it to filter other clones. All other clones of the repository should either be discarded and recloned, or have all their history rebased on top of the rewritten history.