
Preface

This question attempts to clear up the confusion about applying .gitignore retroactively, not just to the present/future. 1

Rationale

I've been searching for a way to make my current .gitignore be retroactively enforced, as if I had created .gitignore in the first commit.

The solution I am seeking:

  • Will not require manually specifying files
  • Will not require a commit
  • Will apply retroactively to all commits of all branches
  • Will ignore .gitignore-specified files in working dir, not delete them (just like an originally root-committed .gitignore file would)
  • Will use git, not BFG
  • Will apply to .gitignore exceptions like:
 *.ext
 !*special.ext
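
To see how such an exception resolves against files that are already tracked, a throwaway repository can help. The sketch below is only an illustration: the file names (a.ext, myspecial.ext, keep.txt) and the temporary directory are made up, and it assumes a Git version that accepts -c and -i together on ls-files.

```shell
# Build a scratch repo (hypothetical file names) where files are tracked
# before any .gitignore exists
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
touch a.ext myspecial.ext keep.txt
git add .
git commit -q -m "track everything"

# Add the ignore rules after the fact, including the exception
printf '%s\n' '*.ext' '!*special.ext' > .gitignore

# List tracked (-c) files that the current ignore rules match (-i)
tracked_ignored=$(git ls-files -c -i --exclude-standard)
echo "$tracked_ignored"
```

Only a.ext is listed: myspecial.ext matches *.ext but is re-included by the !*special.ext exception, so a solution built on this listing respects exceptions without naming files by hand.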

Not solutions

git rm --cached *.ext
git commit

This requires 1. manually specifying files and 2. an additional commit, which will delete the newly-ignored files for other developers when they pull. (It is effectively just a git rm, which removes the file from Git tracking but leaves it alone in your local working directory. Others who git pull afterwards will receive the file-deletion commit.)

git filter-branch --index-filter 'git rm --cached *.ext'

While this does purge files retroactively, it 1. requires manually specifying files and 2. deletes the specified files from the local working directory just like plain git rm (and so also for others who git pull)!


Footnotes

1 There are many similar posts here on SO, with loosely defined questions and even less accurate answers. See this question with 23 answers, where the accepted answer with ~4k votes is incorrect according to the standard definition of "forget" (as noted by one mostly-correct answer), and only 2 answers include the required git filter-branch command.

This question with 21 answers was marked as a duplicate of the previous one, but the question is defined differently (ignore vs forget), so while the answers may be appropriate, it is not a duplicate.

This question is the closest I've found to what I'm looking for, but the answers don't work in all cases (paths with spaces...) and perhaps are a bit more complex than necessary regarding creating an external-to-repository .gitignore file and copying it into every commit.

Comments

Sometimes it's just better to write a script to do the manual things for you. - britho

Is your goal to rewrite the repository to how it would look if the files in question were never committed (which would invalidate all existing commit IDs, and probably break things for every existing clone/checkout of the repo), or to configure your local working directory such that Git pretends those files are not present in an old commit when you check it out? - hmakholm left over Monica

Goal is the former, "as if I had created .gitignore at the beginning". I understand the ramifications, but my repo is local/private and I don't mind a force-push. Although feel free to specify how to handle the latter if you answer - seems it would be useful information. - goofology

It's going to be like a five-line filter-branch, tops. Put your exclusions in .git/info/exclude, do a git ls-files --exclude-standard -ci and rm --cached them. - jthill

Thank you. I agree that forget=retroactively, and would have no need to specify it explicitly, if not for the other incredibly upvoted “completely forget” question with an accepted answer that only applies to the present/future. Perhaps that question should be edited to be more explicit (present/future only) as well? - goofology

1 Answer


EDIT: I've recently found git-filter-repo, which may be a better choice. It is probably a good idea to investigate its rationale and the filter-branch gotchas for yourself, but they wouldn't have affected my use case below.


This method makes Git completely forget ignored files (past/present/future), but does not delete anything from working directory (even when re-pulled from remote).

This method requires using .git/info/exclude (preferred) OR a pre-existing .gitignore in every commit that has files to be ignored/forgotten. 1
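
Because .git/info/exclude lives outside the tracked tree, its patterns apply no matter which commit is checked out. The following is a minimal sketch of that behavior in a scratch repository; the file names and the *.log pattern are hypothetical, and in a real repo you would append your own patterns instead.

```shell
# Scratch repo whose only commit contains no .gitignore at all
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
touch build.log notes.txt
git add .
git commit -q -m "no .gitignore in this commit"

# Locate the repository-local exclude file and add a pattern to it
exclude_file=$(git rev-parse --git-path info/exclude)
mkdir -p "$(dirname "$exclude_file")"
printf '%s\n' '*.log' >> "$exclude_file"

# The tracked file is now reported as matching the ignore rules
matched=$(git ls-files -c -i --exclude-standard)
echo "$matched"
```

Here build.log is reported even though no commit carries a .gitignore, which is why the exclude file is convenient for a history rewrite.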

This method avoids removing the newly-ignored files from other developers' machines on the next git pull. 2

All methods of enforcing Git ignore behavior after-the-fact effectively re-write history and thus have significant ramifications for any public/shared/collaborative repos that might be pulled after this process. 3

General advice: start with a clean repo - everything committed, nothing pending in working directory or index, and make a backup!

Also, the comments/revision history of this answer (and revision history of this question) may be useful/enlightening.

#commit up-to-date .gitignore (if not already existing)
#these commands must be run on each branch
#these commands are not strictly necessary if you don't want/need a .gitignore file.  .git/info/exclude can be used instead

git add .gitignore
git commit -m "Create .gitignore"

#apply standard git ignore behavior only to current index, not working directory (--cached)
#if this command returns nothing, ensure /.git/info/exclude AND/OR .gitignore exist
#this command must be run on each branch
#if using .git/info/exclude, it will need to be modified per branch run, if the branches have differing (per-branch) .gitignore requirements.

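#newer Git versions refuse --ignored unless --cached or --others is also given, hence --cached
git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm --cached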

#Commit to prevent working directory data loss!
#this commit will be automatically deleted by the --prune-empty flag in the following command
#this command must be run on each branch
#optionally use the --amend flag to merge this commit with the previous one instead of creating 2 commits.

git commit -m "ignored index"

#Apply standard git ignore behavior RETROACTIVELY to all commits from all branches (--all)
#This step WILL delete ignored files from working directory UNLESS they have been dereferenced from the index by the commit above
#This step will also delete any "empty" commits.  If deliberate "empty" commits should be kept, remove --prune-empty and instead run git reset HEAD^ immediately after this command

git filter-branch --tree-filter 'git ls-files -z --cached --ignored --exclude-standard | xargs -r0 git rm -f --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all

#List all still-existing files that are now ignored properly
#if this command returns nothing, it's time to restore from backup and start over
#this command must be run on each branch

git ls-files --other --ignored --exclude-standard
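
As an extra sanity check, the rewritten history itself can be inspected for leftovers. The helper below is a hypothetical sketch, not part of the original procedure. It assumes the current ignore rules (.gitignore plus .git/info/exclude) are the ones you want enforced, and it relies on git log --name-only, which may not list paths touched only inside merge commits.

```shell
# Hypothetical helper: print every path recorded in any commit on any ref
# that the current ignore rules match.
ignored_paths_in_history() {
  # --no-index lets check-ignore report paths even if they are tracked
  git log --all --pretty=format: --name-only \
    | grep -v '^$' \
    | sort -u \
    | git check-ignore --no-index --stdin
}
```

Running ignored_paths_in_history from the repository root after the rewrite should print nothing; any path it prints is still present somewhere in history.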

Finally, follow the rest of this GitHub guide (starting at step 6) which includes important warnings/information about the commands below.

git push origin --force --all
git push origin --force --tags
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now

Other devs that pull from now-modified remote repo should make a backup and then:

#fetch modified remote

git fetch --all

#"Pull" changes WITHOUT deleting newly-ignored files from working directory
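#This moves the current branch and the index to the fetched commit but leaves working directory files as they are
#Differences from the remote then show up as local modifications - ensure any local modifications are backed-up/stashed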

git reset FETCH_HEAD

Footnotes

1 Because /.git/info/exclude can be applied to all historical commits using the instructions above, perhaps details about getting a .gitignore file into the historical commit(s) that need it is beyond the scope of this answer. I wanted a proper .gitignore to be in the root commit, as if it was the first thing I did. Others may not care since /.git/info/exclude can accomplish the same thing regardless where the .gitignore exists in the commit history, and clearly re-writing history is a very touchy subject, even when aware of the ramifications.

FWIW, potential methods may include git rebase or a git filter-branch that copies an external .gitignore into each commit, like the answers to this question.

2 Enforcing git ignore behavior after-the-fact by committing the results of a standalone git rm --cached command may result in newly-ignored file deletion in future pulls from the force-pushed remote. The --prune-empty flag in the git filter-branch command (or git reset HEAD^ afterwards) avoids this problem by automatically removing the previous "delete all ignored files" index-only commit.

3 Re-writing git history also changes commit hashes, which will wreak havoc on future pulls from public/shared/collaborative repos. Please understand the ramifications fully before doing this to such a repo. This GitHub guide specifies the following:

Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.

Alternative solutions that do not affect the remote repo are git update-index --assume-unchanged </path/file> or git update-index --skip-worktree <file>, examples of which can be found here.
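
For reference, a minimal sketch of the --skip-worktree variant in a scratch repository. The file name local.cfg and its contents are made up for illustration; nothing here rewrites history or touches a remote.

```shell
# Scratch repo with one tracked file (hypothetical name)
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
echo "default" > local.cfg
git add local.cfg
git commit -q -m "add config"

# Tell Git to stop looking at working-tree changes to this tracked file
git update-index --skip-worktree local.cfg
echo "my local tweak" > local.cfg

# The edit is not reported, and ls-files -v marks the entry with 'S'
status_output=$(git status --porcelain)
flags=$(git ls-files -v local.cfg)
echo "$flags"

# Undo later with: git update-index --no-skip-worktree local.cfg
```

The file stays tracked in every commit, so this only hides local changes on one machine; it does not make Git forget anything.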