
Purge git history for irrelevant files from all repos

Open schlessera opened this issue 4 years ago • 2 comments

WP-CLI started out as a single repository in git and was split up into multiple packages many years later.

When splitting up the packages, we tried to make sure that no historical knowledge was lost, so each subsplit package started out as a copy of the original main package, and the files that were not needed were then removed.

However, it seems like the purging we did afterward to remove unused history was not thorough enough.

Right now, each package contains the history of all WP-CLI files across all commands up to the point of the split. This makes most statistics and contribution data useless and drastically increases the size of the individual VCS repos. When cloning a full environment, each command repo has about 10MB of historical data of which most is useless.

To solve this, a more thorough purge should be done:

git checkout master
git ls-files > keep-these.txt
git ls-files | while read -r line; do (git log --follow --raw --diff-filter=R --pretty=format:%H "$line" | while true; do if ! read -r hash; then break; fi; IFS=$'\t' read -r mode_etc oldname newname; read -r blankline; echo "$oldname"; done); done >> keep-these.txt
git filter-branch --force --index-filter  "git rm  --ignore-unmatch --cached -qr . ; cat $PWD/keep-these.txt | tr '\n' '\0' | xargs -d '\0' git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
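
For readability, the keep-list step above can be sketched as two small shell functions. This is only an illustrative variant, not the exact commands from the issue: it assumes `--name-status` output (parsed with awk) instead of `--raw`, and the function names `list_old_names` and `build_keep_list` are hypothetical.

```shell
# Print every path a tracked file had before it was renamed.
# Assumption: parses `--name-status` lines of the form "R100<TAB>old<TAB>new".
list_old_names() {
  local file=$1
  git log --follow --name-status --diff-filter=R --format= -- "$file" |
    awk -F'\t' '$1 ~ /^R/ { print $2 }'
}

# Build the keep list: current paths plus all of their former names,
# sorted and deduplicated.
build_keep_list() {
  {
    git ls-files
    git ls-files | while IFS= read -r file; do
      list_old_names "$file"
    done
  } | sort -u
}
```

Running `build_keep_list > keep-these.txt` on the checked-out branch should produce a list equivalent to the one built by the two `git ls-files` lines, only without duplicate entries.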

Then, the master branch needs to be force-pushed to overwrite the current master branch history.

:warning: However, this will cause all open PRs to become invalid and immediately be closed! :warning:

Therefore, this should be worked upon for each repo separately, while first ensuring no PRs are open against the master branch anymore.
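
One possible way to check that precondition from the command line, assuming the GitHub CLI (`gh`) is installed and authenticated for the repo. `gh` is not mentioned in this thread, and the helper names `count_open_prs` and `no_open_prs` are hypothetical:

```shell
# Count open pull requests targeting the given base branch.
# Assumption: run from inside a clone whose default remote is the GitHub repo.
count_open_prs() {
  base=$1
  gh pr list --base "$base" --state open --limit 1000 --json number --jq 'length'
}

# Succeed only when no open pull request targets the given base branch.
no_open_prs() {
  [ "$(count_open_prs "$1")" -eq 0 ]
}
```

For example, `no_open_prs master && echo "no open PRs against master"` could be run in each command repo before rewriting its history.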

TODO:

schlessera avatar Dec 24 '21 01:12 schlessera

@schlessera If you are already doing a task that requires zero open PRs against master, then maybe change the master branch to main or trunk at the same time?

wojsmol avatar Dec 24 '21 02:12 wojsmol

@wojsmol Yes, agree, this should be done in tandem with #5598

schlessera avatar Jan 05 '22 19:01 schlessera

Prior to purging the history, I'm capturing the code to each open pull request in this manner:

image

If the pull request doesn't have an issue, I'll open that too.

danielbachhuber avatar Nov 18 '22 15:11 danielbachhuber

On macOS, the stock (BSD) xargs doesn't support -d. The easy workaround is to brew install findutils, which provides GNU xargs as gxargs:

git checkout master
du -h -d 0 .git
git ls-files > keep-these.txt
git ls-files | while read -r line; do (git log --follow --raw --diff-filter=R --pretty=format:%H "$line" | while true; do if ! read -r hash; then break; fi; IFS=$'\t' read -r mode_etc oldname newname; read -r blankline; echo "$oldname"; done); done >> keep-these.txt
git filter-branch --force --index-filter  "git rm  --ignore-unmatch --cached -qr . ; cat $PWD/keep-these.txt | tr '\n' '\0' | gxargs -d '\0' git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
du -h -d 0 .git

The size reported by du -h -d 0 .git should be substantially smaller after the purge than before.

Once you're triple-sure of your changes, run git push -f origin master. You might need to enable force pushes beforehand.
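
One sanity check that may help before force-pushing: since every currently tracked file is on the keep list, the rewritten branch tip would be expected to contain exactly the same files as the old one, with only the history behind it changed. A minimal sketch of that comparison, where the helper name `same_tree` is hypothetical:

```shell
# Succeed when two commits point at the identical tree (same files, same
# contents), even if their commit hashes and histories differ.
same_tree() {
  [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}
```

A possible usage is `git fetch origin && same_tree origin/master master && echo "tip contents unchanged"`. The fetch comes first so that origin/master reflects what is actually on the remote rather than a locally rewritten ref.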

danielbachhuber avatar Nov 18 '22 15:11 danielbachhuber

🚢

danielbachhuber avatar Nov 18 '22 17:11 danielbachhuber