Git stores complete history, so you never really lose your files: even after they are deleted, they are still available in history. That, however, becomes a problem if large or sensitive files have been committed. Deleting them DOES NOT remove them from history. Recently one of the repos I work on became unexpectedly large, so here's how you can resolve that:
- Check the repo files. If there are large files, those are your suspects, but if there's nothing large in the repo itself, check the pack files in .git\objects\pack. In my case, the packfile was ~1 GB (99.9% of the repo size). That means a large object was committed into the repo and then deleted, which still leaves it in the pack files.
- Listed all repo objects by size:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
Which returned one large file:
...
4c23d572ae62 963MiB docker/irishealth-2022.1.0.209.0-docker.tar.gz
Most likely an IRIS image was accidentally committed (and later deleted) but was left in the repo history.
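The effect described above (a deleted file still occupying space in history) can be reproduced in a throwaway repository. This is only a minimal sketch: the temporary directory, the file name big.bin, and the 2 MiB size are hypothetical stand-ins for illustration, not taken from the actual repo.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 2 MiB of random (incompressible) data, then delete it again.
head -c 2097152 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big file"
git rm -q big.bin
git commit -q -m "Remove big file"

# The file is gone from the working tree...
ls -A
# ...but the blob is still reachable from history.
git rev-list --objects --all | grep big.bin

# Pack everything and report sizes; size-pack still includes the deleted file.
git gc --quiet
git count-objects -vH
```

Running `git count-objects -vH` in a real repository is a quick way to see how much of its size sits in pack files.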
- Searched for commits affecting the image file:
git log --all --full-history -- "*irishealth-2022.1.0.209.0*"
Returned:
commit abc
Date: Thu Apr 20 10:47:46 2023
Remove iris image
commit xyz
Date: Thu Apr 20 10:07:12 2023
Fix bug
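Once the offending commits are known, the branches that contain them can also be looked up locally with `git branch --contains`. The sketch below builds a hypothetical throwaway repo (the branch name feature and the file image.tar.gz are made up for illustration) and is not the search that was actually used here.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo with one extra branch.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)

# Commit the unwanted file on a side branch only.
git checkout -q -b feature
echo "data" > image.tar.gz
git add image.tar.gz
git commit -q -m "Add image"
bad=$(git rev-parse HEAD)
git checkout -q "$base"

# List every local and remote-tracking branch that contains the commit.
branches=$(git branch --all --contains "$bad" --format='%(refname:short)')
echo "$branches"
```

If only a single unmerged branch is listed, the history rewrite affects that branch alone.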
- Using a GitHub search, I found the affected branch.
- As the commit was in one branch only (not merged into the main branch, etc.) and so not deployed to prod, rewriting history was deemed OK.
- Ran repo cleaner to remove large files from history.
- Force-pushed new history.
- Cloned the repo again, and the clone completed in under a minute.
After doing this, every developer working on the repo needs to clone the repository again.
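The rewrite-and-shrink steps can be sketched end to end in a throwaway repository. Note the substitution: the cleanup above was done with a repo cleaner tool, while this sketch uses Git's built-in `git filter-branch` purely so it runs without extra installs (Git's own documentation recommends `git filter-repo` for real work). The repo, the file name big.bin, and its size are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repo: one normal commit, then a big file added and removed.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
head -c 2097152 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big file"
git rm -q big.bin
git commit -q -m "Remove big file"

# Rewrite every commit on every ref, dropping big.bin from each tree.
# --prune-empty discards commits that become empty as a result.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch big.bin' -- --all >/dev/null

# filter-branch keeps backups under refs/original, and the reflog still
# points at the old commits; drop both so gc can actually prune the blob.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc --quiet --prune=now

# size-pack should now be tiny.
git count-objects -vH

# In a real repo, the rewritten branch would then be force-pushed, e.g.:
#   git push --force origin <branch>
```

Old clones still carry the large object, which is why a fresh clone is the simplest way for everyone to pick up the rewritten history.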