How To Push To Git When Your File Is Too Large?

Last week I tried pushing a new branch to my remote repository and got an error saying that a file I had previously committed exceeded GitHub Enterprise’s file size limit, which is 100MB.

I work on an API-based system that handles CSV files filled with rows of data as part of its requirements. We came across a file that made the system behave unusually, so we decided to test all flows to track down the root cause of this unwanted behaviour. As part of testing the system I added the file to the project. After that I made some unrelated changes, committed them and moved on.

After some progress I tried to push to the remote repo, and the push was rejected with the file size limit error.

I then realised that the file I was testing with had made its way into one of my commits, which now prevented me from pushing my local branch to the remote repository.

Things I tried

Here are the solution approaches my team and I went through before successfully pushing:

Branch out and delete the file

The first thing I did was create a new branch to experiment on. There I deleted the file and committed the deletion, but pushing produced the same error. My first thought was that something was wrong with the commit, maybe a general synchronization issue within my branch which might’ve led to git not recognizing the file removal. So I reverted my changes. Several more attempts followed the same pattern: remove the file from the command line or the file system, commit, hit the same error and revert.

Develop from scratch

I realized pretty soon this wasn’t the right approach. My next idea was to develop the feature from scratch: open a whole new branch from dev, which wouldn’t include the file to begin with, and copy all my changes over to it. The problem with this approach was that my feature touched many files and had plenty of dependencies, and I didn’t want to risk missing any of the changes I had made.

Revert the specific commit

The next thing I tried was going over the log to find the specific commit where the file was initially added and reverting that commit. I thought committing the revert would solve the issue, but it didn’t. Since reverting the commit that added the large file didn’t work, I tried emptying the file’s content instead. My thought was that maybe there was a problem with deleting the file itself, and that committing a change that reduced the file’s size would get rid of the file size limit error.

What eventually worked

The main conclusion was that none of these options fit the way git handles commits in a branch.

When I removed the file and committed the change, the file was gone from the working tree and no longer tracked. But the commit history is still there, which means that if the large file was included in any of the previous commits, it’s not gone. The fact that git stores the commit history is amazing, especially when I need to restore changes or recover data from an earlier snapshot. But the idea that files in commits are frozen is crucial to understand, especially when what we’re trying to do is get rid of one particular file which exists somewhere in one of the commits: a new commit on top, whether a deletion or a revert, never changes the older commits, and a push uploads those older commits too.
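To make this concrete, here is a minimal sketch of the situation in a throwaway repository. The file name big.csv, the commit messages and the user details are made up for illustration, and the file is tiny here; only the shape of the history matters.

```shell
set -eu

# Hypothetical throwaway repository, for illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Commit 1: the test file sneaks in alongside other work.
printf 'id,value\n1,a\n' > big.csv
git add big.csv
git commit -q -m "Add feature"

# Commit 2: delete the file and commit the removal.
git rm -q big.csv
git commit -q -m "Remove test file"

# The file is no longer tracked in the current snapshot...
in_tree=$(git ls-files | grep -c 'big.csv' || true)

# ...but the objects reachable from HEAD still include it,
# and a push has to send every reachable object the remote lacks.
in_history=$(git rev-list --objects HEAD | grep -c 'big.csv' || true)

echo "tracked now: $in_tree, still in history: $in_history"
```

The second count stays above zero after the deletion, which is why the push kept failing no matter how many times I removed the file.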

Rebase and squash commits

The next approach was rebasing. Since we figured out that the file existed in one of the previous commits, we decided to rebase the branch starting from the point right before the commit that added it. The idea was to squash the commits we had made since then into one, so that the large file, added in one commit and removed in a later one, would not appear in the resulting commit at all.

We started by running git log to view all the commits, and used the commit message to identify the hash of the specific commit we needed to edit. We then ran git rebase -i HEAD~X, where X is the number of commits, counting back from HEAD, that should be included in the rebase. X has to be large enough to reach the commit that added the file.
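We found the commit by reading the log, but the lookup can also be scripted. The following is a rough sketch, assuming a linear history and a hypothetical file called big.csv in a throwaway repository; it asks git which commit added the file and counts how many commits the rebase needs to cover.

```shell
set -eu

# Hypothetical throwaway repository, for illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"

printf 'id,value\n1,a\n' > big.csv
echo "feature" >> app.txt
git add app.txt big.csv
git commit -q -m "Add feature"

echo "more" >> app.txt
git commit -q -am "More work"

# Which commit added the file? --diff-filter=A keeps only commits
# in which the given path was added.
added_in=$(git log --diff-filter=A --format=%H -- big.csv)

# Number of commits from that commit (inclusive) up to HEAD.
# This is the X to pass to git rebase -i HEAD~X.
x=$(git rev-list --count "${added_in}^..HEAD")

echo "big.csv was added in $added_in"
echo "rebase with: git rebase -i HEAD~$x"
```

If the file had been added in the very first commit of the repository, there would be no parent commit to count from, and git rebase -i --root would be the option to look at instead.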

In the list of commits that opens, it’s important to keep the default action “pick” on the first commit, the one all the others will be folded into. This reapplies the commit as is, with no changes to its contents or message. We used “squash” for all the other commits, in order to meld each commit into the previous one.
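Here is a sketch of the whole squash, again in a throwaway repository with made-up file names and messages. We edited the list by hand in the editor; to keep the example runnable without one, it sets GIT_SEQUENCE_EDITOR to a sed command that leaves the first line as “pick” and turns every later “pick” into “squash”, and sets GIT_EDITOR to true so the combined commit message is accepted as is. It assumes git's default rebase settings.

```shell
set -eu

# Hypothetical throwaway repository, for illustration only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"

# The commit that accidentally includes the test file.
printf 'id,value\n1,a\n' > big.csv
echo "feature" >> app.txt
git add app.txt big.csv
git commit -q -m "Add feature"

echo "more" >> app.txt
git commit -q -am "More work"

git rm -q big.csv
git commit -q -m "Remove test file"

# Rebase the last 3 commits: keep "pick" on line 1, squash the rest.
# Done by hand, the edited list would read: pick, squash, squash.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" \
GIT_EDITOR=true \
git rebase -q -i HEAD~3

# The branch now has two commits, and the file is no longer
# reachable from it.
commits=$(git rev-list --count HEAD)
left=$(git rev-list --objects HEAD | grep -c 'big.csv' || true)

echo "commits on branch: $commits, big.csv entries in history: $left"
```

The work from all three commits ends up in one commit, while the file that was added and later removed cancels out and never appears in it.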

Conclusion

Trying to push a large file and running into a size limit error involves much more than just the file size. In this case it came down to understanding how git handles commits in a branch.

We solved the issue once we understood that a single commit along the way had the file in it, so that commit could not stay in the chain of commits as it was. Rebasing and squashing all the commits that include the file into a single commit that doesn’t was our way of recreating a history that the file is no longer a part of.

Further reading:

1. https://about.gitlab.com/blog/2018/06/07/keeping-git-commit-history-clean/

2. https://thoughtbot.com/blog/git-interactive-rebase-squash-amend-rewriting-history

3. https://stackoverflow.com/questions/22053757/checkout-another-branch-when-there-are-uncommitted-changes-on-the-current-branch
