Rewriting Git History With Confidence: A Guide

Written by omerosenbaum | Published 2023/04/27

TL;DR: Git is a system for recording snapshots of a filesystem in time. A Git repository has three “states” or “trees”: the working tree, the index (also called the staging area), and the repository. A working dir(ectory) is any directory on our file system that has a Git repo associated with it.

As a developer, you work with Git all the time.

Did you ever get to a point where you said: “Uh-oh, what did I just do?”

This post will give you the tools to rewrite history with confidence.

Notes Before We Start

  1. I also gave a live talk covering the contents of this post. If you prefer a video (or wish to watch it alongside reading) — you can find it here.

  2. I am working on a book about Git! Are you interested in reading the initial versions and providing feedback? Send me an email: [email protected]

Recording Changes in Git

Before understanding how to undo things in Git, you should first understand how we record changes in Git. If you already know all the terms, feel free to skip this part.

It is very useful to think about Git as a system for recording snapshots of a filesystem in time. A Git repository has three “states” or “trees”: the working directory, the index (or staging area), and the repository itself.

Usually, when we work on our source code, we work from a working dir. A working dir(ectory) (or working tree) is any directory in our file system that has a repository associated with it.

It contains the folders and files of our project and also a directory called .git. I described the contents of the .git folder in more detail in a previous post.

After you make some changes, you may want to record them in your repository. A repository (in short: repo) is a collection of commits, each of which is an archive of what the project’s working tree looked like at a past date, whether on your machine or someone else’s.

A repository also includes things other than our code files, such as HEAD, branches, etc.

In between, we have the index or the staging area; these two terms are interchangeable. When we check out a branch, Git populates the index with the contents of all the files that were last checked out into our working directory, as they looked when they were checked out.

When we use git commit, the commit is created based on the state of the index.

So, the index, or the staging area, is your playground for the next commit. You can work and do whatever you want with the index, add files to it, remove things from it, and then only when you are ready, you go ahead and commit to the repository.

Time to get hands-on 🙌🏻

Use git init to initialize a new repository. Write some text into a file called 1.txt:
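This step can be sketched as follows, assuming a POSIX shell. The scratch directory created with mktemp and the text written into the file are placeholders of mine, not part of the original walkthrough.

```shell
# Work in a throwaway directory so nothing else is affected
repo="$(mktemp -d)" && cd "$repo"

git init -q
echo "hello" > 1.txt

# Prints "?? 1.txt": the "??" marks a file Git does not track yet
git status --short
```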

Out of the three tree states described above, where is 1.txt now?

In the working tree, as it hasn’t yet been introduced to the index.

In order to stage it, to add it to the index, use git add 1.txt.

Now, we can use git commit to commit our changes to the repository.

You created a new commit object, which includes a pointer to a tree describing the entire working tree. In this case, it’s gonna be only 1.txt within the root folder. In addition to a pointer to the tree, the commit object includes metadata, such as timestamps and author information.
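Here is a minimal sketch of the staging and committing steps, plus a peek inside the resulting commit object. The scratch directory, the author identity, and the file contents are assumptions made only so the example runs on a fresh machine.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt
git add 1.txt                  # working tree -> index
git status --short             # prints "A  1.txt": staged, not committed yet
git commit -q -m "Commit 1"    # index -> repository

# Inspect the new commit object: a "tree" line, author and committer
# lines (with timestamps), a blank line, and the commit message
git cat-file -p HEAD
```

Your SHA-1 values will differ from mine, for the reasons described right below.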

For more information about the objects in Git (such as commits and trees), check out my previous post.

(Yes, “check out”, pun intended 😇)

Git also tells us the SHA-1 value of this commit object. In my case, it was c49f4ba (which is only the first 7 characters of the SHA-1 value, to save some space).

If you run this command on your machine, you will get a different SHA-1 value, as you are a different author; also, you will create the commit at a different time.

When we initialize the repo, Git creates a new branch (named main in this post; the actual default name depends on your Git version and the init.defaultBranch setting). A branch in Git is just a named reference to a commit. So by default, you have only the main branch. What happens if you have multiple branches? How does Git know which branch is the active branch?

Git has another pointer called HEAD, which points (usually) to a branch, which then points to a commit. By the way, under the hood, HEAD is just a file. It includes the name of the branch with some prefixes.
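You can look at that file yourself. The sketch below assumes a fresh scratch repository, and it explicitly names the initial branch main so the output is the same whatever default your Git installation uses.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch main, whatever the local default is

cat .git/HEAD                    # prints "ref: refs/heads/main"
git symbolic-ref --short HEAD    # prints "main"
```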

Time to introduce more changes to the repo!

Now, I want to create another commit. So let’s create a new file, 2.txt, and add it to the index, as before:

Now, it’s time to use git commit. Importantly, git commit does two things:

First, it creates a commit object, so there is an object within Git’s internal object database with a corresponding SHA-1 value. This new commit object also points to the parent commit, that is, the commit that HEAD was pointing to when you ran the git commit command.

Second, git commit moves the pointer of the active branch — in our case, that would be main, to point to the newly created commit object.
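Both effects can be checked from the command line. This is a sketch in a scratch repository; the identity, file contents, and commit messages are placeholders I chose.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # call the initial branch main
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt
git add 1.txt
git commit -q -m "Commit 1"
first="$(git rev-parse HEAD)"    # remember the SHA-1 of "Commit 1"

echo "world" > 2.txt
git add 2.txt
git commit -q -m "Commit 2"

# 1) The new commit object records the previous commit as its parent
git rev-parse HEAD~1             # same SHA-1 as the one saved in $first
echo "$first"

# 2) The active branch moved to the new commit, and HEAD follows it
git rev-parse main HEAD          # prints the same SHA-1 twice
```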

Undoing the Changes

To rewrite history, let’s start with undoing the process of introducing a commit. For that, we will get to know the command git reset, a super powerful tool.

git reset --soft

So the very last step you did before was to git commit, which actually means two things — Git created a commit object and moved main, the active branch. To undo this step, use the command git reset --soft HEAD~1.

The syntax HEAD~1 refers to the first parent of HEAD. Say I had more than one commit in the commit graph: “Commit 3” pointing to “Commit 2”, which is, in turn, pointing to “Commit 1”.

And say HEAD was pointing to “Commit 3”. You could use HEAD~1 to refer to “Commit 2”, and HEAD~2 would refer to “Commit 1”.
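To see the ~ syntax in action, here is a small sketch that builds exactly this three-commit chain in a scratch repository (file names and identity are placeholders) and asks Git for the subject line of each revision.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

# Build the chain: Commit 3 -> Commit 2 -> Commit 1
for n in 1 2 3; do
  echo "content $n" > "$n.txt"
  git add "$n.txt"
  git commit -q -m "Commit $n"
done

git log -1 --format=%s HEAD      # prints "Commit 3"
git log -1 --format=%s HEAD~1    # prints "Commit 2"
git log -1 --format=%s HEAD~2    # prints "Commit 1"
```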

So, back to the command: git reset --soft HEAD~1

This command asks Git to change whatever HEAD is pointing to. (Note: In the diagrams below, I use *HEAD for “whatever HEAD is pointing to”). In our example, HEAD is pointing to main. So Git will only change the pointer of main to point to HEAD~1. That is, main will point to “Commit 1”.

However, this command did not affect the state of the index or the working tree. So if you use git status, you will see that 2.txt is staged, just like before you ran git commit.

What about git log? It will start from HEAD, go to main, and then to “Commit 1”. Notice that this means that “Commit 2” is no longer reachable from our history.

Does that mean the commit object of “Commit 2” is deleted? 🤔

No, it’s not deleted. It still resides within Git’s internal object database.

If you push the current history now, by using git push, Git will not push “Commit 2” to the remote server, but the commit object still exists on your local copy of the repository.

Now, commit again — and use the commit message of “Commit 2.1” to differentiate this new object from the original “Commit 2”:

Why are “Commit 2” and “Commit 2.1” different? Even if we used the same commit message, and even though they point to the same tree object (of the root folder consisting of 1.txt and 2.txt), they still have different timestamps, as they were created at different times.

In the drawing above, I kept “Commit 2” to remind you that it still exists in Git’s internal object database. Both “Commit 2” and “Commit 2.1” now point to “Commit 1”, but only “Commit 2.1” is reachable from HEAD.
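The whole --soft round trip can be replayed in a scratch repository. The sketch below is my own reconstruction of the steps, with placeholder file contents and identity; it checks that the old commit object survives the reset and that “Commit 2” and “Commit 2.1” share a tree and a parent but are distinct objects.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo "world" > 2.txt && git add 2.txt && git commit -q -m "Commit 2"
old="$(git rev-parse HEAD)"          # remember "Commit 2"

git reset --soft HEAD~1
git log --format=%s                  # prints only "Commit 1"
git status --short                   # prints "A  2.txt": the index is untouched
git cat-file -t "$old"               # prints "commit": the object was not deleted

git commit -q -m "Commit 2.1"
git rev-parse "$old^{tree}" "HEAD^{tree}"   # prints the same tree SHA-1 twice
git rev-parse "$old" HEAD                   # prints two different commit SHA-1s
```

Because the script runs within a second or so, the different commit message is what guarantees different SHA-1 values here; in real life the timestamps would differ too.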

git reset --mixed

It’s time to go back even further and undo more. This time, use git reset --mixed HEAD~1 (note: --mixed is the default switch for git reset).

This command starts the same as git reset --soft HEAD~1. Meaning it takes the pointer of whatever HEAD is pointing to now, which is the main branch, and sets it to HEAD~1, in our example — “Commit 1”.

Next, Git goes further, effectively undoing the changes we made to the index. That is, it changes the index so that it matches the current HEAD (the new HEAD, as set in the first step).

When we run git reset --mixed HEAD~1, main (which HEAD points to) is set to HEAD~1 (“Commit 1”), and then Git matches the index to the state of “Commit 1”. In this case, it means that 2.txt will no longer be part of the index.

It’s time to create a new commit with the state of the original “Commit 2”. This time we need to stage 2.txt again before creating it:
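A sketch of this step in a scratch repository follows. The commit name “Commit 2.2” is my assumption for the re-created commit, and the identity and file contents are placeholders.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo "world" > 2.txt && git add 2.txt && git commit -q -m "Commit 2.1"

git reset -q --mixed HEAD~1      # plain "git reset HEAD~1" would do the same
git log --format=%s              # prints only "Commit 1"
git status --short               # prints "?? 2.txt": on disk, but no longer in the index

git add 2.txt                    # stage it again...
git commit -q -m "Commit 2.2"    # ...and commit
```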

git reset --hard

Go on, undo even more!

Go ahead and run git reset --hard HEAD~1

Again, Git starts with the --soft stage, setting whatever HEAD is pointing to (main), to HEAD~1 (“Commit 1”).

So far so good.

Next comes the --mixed stage, matching the index with HEAD. That is, Git undoes the staging of 2.txt.

It is time for the --hard step, where Git goes even further and matches the working dir with the state of the index. In this case, it means removing 2.txt from the working dir as well.

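(Note: git reset --hard only removes files that Git tracks. Here 2.txt was committed in the previous step, so it is removed; a file that was never staged or committed would be left in the file system. This detail isn’t really important in order to understand git reset, though.)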
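To see what --hard does and does not touch, here is a sketch that assumes 2.txt was committed before the reset and adds one extra untracked file (notes.txt, a hypothetical name of mine) for comparison.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo "world" > 2.txt && git add 2.txt && git commit -q -m "Commit 2.2"
echo "scratch" > notes.txt       # hypothetical untracked file, never staged

git reset -q --hard HEAD~1
ls                               # lists 1.txt and notes.txt; 2.txt is gone
git status --short               # prints "?? notes.txt"
```

The tracked file disappears from the working dir, while the untracked one is left alone.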

So to introduce a change to Git, you have three steps. You change the working dir, then the index (the staging area), and then you commit a new snapshot with those changes. To undo these changes:

  • If we use git reset --soft, we undo the commit step.

  • If we use git reset --mixed, we also undo the staging step.

  • If we use git reset --hard, we also undo the changes to the working dir.

Real-Life Scenarios!

Scenario #1

So in a real-life scenario, write “I love Git” into a file ( love.txt ), as we all love Git 😍. Go ahead, stage and commit this as well:

Oh, oops!

Actually, I didn’t want you to commit it.

What I actually wanted you to do is write some more love words in this file before committing it.

What can you do?

Well, one way to overcome this would be to use git reset --mixed HEAD~1, effectively undoing both the committing and the staging actions you took:

So main points to “Commit 1” again, and love.txt is no longer a part of the index. However, the file remains in the working dir. You can now go ahead, and add more content to it:

Go ahead, stage and commit your file:

Well done 👏🏻

You now have this clean, nice history of “Commit 2.4” pointing to “Commit 1”.
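Scenario #1 from start to finish might look like the sketch below. The name “Commit 2.3” for the premature commit and the extra line of text are my assumptions; only “Commit 2.4” and “Commit 1” come from the walkthrough.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"

echo "I love Git" > love.txt
git add love.txt
git commit -q -m "Commit 2.3"             # the commit made too early

git reset -q --mixed HEAD~1               # undo the commit and the staging
cat love.txt                              # prints "I love Git": the file itself is untouched

echo "I really love Git" >> love.txt      # placeholder for the extra love words
git add love.txt
git commit -q -m "Commit 2.4"
git log --format=%s                       # prints "Commit 2.4", then "Commit 1"
```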

We now have a new tool in our toolbox, git reset 💪🏻

This tool is super, super useful, and you can accomplish almost anything with it. It’s not always the most convenient tool to use, but it’s capable of solving almost any rewriting-history scenario if you use it carefully.

For beginners, I recommend using only git reset almost any time you want to undo something in Git. Once you feel comfortable with it, it’s time to move on to other tools.

Scenario #2

Let us consider another case.

Create a new file called new.txt; stage and commit:

Oops. Actually, that’s a mistake. You were on main, and I wanted you to create this commit on a feature branch. My bad 😇

There are two important tools I want you to take from this post. The second is git reset. The first, and by far the more important one, is to whiteboard the current state versus the state you want to be in.

For this scenario, the current state and the desired state look like so:

You will notice three changes:

  1. main points to “Commit 3” (the blue one) in the current state, but to “Commit 2.4” in the desired state.

  2. feature branch doesn’t exist in the current state, yet it exists and points to “Commit 3” in the desired state.

  3. HEAD points to main in the current state, and to feature in the desired state.

If you can draw this and you know how to use git reset, you can definitely get yourself out of this situation.

So again, the most important thing is to take a breath and draw this out.

Observing the drawing above, how do we get from the current state to the desired one?

There are a few different ways of course, but I will present one option only for each scenario. Feel free to play around with other options as well.

You can start by using git reset --soft HEAD~1. This would set main to point to the previous commit, “Commit 2.4”:

Peeking at the current-vs-desired diagram again, you can see that you need a new branch, right? You can use git switch -c feature for it or git checkout -b feature (which does the same thing):

This command also updates HEAD to point to the new branch.

Since you used git reset --soft, you didn’t change the index, so it currently has exactly the state you want to commit — how convenient! You can simply commit to feature branch:

And you got to the desired state 🎉
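Put together, Scenario #2 can be sketched like this in a scratch repository. The file contents and identity are placeholders; I use git checkout -b here because it also works on older Git versions, and git switch -c feature is equivalent on newer ones.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # call the initial branch main
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 2.4"
echo "new" > new.txt && git add new.txt && git commit -q -m "Commit 3"   # oops: made on main

git reset --soft HEAD~1           # main goes back to "Commit 2.4"; new.txt stays staged
git checkout -q -b feature        # create feature and point HEAD at it
git commit -q -m "Commit 3"       # the staged change now lands on feature

git log -1 --format=%s main       # prints "Commit 2.4"
git log -1 --format=%s feature    # prints "Commit 3"
git symbolic-ref --short HEAD     # prints "feature"
```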

Scenario #3

Ready to apply your knowledge to additional cases?

Add some changes to love.txt, and also create a new file called cool.txt. Stage them and commit:

Oh, oops, actually I wanted you to create two separate commits, one with each change 🤦🏻

Want to try this one yourself?

You can undo the committing and staging steps:

Following this command, the index no longer includes those two changes, but they’re both still in your file system. So now, if you only stage love.txt , you can commit it separately, and then do the same for cool.txt:
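One possible way to run Scenario #3 end to end is sketched below. The commit messages “Commit 3.1” and “Commit 3.2”, like the file contents, are names I made up for the illustration.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 2.4"

echo "More love" >> love.txt
echo "cool" > cool.txt
git add love.txt cool.txt
git commit -q -m "Commit 3"       # oops: two unrelated changes in one commit

git reset -q HEAD~1               # --mixed is the default: undo the commit and the staging
git status --short                # prints " M love.txt" and "?? cool.txt"

git add love.txt && git commit -q -m "Commit 3.1"
git add cool.txt && git commit -q -m "Commit 3.2"
git log --format=%s               # prints "Commit 3.2", "Commit 3.1", "Commit 2.4"
```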

Nice 😎

Scenario #4

Create a new file (new_file.txt) with some text, and add some text to love.txt. Stage both changes, and commit them:

Oops 🙈🙈

So this time, I wanted it to be on another branch, but not a new branch, rather an already-existing branch.

So what can you do?

I’ll give you a hint. The answer is really short and really easy. What do we do first?

No, not reset. We draw. That’s the first thing to do, as it would make everything else so much easier. So this is the current state:

And the desired state?

How do you get from the current state to the desired state? What would be easiest?

So one way would be to use git reset as you did before, but there is another way that I would like you to try.

First, move HEAD to point to existing branch:

Intuitively, what you want to do is take the changes introduced in the blue commit, and apply these changes (“copy-paste”) on top of existing branch. And Git has a tool just for that.

To ask Git to take the changes introduced between this commit and its parent commit and just apply these changes on the active branch, you can use git cherry-pick. This command takes the changes introduced in the specified revision and applies them to the active commit.

It also creates a new commit object, and updates the active branch to point to this new object.

In the example above, I specified the SHA-1 identifier of the created commit, but you could also use git cherry-pick main, as the commit whose changes we are applying is the one main is pointing to.

But we don’t want these changes to exist on main branch. git cherry-pick only applied the changes to the existing branch. How can you remove them from main?

One way would be to switch back to main, and then use git reset --hard HEAD~1:
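The full sequence for this scenario could look like the sketch below. It assumes a branch literally named existing that already has a commit of its own; that commit, the file other.txt, and the commit messages are placeholders of mine.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # call the initial branch main
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 1"

git checkout -q -b existing                 # the already-existing branch, with its own work
echo "other" > other.txt && git add other.txt && git commit -q -m "Existing work"
git checkout -q main

echo "some text" > new_file.txt
echo "More love" >> love.txt
git add new_file.txt love.txt
git commit -q -m "Oops commit"              # made on main by mistake

git checkout -q existing                    # move HEAD to the existing branch
git cherry-pick main                        # apply the changes of the commit main points to
git checkout -q main
git reset -q --hard HEAD~1                  # drop the commit from main

git log -1 --format=%s existing             # prints "Oops commit"
git log -1 --format=%s main                 # prints "Commit 1"
```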

You did it! 💪🏻

Note that git cherry-pick actually computes the difference between the specified commit and its parent, and then applies it to the active commit. This means that sometimes, Git won’t be able to apply those changes as you may get a conflict, but that’s a topic for another post.

Also, note that you can ask Git to cherry-pick the changes introduced in any commit, not only commits referenced by a branch.

We have acquired a new tool, so we have git reset as well as git cherry-pick under our belt.

Scenario #5

Okay, so another day, another repo, another problem.

Create a commit:

And push it to the remote server:

Um, oops 😓…

I just noticed something. There is a typo there. I wrote This is more tezt instead of This is more text. Whoops. So what’s the big problem now? I pushed, which means that someone else might have already pulled those changes.

If I override those changes by using git reset, as we’ve done so far, we will have different histories, and all hell might break loose. You can rewrite your own copy of the repo as much as you like until you push it.

Once you push the change, you need to be very certain no one else has fetched those changes if you are going to rewrite history.

Alternatively, you can use another tool called git revert. This command takes the commit you’re providing it with and computes the diff from its parent commit, just like git cherry-pick, but this time, it computes the reverse changes.

So if in the specified commit you added a line, the reverse would delete the line, and vice versa.

git revert created a new commit object, which means it’s an addition to the history. By using git revert, you didn’t rewrite history. You admitted your past mistake, and this commit is an acknowledgment that you made a mistake and now you fixed it.

Some would say it’s the more mature way. Some would say it’s not as clean a history as you would get if you used git reset to rewrite the previous commit. But this is a way to avoid rewriting history.

You can now fix the typo and commit again:
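Here is a sketch of the revert flow in a scratch repository. The file name more.txt and the commit messages are hypothetical, and since the sketch has no remote, the push is only indicated by a comment.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"

echo "This is more tezt" > more.txt
git add more.txt
git commit -q -m "Add more text"
# git push would happen here; this sketch has no remote, so it is left out

git revert --no-edit HEAD         # new commit that applies the reverse diff
git log --format=%s               # newest first: Revert "Add more text", Add more text, Commit 1
ls                                # only 1.txt: the revert removed more.txt again

echo "This is more text" > more.txt
git add more.txt
git commit -q -m "Add more text, typo fixed"
```

Nothing that was already pushed is rewritten; the history only grows.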

Your toolbox is now loaded with a new shiny tool, revert:

Scenario #6

Get some work done, write some code, and add it to love.txt . Stage this change, and commit it:

I did the same on my machine, and I used the Up arrow key on my keyboard to scroll back to previous commands, and then I hit Enter, and… Wow.

Whoops.

Did I just use git reset --hard? 😨

What actually happened? Git moved the pointer to HEAD~1, so the last commit, with all of my precious work, is not reachable from the current history. Git also unstaged all the changes from the staging area, and then matched the working dir to the state of the staging area.

That is, everything matches this state where my work is… gone.

Freak out time. Freaking out.

But, really, is there a reason to freak out? Not really… We’re relaxed people. What do we do? Well, intuitively, is the commit really, really gone? No. Why not? It still exists inside the internal database of Git.

If only I knew the SHA-1 value that identifies this commit, I could restore it. I could even undo the undoing, and reset back to this commit.

So the only thing I really need here is the SHA-1 of the “deleted” commit.

So the question is, how do I find it? Would git log be useful?

Well, not really. git log would go to HEAD, which points to main, which points to the parent commit of the commit we are looking for. Then, git log would trace back through the parent chain, which does not include the commit with my precious work.

Thankfully, the very smart people who created Git also created a backup plan for us, and that is called the reflog.

While you work with Git, whenever you change HEAD, which you can do by using git reset, but also other commands like git switch or git checkout, Git adds an entry to the reflog.

We found our commit! It’s the one starting with 0fb929e.

We can also refer to it by its “nickname”: HEAD@{1}. Just as Git uses HEAD~1 to get to the first parent of HEAD, and HEAD~2 to refer to the parent of that parent, and so on, Git uses HEAD@{1} to refer to the first reflog parent of HEAD, that is, where HEAD pointed to in the previous step.

We can also ask git rev-parse to show us its value:

Another way to view the reflog is by using git log -g, which asks git log to actually consider the reflog:

We see above that the reflog, just as HEAD, points to main, which points to “Commit 2”. But the parent of that entry in the reflog points to “Commit 3”.

So to get back to “Commit 3”, you can just use git reset --hard HEAD@{1} (or the SHA-1 value of “Commit 3”):

And now, if we git log:

We saved the day! 🎉👏🏻

What would happen if I used this command again, and ran git reset --hard HEAD@{1} once more? Git would set HEAD to where HEAD was pointing before the last reset, meaning to “Commit 2”. We can keep going all day:
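The rescue can be rehearsed safely in a scratch repository. In this sketch the commits, file contents, and identity are placeholders; the SHA-1 values on your machine will differ from the ones quoted above.

```shell
repo="$(mktemp -d)" && cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity for this sketch
git config user.email "author@example.com"

echo "hello" > 1.txt && git add 1.txt && git commit -q -m "Commit 1"
echo "I love Git" > love.txt && git add love.txt && git commit -q -m "Commit 2"
echo "precious work" >> love.txt && git add love.txt && git commit -q -m "Commit 3"
lost="$(git rev-parse HEAD)"       # kept only to verify the result later

git reset -q --hard HEAD~1         # the accidental command
git log -1 --format=%s             # prints "Commit 2"

git reflog                         # every move of HEAD, newest first
git rev-parse 'HEAD@{1}'           # same SHA-1 as the one saved in $lost

git reset -q --hard 'HEAD@{1}'     # undo the undo
git log -1 --format=%s             # prints "Commit 3"

git reset -q --hard 'HEAD@{1}'     # and once more: back to "Commit 2"
git log -1 --format=%s             # prints "Commit 2"
```

The quotes around HEAD@{1} keep the shell from interpreting the braces.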

Looking at our toolbox now, it’s loaded with tools that can help you solve many cases where things go wrong in Git:

With these tools, you now better understand how Git works. There are more tools that would allow you to rewrite history (specifically, git rebase), but you’ve already learned a lot in this post. In future posts, I will dive into git rebase as well.

The most important tool, even more important than the five tools listed in this toolbox, is to whiteboard the current situation vs the desired one. Trust me on this, it will make every situation seem less daunting and the solution clearer.

Learn More About Git

I also gave a live talk covering the contents of this post. If you prefer a video (or wish to watch it alongside reading) — you can find it here.

In general, my YouTube channel covers many aspects of Git and its internals; you are welcome to check it out (pun intended 😇)

About the Author

Omer Rosenbaum is the CTO and Co-Founder of Swimm, a devtool that helps developers and their teams manage knowledge about their codebase with up-to-date internal documentation. Omer is the founder of Check Point Security Academy and was the Cyber Security Lead at ITC, an educational organization that trains talented professionals to develop careers in technology.

Omer has an MA in Linguistics from Tel Aviv University and is the creator of the Brief YouTube Channel.


First published here


Published by HackerNoon on 2023/04/27