Malicious Git Servers

  • coding
  • git
  • tools
  • security
  • deployment

Git is hard

Some time ago, I found myself in a debate on irc regarding the security of git. On one side, my opponent argued that you could not trust git to reliably give you a particular version of your code in light of a malicious remote or a man in the middle attack, even if you checked it out by a particular hash. I argued that because of the nature of how git stores revisions, even a change in the history of the repository would require breaking the security of the SHA1 hashes git uses, an unlikely event. We eventually came to agreement that if you get code via revision hashes and not via branches or unsigned tags, you are not vulnerable to the kind of attach he was proposing.

This got me thinking about the security of git. About how it stores objects and builds a working directory. What if the contents of one of the object files changed? Git makes these files read only on the file system to prevent this kind of problem, but that is a weak protection. If the other end of your git clone is malicious, how much damage could they do? If there really is a security problem here, it means that a lot of deployment tools that rely on git telling the truth are vulnerable.

The malicious git

So I did an experiment. I created a repository in ~/tmp/malice/a. I checked in two files good.txt and evil.txt. I put the words "good" and "evil" in them, respectively. I commited, and it was good. For a sanity check, I cloned that repository to ~/tmp/malice/b. Everything looked as I expected. I deleted the clone, and started fiddling with git's internals.

So I did an experiment. I created a repository in ~/tmp/malice/a. I checked in a files good.txt, and put word "good" in it. I commited, and it was good. For a sanity check, I cloned that repository to ~/tmp/malice/b. Everything looked as I expected. I deleted the clone, and started fiddling with git's internals.

My first thought was to modify the object file that represented the tree, to try and replace the file with another one. Unfortunately, git's objects files aren't packed in a human readable way, so this didn't work out. After some more thought, I decided I could just modify the object file representing good.txt directly. Surely those are stored in a human readable way.

Nope. File blobs are equally unreadable. It seems the only thing within my reach that could deal with them was git itself. Hmm. Do file blobs only depend on the file's content? I checked in another throw away repository. I made another good.txt with the same contents, and commited it. The hash was the same. This was what I need to test out my theory of malice! So I made a second file, evil.txt, and checked it in to the throwaway repository.

I took the contents of the object file for evil.txt and replaced the object file for good.txt with them. The original repository still was unaware of my treachery: git status said all was well. Mischief managed!

What does the git think?

Next I cloned the modified repository. Alarmingly, no red flags were raised, and the exit code was 0. Opening good.txt revealed that the treachery had worked. It contained the text "evil", just like evil.txt. Uh oh. Surely git knows something is wrong right? I ran git status in the cloned repository. modified: good.txt. Well, that is a start. But the return code was still 0. That means that git status can't help in our deploy scripts. Out of curiosity, I ran git diff, to see what git things was modified in the file. Nothing. Which makes sense. Git knows something is up because the hash of good.txt doesn't match it's object id, but the contents match up, so it can't tell any more.

This is worrisome. To protect against malicious server or MITM attacks, there needs to be an automated way to detect this treachery. I looked around in git help. Nothing obvious. I delved deeper. I wondered if git gc would notice something? Nope. Status and Clone are already out. Repack? No dice. I started getting very worried by this point.

The solution

Then I found the command I needed: git fsck. It does just what you would expect it to. The name comes from the system utility by the same name, and it originally stood for "File System Check". After finding this command, I had hope. I ran it. It didn't light up in big flashing lights, but reading it's output revealed "error: sha1 mismatch 12799ccbe7ce445b11b7bd4833bcc2c2ce1b48b7". More importantly, the exit code of the command was 5. I don't know what 5 means, but I do know it isn't 0, so it is an error. Yes!

So the solution is to always check git fsck after cloning if you really must know that your code is what you intended to run. If you do not, you run the risk of getting code that could be entirely different from what you thought.

A small comfort

Someone pointed that the various remote protocols git used would probably be a little pickier about what it got, in case of network transmission errors. Luckily, this was true. I tested http, git, and ssh protocols, and each of them raised an error on clone:

Cloning into './c'...
error: File 12799ccbe7ce445b11b7bd4833bcc2c2ce1b48b7 has bad hash
fatal: missing blob object '12799ccbe7ce445b11b7bd4833bcc2c2ce1b48b7'
fatal: remote did not send all necessary objects
Unexpected end of command stream

The particular output varied a little with each protocol, but the result was the same. An error in the output, return code 128, and no repository cloned. This is good.

I feel that this is something that was improved recently, because when I originally did this experiment I remember the remote protocols printed an error, but did not have a non-zero exit code and still created the repository. Unfortunately I did not document this, so I'm not sure. Yay for continuous improvement and poor memories, I guess.

Conclusion

If you are cloning from an untrusted git server, and especially if you are cloning from an untrusted repository via the file protocol, run git fsck afterwards and check the error code, to make everything is at it should be.