Git
Linus Torvalds wrote git fast enough, in five days, to commit git to git. The first commit contains only plumbing and a README that calls the project 'stupid'.
e83c516331
Every commit in this newsletter has the same thing in common: git made it possible. You can run git log --reverse on any repository in the world and see the first thing its creator committed. You can do this because one person, at one keyboard, in April 2005, wrote git fast enough and well enough to bootstrap itself.
This is the commit where that happened.
Here is the message in full, because it is one of the greatest commit messages ever written and it deserves its own paragraph.
"Initial revision of 'git', the information manager from hell"
1. The context
On April 2, 2005, Linus Torvalds lost access to BitKeeper, the proprietary version control system he had been using to manage the Linux kernel. (A long story involving a reverse-engineered client and a withdrawn free license.) Linux at this point was the largest collaborative software project in human history, and it no longer had source control.
On April 3, Linus started writing a replacement. On April 7, five days later, he made this commit. By April 18, Linux kernel development had migrated onto git. The project that currently powers essentially all collaborative software development on earth was designed, implemented, and put into production by one person, on a deadline, in roughly two weeks.
This commit is day five of that two weeks.
2. What's in it
Eleven files. 1,244 lines in all, 1,036 of them C. No tests. No documentation besides the README. No branches, no merges, no remotes, no git commit, no git add, no git log. None of the commands you use every day exist yet.
What does exist:
Makefile 40 lines
README 168 lines
cache.h 93 lines
cat-file.c 23 lines
commit-tree.c 172 lines
init-db.c 51 lines
read-cache.c 259 lines
read-tree.c 43 lines
show-diff.c 81 lines
update-cache.c 248 lines
write-tree.c 66 lines
Every one of those .c files except read-cache.c, which holds the code the others share, is a plumbing command: a small, low-level binary that does one specific thing to the object database. There is no git wrapper that dispatches to subcommands. There is no porcelain at all. If you wanted to commit something, you ran update-cache to stage files, write-tree to snapshot the index, and commit-tree to make a commit object pointing at that tree, piping SHAs around by hand like it was 1978.
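To make that piping concrete, here is a minimal Python sketch of the three-step flow against an in-memory object store. The names update_cache, write_tree and commit_tree are hypothetical stand-ins that only mirror the 2005 commands. The object formats are an assumption taken from modern git (a "type size\0" header, SHA-1 over header plus body), and the day-five code differed in some details. The tree is flat, with no subdirectories, and every file is given mode 100644.

```python
import hashlib

# Hypothetical in-memory object database: SHA-1 hex name -> raw object bytes.
# Object formats follow modern git, not necessarily the day-five code.
objects: dict[str, bytes] = {}


def write_object(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of its header plus body; return the hex name."""
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    sha = hashlib.sha1(data).hexdigest()
    objects[sha] = data
    return sha


def update_cache(index: dict[str, str], path: str, content: bytes) -> None:
    """Step 1 (cf. update-cache): store the file as a blob and record it in the index."""
    index[path] = write_object("blob", content)


def write_tree(index: dict[str, str]) -> str:
    """Step 2 (cf. write-tree): snapshot a flat index as a single tree object."""
    entries = b""
    for path in sorted(index):  # plain sort is enough for flat ASCII names
        # Each entry is "<mode> <name>\0" followed by the 20-byte binary SHA-1.
        entries += f"100644 {path}".encode() + b"\0" + bytes.fromhex(index[path])
    return write_object("tree", entries)


def commit_tree(tree: str, parents: list[str], author: str, message: str) -> str:
    """Step 3 (cf. commit-tree): wrap a tree SHA in a commit object."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    return write_object("commit", ("\n".join(lines) + "\n").encode())


# Usage: each step hands its SHA to the next, as the 2005 commands did via the shell.
index: dict[str, str] = {}
update_cache(index, "README", b"GIT - the stupid content tracker\n")
tree = write_tree(index)
commit = commit_tree(tree, [], "A U Thor <author@example.com> 1700000000 +0000", "Initial revision")
print(tree, commit)
```

The only thing connecting the three steps is a 40-character hex string, which is why shell pipes were enough of a user interface on day five.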
The shape of git as we know it was not there yet. But the model was all there, complete, in day-five form: the object database, the trees, the blobs, the content-addressable storage by SHA-1. Every git command you run today is a layer of convenience on top of what these 1,244 lines defined.
3. The README is what you would expect
I was going to paraphrase the README, but no paraphrase can improve on the original. Here it is, verbatim, from the first commit:
"GIT - the stupid content tracker
'git' can mean anything, depending on your mood.
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronounciation of 'get' may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- 'global information tracker': you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- 'goddamn idiotic truckload of sh*t': when it breaks"
This is the README of software that would go on to be installed on every software engineer's machine on the planet. Linus opens it by listing four definitions of the project's own name, one of which is "goddamn idiotic truckload of sh*t." This is the tone of someone who is so confident in what he has just built that he can afford to be rude about it.
The technical content of the README is also dense and extraordinary. By line 20, it is explaining the difference between blob objects and tree objects. By line 40, it has described the SHA-1 content-addressable store in terms that are still accurate nineteen years later. Linus wrote, in a README, on day five, the final design of the git object model. It has barely changed.
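The content-addressable store the README describes fits in a few lines of Python. This sketch follows modern git's loose-object layout, which is an assumption here rather than a reading of the day-five code: a blob is named by the SHA-1 of "blob <size>\0" plus its content, and is stored zlib-compressed at objects/<first two hex digits>/<remaining 38>. The helper names hash_blob, store_blob and load_blob are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib


def hash_blob(content: bytes) -> tuple[str, bytes]:
    """Return (hex SHA-1, raw object bytes) for a blob, using modern git's header format."""
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), raw


def store_blob(objects_dir: str, content: bytes) -> str:
    """Write the blob zlib-compressed under <objects_dir>/<2 hex>/<38 hex>; return its name."""
    sha, raw = hash_blob(content)
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(raw))
    return sha


def load_blob(objects_dir: str, sha: str) -> bytes:
    """Read an object back by name, verify its hash, and strip the header."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    if hashlib.sha1(raw).hexdigest() != sha:
        raise ValueError(f"corrupt object {sha}")
    header, _, body = raw.partition(b"\0")
    if header != b"blob " + str(len(body)).encode():
        raise ValueError(f"bad header in object {sha}")
    return body


with tempfile.TemporaryDirectory() as objects_dir:
    sha = store_blob(objects_dir, b"hello\n")
    print(sha)  # ce013625030ba8dba906f756967f9e9ca394464a
    print(load_blob(objects_dir, sha))  # b'hello\n'
```

Because the name is derived from the content, writing the same bytes twice is a no-op, and any corruption on disk shows up as a hash mismatch on read.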
4. What git did not have
It is worth pausing on what this commit does not contain, because the absences are instructive:
- No git command. Everything is a separate binary: git add was update-cache, git commit was commit-tree, git diff was show-diff.
- No branches. The concept did not exist yet. There was just HEAD.
- No remote repositories. git was a local content tracker. Push, pull, clone, fetch: none of these existed. Distribution came later.
- No SHA-256. The object database was SHA-1 from day one. (It still is, mostly. The SHA-256 transition is a slow-motion project that has been in progress for years.)
- No staging area, at least not by that name. There is a cache (read-cache.c, update-cache.c), but the word "index" had not been settled on. The file that would become .git/index was called the cache.
The things that are missing are mostly the user interface. The engine was complete.
5. Five days
I keep coming back to the timeline. Five days from "I have no version control" to "I have enough version control to commit the version control itself." Two weeks from "I have no version control" to "the Linux kernel runs on my new version control."
Not every project benefits from being written this fast. Most do not. Most would be better if someone had slept on the design for a month. But git is the answer to the question: what does it look like when the person with the deepest possible understanding of what a version control system needs to do sits down and writes one? The answer is: eleven files, a README that calls it "stupid," and a commit message that calls it "the information manager from hell."
And then, because of course, the first thing he uses it for is to commit itself.
6. Footnotes from the commit log
- The README contains a typo: "mispronounciation" (correct: "mispronunciation"). Nobody has ever fixed it. It is still in the git repository today. It is arguably the most-viewed typo in open-source history.
- Linus did not invent content-addressable storage or Merkle trees. He did arrange them into the specific data model that makes git's distributed workflow possible.
- The commit was made to a standalone git repo. There was no "GitHub"; GitHub would not exist for another three years. git/git lived on kernel.org.
- The Makefile is 40 lines. It builds everything. No autotools. No CMake. Just cc.