PostgreSQL
The first commit of PostgreSQL is from 1996, nine years before git existed. It's labeled 'Virgin Sources' and the copyright still says University of California.
d31084e9d1
PostgreSQL is the database that runs a substantial portion of the serious software on the internet. It is usually described as "open source Oracle," but that undersells it. PostgreSQL has features that Oracle does not have, extensions that nothing else can match, and a reputation among the people who use it that approaches religious devotion.
Its git history begins in 1996, nine years before git existed, with a commit labeled "Virgin Sources."
1. Impossible dates
I want to address the date up front, because if you look at git log --reverse on postgres/postgres, you will see this:
commit d31084e9d1118b25fd16580d9d8c2924b5740dff
Author: Marc G. Fournier <scrappy@hub.org>
Date: Tue Jul 9 06:22:35 1996 +0000
Postgres95 1.01 Distribution - Virgin Sources
Git was not released until April 2005. A commit dated 1996 is therefore a conversion from an earlier version control system (in this case, CVS). When the PostgreSQL community migrated from CVS to git around 2010, they preserved the original CVS timestamps so that the historical record would remain intact. The 1996 date is real in the sense that it is the date Marc Fournier actually made this change. It is not real in the sense that git was involved.
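A 1996 date can sit in a git repository because a commit's timestamps are just fields that the committing tool supplies; git does not check them against the clock. Here is a minimal sketch of that mechanism in Python, using git's GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. It is not the tool the PostgreSQL project used for its conversion, and the identity, commit message, and function names are placeholders invented for the illustration.

```python
import os
import shutil
import subprocess
import tempfile


def backdated_env(date, name, email):
    """Return an environment in which git stamps new commits with `date`."""
    env = dict(os.environ)
    env.update({
        "GIT_AUTHOR_DATE": date,       # when the change was originally made
        "GIT_COMMITTER_DATE": date,    # when it was recorded in the repository
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": name,
        "GIT_COMMITTER_EMAIL": email,
    })
    return env


def demo():
    """Create a throwaway repository holding one empty commit dated 1996."""
    if shutil.which("git") is None:
        return None  # git is not installed, so there is nothing to show
    # Placeholder identity (hypothetical), RFC 2822 date taken from the log above.
    env = backdated_env("Tue, 09 Jul 1996 06:22:35 +0000",
                        "Example Importer", "importer@example.org")
    with tempfile.TemporaryDirectory() as repo:
        subprocess.run(["git", "init", "--quiet", repo], check=True, env=env)
        subprocess.run(
            ["git", "-C", repo, "-c", "commit.gpgsign=false", "commit",
             "--quiet", "--allow-empty", "-m", "imported snapshot"],
            check=True, env=env,
        )
        log = subprocess.run(
            ["git", "-C", repo, "log", "-1", "--format=%ad"],
            check=True, env=env, capture_output=True, text=True,
        )
        return log.stdout.strip()  # the author date git recorded


if __name__ == "__main__":
    print(demo())
```

Running it should print a 1996 author date from a repository created seconds earlier, which is the same trick a CVS-to-git converter plays at scale.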
The proof is inside the commit itself. Open src/Makefile and you will find this line at the top:
$Header: /cvsroot/pgsql/src/Makefile,v 1.1.1.1 1996/07/09 06:21:07 scrappy Exp $
This is a CVS keyword expansion. CVS would rewrite $Header$ in checked-in files with the file's version, date, author, and repository path. 1.1.1.1 is CVS-specific notation for a vendor branch initial import: a pristine upload of third-party source code, before any local modifications have been made. scrappy is Marc Fournier's CVS username. The CVS repository lived at /cvsroot/pgsql/src/. Every piece of context you would want to reconstruct this commit's origin is sitting in the CVS header string that CVS automatically generated when the file was checked in, and which survived the CVS-to-git migration fourteen years later as a frozen fingerprint.
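To make the anatomy of that string concrete, here is a small Python sketch that splits an expanded $Header$ keyword into the fields described above. The regular expression is my own reading of the layout in this one line, not a complete grammar for CVS keywords, and the names CvsHeader, parse_header, and is_initial_vendor_import are invented for the example.

```python
import re
from typing import NamedTuple, Optional


class CvsHeader(NamedTuple):
    rcs_path: str   # repository path of the RCS file, ending in ",v"
    revision: str   # CVS revision number
    date: str       # check-in date, YYYY/MM/DD
    time: str       # check-in time, HH:MM:SS
    author: str     # CVS username of the committer
    state: str      # RCS state, "Exp" in the line above


# Assumed layout: $Header: <path> <revision> <date> <time> <author> <state> $
HEADER_RE = re.compile(
    r"\$Header:\s+(?P<rcs_path>\S+)\s+(?P<revision>[\d.]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<author>\S+)\s+(?P<state>\S+)\s+\$"
)


def parse_header(line: str) -> Optional[CvsHeader]:
    """Return the fields of an expanded $Header$ keyword, or None if absent."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return CvsHeader(**match.groupdict())


def is_initial_vendor_import(revision: str) -> bool:
    """True for 1.1.1.1, the revision the article identifies as a vendor import."""
    return revision == "1.1.1.1"


if __name__ == "__main__":
    line = ("$Header: /cvsroot/pgsql/src/Makefile,v 1.1.1.1 "
            "1996/07/09 06:21:07 scrappy Exp $")
    header = parse_header(line)
    print(header)
    print(is_initial_vendor_import(header.revision))
```

Fed the line from src/Makefile, the parser yields the repository path, revision 1.1.1.1, the check-in date and time, and scrappy as the author: the whole provenance of the file, recovered from one comment.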
PostgreSQL's initial git commit is, in other words, a converted CVS vendor-branch import, with the CVS metadata still embedded in the source code. Archaeology buried inside archaeology.
2. "Virgin Sources"
The commit message, "Postgres95 1.01 Distribution - Virgin Sources", is a phrase from CVS culture. In CVS terminology, "virgin sources" meant untouched upstream code, exactly as it arrived from the vendor, with no local modifications. The phrase is a technical term, not a poetic one. When you imported a third-party library into CVS, the convention was to mark that first import as the "virgin" version, so that later diffs against the vendor would be clean.
What Marc was marking with this commit, then, was a precise historical claim: this is Postgres95 version 1.01, exactly as it was released, before I or anyone else on the new team has touched it. He was staking out a known point of reference before the open-source community began making its own changes.
And with that, the git log of one of the most widely deployed databases on earth begins with a phrase that sounds mildly pious and is actually just CVS jargon.
3. Who Marc Fournier is, and why he is the first author
Like Bitcoin, PostgreSQL has a first commit attributed to a person who is not the project's creator.
The creator of Postgres is Michael Stonebraker, who started the project at the University of California, Berkeley, in 1986, ten years before this commit. Stonebraker had previously led INGRES, another influential database at Berkeley; POSTGRES (as it was originally spelled, all-caps) was his next project, built to explore object-relational concepts, extensibility, and rules systems. Throughout the late 1980s and early 1990s, POSTGRES was an academic research project, funded by DARPA and developed by graduate students.
In 1994, two Berkeley students named Andrew Yu and Jolly Chen added SQL support to POSTGRES, replacing its original query language "Postquel." They renamed their version Postgres95, the "95" being a callback to the naming convention of Windows 95 and other mid-1990s software. Postgres95 was the last version directly associated with Berkeley.
By 1996, Berkeley was no longer funding the project. It would have died there, as most academic software does, except that a group of enthusiasts on the internet (Marc Fournier, Bruce Momjian, Thomas Lockhart, and Vadim Mikheev) took over stewardship and moved development to a new CVS repository. Marc Fournier, in particular, was the sysadmin. He ran the hub.org server that hosted the mailing lists and the CVS repo. His username was scrappy, because hub.org was his personal machine and "scrappy" was his personal handle. When he set up the new CVS repository for the community-maintained fork, that username was attached to the first commit of what would later become PostgreSQL.
This commit is Marc taking the Postgres95 1.01 tarball (the last thing Andrew Yu and Jolly Chen had released) and importing it into his CVS repository so that a community could start working on it. The commit says, in effect: the academic project is done; the open-source project begins here; and here is exactly what we are beginning from.
Four months after this commit, the project would be renamed PostgreSQL, adding back the "QL" for Query Language and dropping "95." Marc kept running the infrastructure, ran the CVS repository, and remained on the PostgreSQL core team for decades. He died in 2019. His handle, scrappy, is still visible in the git log of every installation of PostgreSQL on earth, attached to the oldest commit in the history.
4. What's in the commit
868 files. 242,656 lines of C. Every single line was written elsewhere: at Berkeley, by Stonebraker's students, in the decade leading up to this commit. Marc did not write any of it. He imported it.
But the structure of the imported code is fascinating, because the structure is still recognizably the structure of PostgreSQL today. The first commit of postgres/postgres contains exactly one directory at the top level of the repository:
src/
No README. No LICENSE. No HISTORY. No INSTALL. No AUTHORS. No CHANGELOG. No CONTRIBUTING. No .gitignore (git did not exist). Just src/, and inside src/:
Makefile
Makefile.global
backend/ <- the database server
bin/ <- psql, pg_dump, and friends
interfaces/ <- libpq and its siblings
mk/ <- the build system
test/
tools/
tutorial/ <- documentation, sort of
Thirty years later, if you clone postgres/postgres, you will find, at src/, nearly all of these directories, still named the same things, still doing the same jobs. src/backend/ is the database server. src/bin/ holds psql, pg_dump, pg_restore, and the other binaries. src/interfaces/libpq/ is the C client library. src/tools/ is where the build helpers live. The PostgreSQL project has spent three decades growing in place, layering on top of the directory structure that Marc Fournier committed in 1996, without ever rearranging the furniture.
This is unusual. Most large codebases have been through at least one big reorganization by their thirtieth birthday. PostgreSQL has not. Every file you touch in modern PostgreSQL has a path that is an unbroken descendant of the paths in this commit. src/backend/access/nbtree/ held the B-tree index access method in 1996 and it holds the B-tree index access method today. The lineage is physical.
5. The missing README
I keep catching myself surprised by this. A 242,656-line distribution and there is no README at the root. No top-level documentation of any kind. If you, in 1996, downloaded Postgres95 1.01, extracted the tarball, and looked in the resulting directory, you would find a single folder called src/ and nothing else. No guidance. No "how to install." No "what is this." Just a directory and the assumption that you already knew what you were doing.
The actual documentation lived one level deeper, inside src/tutorial/. That directory contains a README (at last), a Makefile, and a set of .source files (advanced.source, basics.source, complex.source, funcs.source, syscat.source) which are SQL scripts with pedagogical commentary. These were the original PostgreSQL tutorials: how to create a table, how to run a query, how to define a function. They are still in the repository today, still written as .source files, still run by anyone working through the PostgreSQL tutorial.
But at the project root: nothing. The code was its own documentation, in the way that serious infrastructure software of that era believed it should be. You read the Makefile, you figured out the build, you ran configure, you compiled, you read the source. Postgres95 1.01 assumed you had patience and a C compiler and the ability to figure things out.
A lot has been written in this newsletter about projects whose first commit is a README. PostgreSQL is the opposite. It is the purest example in this series of a project whose first commit is pure source code, with no author's voice at the top level at all. The voice was in the code. The code, having already been written over the previous decade by a dozen graduate students under a famous advisor, did not need a voice.
6. Copyright (c) 1994, Regents of the University of California
There is one more detail from inside this commit I want to pull out, because it is quietly load-bearing for the PostgreSQL license that exists today.
The top of src/Makefile contains this line:
# Copyright (c) 1994, Regents of the University of California
In 1996, the PostgreSQL team was taking over a project whose copyright was still held by the University of California. They did not own the code they were importing. They were stewards of code that had been released under a very permissive Berkeley-style license (what we would now call a BSD license), and that permissive license is what allowed the community takeover to happen at all. If Berkeley had released POSTGRES under the GPL, or under a proprietary license with a research exemption, there would be no PostgreSQL today. The team would not have been able to continue the project outside the academy.
The license that PostgreSQL uses today is called the PostgreSQL License. It is essentially the original Berkeley license, modified only to add a copyright line for the PostgreSQL Global Development Group. It is one of the most permissive open-source licenses in existence: a single paragraph of text that grants you the right to do almost anything you want with the code, including rebranding it and selling it. Amazon RDS, Aurora, Heroku Postgres, Google Cloud SQL, Neon, Supabase: every cloud Postgres product you have ever used exists because a decade before this commit, in the mid-1980s, a professor at a public university decided to release his academic database project under a license that would outlive the academy.
This commit is the moment that legal lineage became visible in the open-source world. The copyright header in src/Makefile still names the University of California, because that was the accurate ownership claim in 1996. Later commits would add additional copyright lines (for the PostgreSQL team, for specific contributors) but they would never remove this one. If you grep modern PostgreSQL source code for "Regents of the University of California," you will get hundreds of hits. Every one of them is a trail back to this commit, and through this commit to a decade of graduate students working in Soda Hall in Berkeley, California.
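If you would rather count those hits than take my word for it, something like the following Python sketch would do. It assumes you have a local checkout of the source tree; the function name files_mentioning is invented here, and the script only finds the phrase when it sits unbroken on one line, so a header that wraps mid-phrase will be missed.

```python
import os

PHRASE = b"Regents of the University of California"


def files_mentioning(root, phrase=PHRASE):
    """Return sorted paths of files under `root` whose bytes contain `phrase`."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; only the checked-out files matter.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if phrase in handle.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file: ignore it and keep walking
    return sorted(hits)


if __name__ == "__main__":
    import sys
    # Usage: python count_regents.py /path/to/checkout (path is yours to supply)
    matches = files_mentioning(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(len(matches), "files mention the Regents")
```

Reading files as bytes sidesteps any question about the encoding of thirty-year-old source files.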
7. The thing that keeps rhyming
I have now written thirteen of these issues, and a pattern has emerged that I want to name directly: most first commits are not first anything. They are snapshots taken in the middle of projects that already existed.
- Next.js began with a spec for a framework not yet built.
- Rails began with an extraction from Basecamp.
- Bitcoin's first commit was eight months after the network launched.
- Kubernetes arrived as a Google-internal dump.
- VS Code arrived as a Microsoft-internal dump.
- Go's git history was backdated to include Kernighan's 1972 "hello, world."
- Node's first commit pointed at Ryan Dahl's two pre-existing C libraries.
- Supabase's first commit was a marketing site for an unbuilt product.
And now PostgreSQL, whose first commit is a 1996 CVS vendor-branch import of a 1994 academic release descended from a 1986 research project. The git log shows you the moment that one version control system handed the project off to the next version control system. It does not show you the moment the project started. The project started ten years before Marc Fournier set up his CVS repo, in a building on the Berkeley campus, and you cannot find that moment anywhere in git, because git did not exist to record it.
What PostgreSQL's first commit does show you is something subtler and maybe more important: the moment an academic project became a community project. The handoff. The day Marc Fournier copied the tarball into CVS and labeled it "Virgin Sources," marking the moment before the community started making its own changes: the last time the code would ever be purely Berkeley's. Everything after this commit is the community's. Everything before it is the university's.
PostgreSQL is what happens when an academic project gets released under a permissive license and then gets really lucky with the people who decide to take it over. Most academic software dies. PostgreSQL did not die. This is the commit where it started not dying.
8. Footnotes from the commit log
- Marc Fournier (scrappy@hub.org) ran the hub.org server that hosted PostgreSQL infrastructure for most of the 1990s and 2000s. He was on the PostgreSQL core team from its founding until shortly before his death in 2019. The handle scrappy appears in thousands of early PostgreSQL commits.
- The commit contains no .gitignore because git did not exist, but it also contains no .cvsignore, which is interesting because CVS had one. Marc imported the raw tarball without adding CVS-specific files at the time.
- The phrase "Virgin Sources" appears in the git log of several other projects that migrated from CVS, and always means the same thing: pristine vendor-branch upstream, untouched by the committer. If you see it in a git log somewhere, you are looking at a CVS refugee.
- Michael Stonebraker, the creator of POSTGRES, won the ACM Turing Award in 2014 "for fundamental contributions to the concepts and practices underlying modern database systems." Postgres95 1.01, the distribution that Marc Fournier imported in this commit, is downstream of the work Stonebraker was cited for.
- The rename from "Postgres95" to "PostgreSQL" happened in November 1996, four months after this commit. The "QL" stands for Query Language. The "95" was dropped because the community wanted a name that would not sound dated by 1997.