
Issue #07 — "First commit"

Kubernetes

The first commit of Kubernetes arrived as a 47,501-line dump from Google. It had controllers, kubelets, an apiserver, and no pods. Pods were called 'tasks'.

infrastructure · go
the commit
repo  kubernetes/kubernetes
sha  2c4b3a562c
author  Joe Beda
date  2014-06-06
message  "First commit"
stats  250 files · +47,501 lines

Kubernetes runs an enormous fraction of the world's production software. Pods, services, controllers, ingresses: the vocabulary it introduced has become so load-bearing that it is hard to remember that any of it had to be invented.

Here is the commit where the vocabulary was wrong.

250 files, 47,501 lines, a complete cluster orchestrator on day one. Like VS Code, Kubernetes arrived as a dump: a scrubbed export from an internal Google codebase, with the interesting history left behind. Unlike VS Code, the dump contains fossils of an earlier set of ideas that did not survive contact with the public.

1. There are no pods

This is the strangest thing in the commit, so I am putting it up front: the initial commit of Kubernetes contains no pods.

If you look at api/doc/, you find these three files:

task-schema.json
service-schema.json
controller-schema.json

And in api/examples/:

task.json
task-list.json
service.json
service-list.json
controller.json
controller-list.json

In the Kubernetes you use today, "task" is not a concept. The concept is "pod." A pod is a group of one or more containers that share networking and storage, the smallest schedulable unit of a Kubernetes cluster. It is the fundamental abstraction. You cannot understand Kubernetes without understanding pods.
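To make that definition concrete, here is a minimal Go sketch of the grouping. The Pod and Container types and the Schedule function are hypothetical simplifications invented for illustration; they are not the types from the Kubernetes API or from this commit. The sketch only shows that the containers share one address and one set of volumes, and that the group is placed on a node as a unit.

```go
package main

import "fmt"

// Container is a hypothetical, simplified container description.
type Container struct {
	Name  string
	Image string
}

// Pod is an illustrative stand-in, not the real API type. It groups
// containers that share one network identity and one set of volumes.
type Pod struct {
	Name       string
	IP         string   // one address shared by every container in the group
	Volumes    []string // storage visible to every container in the group
	Containers []Container
}

// Schedule places the whole pod on a single node and returns a map of
// container name to node. Containers of one pod are never split up.
func Schedule(p Pod, node string) map[string]string {
	placement := make(map[string]string, len(p.Containers))
	for _, c := range p.Containers {
		placement[c.Name] = node
	}
	return placement
}

func main() {
	pod := Pod{
		Name:    "web",
		IP:      "10.0.0.7",
		Volumes: []string{"logs"},
		Containers: []Container{
			{Name: "nginx", Image: "nginx"},
			{Name: "log-shipper", Image: "fluentd"},
		},
	}
	for name, node := range Schedule(pod, "node-1") {
		fmt.Printf("%s -> %s (ip %s)\n", name, node, pod.IP)
	}
}
```

The design point is that scheduling takes the Pod, not the Container, as its argument: the group is the unit.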

On June 6, 2014, at 23:40 UTC, Kubernetes had no pods. It had tasks.

The word "task" did not survive much longer. Within weeks of this commit, the community started arguing about it. A "task" sounded too much like a one-shot batch job, whereas the Kubernetes concept was a long-running container group. The rename to "pod" happened quickly. But the fossil is still there, in the git log, if you go looking. Run grep -ri task api/ on this commit and you will find the bones of a Kubernetes that never shipped.
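If you would rather do the same search from Go, the following is a rough equivalent of grep -ri task api/. It is a sketch under assumptions: the matchingLines helper is hypothetical, the program expects to be run from the root of a checkout of the first commit, and it prints file:line pairs rather than the matching text.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// matchingLines returns the 1-based numbers of the lines in text that
// contain needle, ignoring case (the "-i" in grep -ri).
func matchingLines(text, needle string) []int {
	var hits []int
	needle = strings.ToLower(needle)
	for i, line := range strings.Split(text, "\n") {
		if strings.Contains(strings.ToLower(line), needle) {
			hits = append(hits, i+1)
		}
	}
	return hits
}

func main() {
	// Assumption: the working directory is a checkout of the first commit.
	root := "api"
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	// Walk the tree recursively (the "-r" in grep -ri).
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		for _, n := range matchingLines(string(data), "task") {
			fmt.Printf("%s:%d\n", path, n)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
	}
}
```

Pass a different directory as the first argument to search somewhere other than api/.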

2. The other words that were already right

Most of the vocabulary, though, was already in place:

  • cmd/apiserver/apiserver.go: the API server, the central HTTP front door of the cluster, named what it is still named today.
  • cmd/kubelet/kubelet.go: the node agent. Still kubelet.
  • cmd/proxy/proxy.go: the node-local network proxy. Still there.
  • cmd/controller-manager/controller-manager.go: the loop that watches resources and makes the world match the desired state. The name has stuck.
  • cmd/cloudcfg/cloudcfg.go: the command-line client. This one did get renamed. It would eventually become kubectl. "cloudcfg" is a fossil of the period when Kubernetes was a Google-internal project tied to Google Cloud, before anyone had decided it would be portable.
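The controller-manager entry above describes what is now usually called a reconciliation loop. Here is a minimal Go sketch of the idea, assuming the only state is a single replica count. The reconcile function and its return values are hypothetical; nothing here is taken from controller-manager.go.

```go
package main

import "fmt"

// reconcile compares the desired and observed replica counts and returns
// how many replicas to start and how many to stop to close the gap.
func reconcile(desired, observed int) (start, stop int) {
	switch {
	case observed < desired:
		return desired - observed, 0
	case observed > desired:
		return 0, observed - desired
	}
	return 0, 0
}

func main() {
	const desired = 3
	observed := 1
	for {
		start, stop := reconcile(desired, observed)
		if start == 0 && stop == 0 {
			fmt.Println("converged:", observed, "replicas")
			return
		}
		fmt.Printf("observed=%d: start %d, stop %d\n", observed, start, stop)
		observed += start - stop // pretend the actions succeeded
	}
}
```

A real controller would re-observe the cluster on every pass instead of trusting its own bookkeeping, which is what lets it recover from failures it did not cause.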

The architecture, in other words, was fully formed. Kubernetes did not discover its shape in public. It arrived with its bones in place, from an internal Google lineage that went back to Borg: the cluster manager that had been running Google's production infrastructure for more than a decade. Kubernetes is Borg, re-implemented in Go, written with the benefit of hindsight and a mandate to be portable.

This commit is the moment that the Borg lineage became visible to everyone else.

3. What the README said

The initial README.md is 128 lines, and it is extraordinarily terse. Here is the project's original self-description:

"Kubernetes is an open source reference implementation of container cluster management."

Not "the future of infrastructure." Not "cloud native." Not "Borg for everyone." A reference implementation. The framing is almost defensive. As if the authors were hedging against the possibility that someone else would build the real thing, and this was just a documented version.

Ten years later, Kubernetes is not a reference implementation. It is the implementation. Nobody has replaced it. The hedged positioning in this README turned out to be absurdly understated.

There are a few lessons here about projecting confidence versus letting the product speak for itself, and I will not belabor them. But I do think it is meaningful that the most important piece of infrastructure software of the 2010s launched with a README that was trying not to oversell.

4. Who Joe Beda is, and why this matters

Joe Beda is one of the three people most often credited with creating Kubernetes, alongside Brendan Burns and Craig McLuckie. All three were at Google at the time. All three had previously worked on Borg or closely adjacent systems. Joe specifically had been at Google since 2004, working on infrastructure tools, and had a keen sense of what the outside world was missing: what you would need to give developers if they did not have the luxury of Google's internal platform.

Joe typed this commit. That is not the same as saying he wrote all 47,501 lines. Much of the code was contributed by the team, and the project had been cooking internally for months before the public push. But the commit is attributed to him, and the gesture is meaningful: a Google engineer who had lived inside Borg deciding that it was time to give the rest of the industry the same tools, except this time they would be open and portable.

He left Google in late 2014 and, in 2016, co-founded Heptio with Craig McLuckie, a company dedicated to making Kubernetes easier to adopt. Heptio was acquired by VMware in 2018. Joe kept working on Kubernetes and its ecosystem throughout. His career arc is, in a very real sense, downstream of this commit.

5. The genre of dump-commits, revisited

Like VS Code's "Hello Code," this commit is a dump-commit: a large, pre-baked product released into a fresh public repository. The history is scrubbed. The contributors list is flattened to one name. The development arc that produced this snapshot is invisible.

But Kubernetes' dump-commit is different from VS Code's in an important way. VS Code arrived finished, or at least close to it. The public product was more or less the product as it existed internally. Kubernetes arrived obviously incomplete. The task/pod rename happened almost immediately. The API would go through multiple rewrites over the next year. The command-line tool would be renamed. The project would grow dozens of concepts (deployments, stateful sets, ingress, operators, custom resources) that were not present on day one.

In that sense, Kubernetes is the less-polished but more-alive of the two dumps. It shows up at the party with its jacket still on backward, and it figures out the rest in public, in the open, in the commit log that follows. The task-to-pod rename is not embarrassing; it is the first visible sign that the project was willing to change its mind. That willingness is, arguably, the reason Kubernetes won.

6. Footnotes from the commit log

  • The project was publicly announced at DockerCon on June 10, 2014, four days after this commit. Joe Beda pushed the repo, then the announcement, then went on stage.
  • "Kubernetes" (κυβερνήτης) is Greek for "helmsman" or "pilot." The name fits the nautical theme of the Docker ecosystem of the era: Docker itself is not Greek, but a docker is a stevedore who loads ships.
  • The cloudcfg command exists in this commit because the tool was originally designed to interact with Google Cloud Platform clusters. Making Kubernetes runnable on AWS, on bare metal, and on a laptop was not yet a priority on day one. It became one very quickly.
  • If you want to see the task-to-pod rename in action, search the early commit history of kubernetes/kubernetes for task. You can watch the vocabulary migrate in real time.