Introducing Kubernetes PinnedDeployments

I’d like to introduce a project that I’ve been working on: the PinnedDeployment Kubernetes CRD.

PinnedDeployments function a lot like Deployments: they’re a way to run some kind of service in Kubernetes. But, there’s a twist: PinnedDeployments actively support 2 concurrent versions of a service.

“Regular” Deployments

To understand why this is special, let’s look at an example Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
  labels:
    app: example
spec:
  replicas: 5
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: webserver
        image: nginx:1.16
        ports:
        - containerPort: 80

This deployment says “run 5 nginx pods, matching this template”. Obviously this example is a little contrived (who runs a plain nginx server?), but it’s an easy demo.

Suppose we want to update our image version. With a Deployment, we update the spec, which kicks off the Deployment update process. For a typical Deployment using the RollingUpdate strategy, the Deployment controller starts creating new pods with the new version, and terminating old pods with the old version. Under the hood, this is done by managing a pair of Kubernetes ReplicaSets.

This happens (pretty quickly!) until all pods are up to date. You can control how many replicas are updated at a time, or pause mid-update, but updating a Deployment is mostly a “press start and watch” experience. The rollout will stall if new pods fail to launch, but it won’t catch delayed crashes or application-level issues. Even fairly “obvious” problems may not be caught until the new version has already been fully rolled out.
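
For reference, those rollout knobs live under the Deployment’s strategy field. A minimal sketch (the values here are just illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra pod above the desired count during the rollout
      maxUnavailable: 1  # at most 1 pod may be unavailable during the rollout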

You can read about Deployments in more depth in the Kubernetes docs.

PinnedDeployments

Suppose we have that same example, and want to use a PinnedDeployment instead. Here’s what the PinnedDeployment would look like:

# Example PinnedDeployment
apiVersion: rollout.zeitgeistlabs.io/v1alpha1
kind: PinnedDeployment
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  replicas: 5
  replicasPercentNext: 20
  replicasRoundingStrategy: "Nearest"
  templates:
    previous:
      metadata:
        labels:
          app: example
      spec:
        containers:
          - name: webserver
            image: nginx:1.16
            ports:
              - containerPort: 80
    next:
      metadata:
        labels:
          app: example
      spec:
        containers:
          - name: webserver
            image: nginx:1.17
            ports:
              - containerPort: 80

This PinnedDeployment says that we want 5 nginx pods, and want 20% of them to be the new version.

If we create this, we get 4 “previous” pods (nginx 1.16) and 1 “next” pod (nginx 1.17). To roll out more of the next version, we simply increase the replicasPercentNext value. To roll back (partially or completely), we reduce the replicasPercentNext value.
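
For example, bumping the rollout to 60% is a one-field change. With 5 replicas, that works out to 3 “next” pods and 2 “previous” pods (no rounding needed in this case):

spec:
  replicas: 5
  replicasPercentNext: 60   # 60% of 5 replicas = 3 "next" pods, 2 "previous" pods
  # templates unchanged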

Suppose we have fully rolled out our new version by setting replicasPercentNext to 100. Now, we want to try using a different image: httpd (Apache). Here’s what it would look like:

...
  templates:
    previous:
      ...
      spec:
        containers:
          - name: webserver
            image: nginx:1.17
            ports:
              - containerPort: 80
    next:
      ...
      spec:
        containers:
          - name: webserver
            image: httpd:2.4
            ports:
              - containerPort: 80

Notice that the old next version, having been fully rolled out, is now the previous version. Our new next version features the httpd container image.
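
When rotating the templates like this, I’d expect you would also drop replicasPercentNext back down in the same update, so the httpd rollout starts small instead of jumping straight to 100% (the exact value is your call; a rough sketch):

spec:
  replicas: 5
  replicasPercentNext: 20   # restart the rollout: 1 httpd pod, 4 nginx 1.17 pods
  templates:
    previous:
      # ... nginx:1.17 pod spec ...
    next:
      # ... httpd:2.4 pod spec ...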

Scaling

Technically, we could achieve the same result by creating 2 Deployment objects and managing both of them in our deployment tooling.

However, there is a significant bonus to having a single object: all of the replica math is abstracted away.

You can point any tool (including the Kubernetes Horizontal Pod Autoscaler) at a PinnedDeployment and tell it how many total replicas you want. External tools do not need to be aware of the previous/next replica breakdown.
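
As a sketch, assuming the CRD exposes the standard scale subresource (which is what the HPA talks to), an autoscaler targeting a PinnedDeployment would look something like this:

apiVersion: autoscaling/v2   # or autoscaling/v2beta2 on older clusters
kind: HorizontalPodAutoscaler
metadata:
  name: example
spec:
  scaleTargetRef:
    apiVersion: rollout.zeitgeistlabs.io/v1alpha1
    kind: PinnedDeployment
    name: example
  minReplicas: 5
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

The HPA only adjusts the total replica count; the previous/next split stays whatever replicasPercentNext says.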

Observant readers will note that this ties into my recent post on autoscaling imbalances. If you create 2 Deployments, each with its own HPA, you will likely find that they don’t scale evenly.

Mini Demo

How It Works

The internals of the PinnedDeployment controller were heavily inspired by the upstream Deployment controller.

The controller watches for PinnedDeployment create/update events, and triggers a reconcile function when one occurs.

The reconcile function searches for all ReplicaSets that match the PinnedDeployment’s labels (e.g. app: example). It then compares the previous and next pod specs in the PinnedDeployment definition against those ReplicaSets, to determine which ReplicaSet holds the previous pod spec and which holds the next. All other label-matching ReplicaSets are deleted.

Once the previous and next ReplicaSets have been identified, the controller compares the desired state (including the number of replicas for each ReplicaSet) against the actual ReplicaSets. If there is a discrepancy, the ReplicaSet in question is updated.

If a PinnedDeployment is deleted, its child ReplicaSets will be automatically garbage collected.
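
That automatic cleanup is just standard Kubernetes ownership: the controller presumably sets an ownerReference on each child ReplicaSet pointing back at the PinnedDeployment, so deleting the parent garbage-collects the children. A trimmed, hypothetical child ReplicaSet might look roughly like:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-next-7d4b9c     # hypothetical generated name
  labels:
    app: example
  ownerReferences:
    - apiVersion: rollout.zeitgeistlabs.io/v1alpha1
      kind: PinnedDeployment
      name: example
      controller: true
spec:
  replicas: 1                   # the "next" share of the 5 total replicas
  selector:
    matchLabels:
      app: example
  template:
    # ... the "next" pod template (nginx:1.17) ...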

I’ll be making a post soon on some of the experience of writing the controller and CRD. In particular, I encountered some hiccups with current CRD shortcomings, and the pod specs. [Edit: post is here.]

Status

PinnedDeployment’s API version is currently v1alpha1, which means… the API is in its infancy. Don’t use it in production yet, but I’d love to know if you’re interested. Once some of the rough edges of the implementation are sorted out, it will be promoted to a beta API.

Don’t expect anything dramatic too soon, but I have some longer-term plans for abstractions to build on top of PinnedDeployments.
