Does Kubernetes support green-blue deployment?

Question

I would like to ask on the mechanism for stopping the pods in kubernetes.

I read https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods before ask the question.

Supposably we have a application with gracefully shutdown support (for example we use simple http server on Go https://play.golang.org/p/5tmkPPMiSSt).

Server has two endpoints:

/fast, always send 200 http status code.
/slow, wait 10 seconds and send 200 http status code.

There is deployment/service resource with that configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app/name: test
  template:
    metadata:
      labels:
        app/name: test
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: service
          image: host.org/images/grace:v0.1
          livenessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
          readinessProbe:
            httpGet:
              path: /health
              port: 10002
            failureThreshold: 1
            initialDelaySeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: test
spec:
  type: NodePort
  ports:
    - name: http
      port: 10002
      targetPort: 10002
  selector:
    app/name: test

To make sure the pods deleted gracefully I conducted two test options.

First option (slow endpoint) flow:

Create deployment with replicas value equal 1.
Wait for pod readness.
Send request on /slow endpoint (curl http://ip-of-some-node:nodePort/slow) and delete pod (simultaneously, with 1 second out of sync).

Expected:

Pod must not end before http server completed my request.

Got:

Yes, http server process in 10 seconds and return response for me. (if we pass --grace-period=1 option to kubectl, then curl will write - curl: (52) Empty reply from server)

Everything works as expected.

Second option (fast endpoint) flow:

Create deployment with replicas value equal 10.
Wait for pods readness.
Start wrk with "Connection: close" header.
Randomly delete one or two pods (kubectl delete pod/xxx).

Expected:

No socket errors.

Got:

$ wrk -d 2m --header "Connection: Close" http://ip-of-some-node:nodePort/fast
Running 2m test @ http://ip-of-some-node:nodePort/fast
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.35ms  177.30ms   1.98s    91.33%
    Req/Sec    66.98     33.93   160.00     65.83%
  15890 requests in 2.00m, 1.83MB read
  Socket errors: connect 0, read 15, write 0, timeout 0
Requests/sec:    132.34
Transfer/sec:     15.64KB

15 socket errors on read, that is, some pods were disconnected from the service before all requests were processed (maybe).

The problem appears when a new deployment version is applied, scale down and rollout undo.

Questions:

What's reason of that behavior?
How to fix it?

Kubernetes version: v1.16.2

Edit 1.

The number of errors changes each time, but remains in the range of 10-20, when removing 2-5 pods in two minutes.

P.S. If we will not delete a pod, we don't got errors.

Hello! Try (1) in slow mode increase the time of reply to 20, 30, 60 secs and make the test again. (2) Regarding the second situation - it looks like that it is normal and desired behavior. All changes when you kill pods or change their configuration are eventual. I mean that when they occur you may expect any output. If you want rock solid durability you shall on client side use retry mechanisms. Also why did you not use Ingress object? — George Gaál
For 60 secs works well, kubectl don't return me control under terminal before request isn't completed (curl host/slow). Second situation, I don't understand, if we know that there will be no new connections (on the pods to which the SIGTERM signal was sent), and kubernetis does not disconnect the pods from the service until they are completed, or until the timeout expires (this is confirmed by the first situation). So what is the cause of these errors on the socket? That is, how do these two situations differ? Both there and there is a just http request. — Rokker Ruslan
P.S. I don't use Ingress resource because I don't need it. And I don't understand how this can help? After all, I don't delete the Service resource. — Rokker Ruslan

Crou Crou · Accepted Answer · 2019-11-22T12:01:28

Does Kubernetes support green-blue deployment?

Yes, it does. You can read about it on Zero-downtime Deployment in Kubernetes with Jenkins,

A blue/green deployment is a change management strategy for releasing software code. Blue/green deployments, which may also be referred to as A/B deployments require two identical hardware environments that are configured exactly the same way. While one environment is active and serving end users, the other environment remains idle.

Container technology offers a stand-alone environment to run the desired service, which makes it super easy to create identical environments as required in the blue/green deployment. The loosely coupled Services - ReplicaSets, and the label/selector-based service routing in Kubernetes make it easy to switch between different backend environments.

I would also recommend reading Kubernetes Infrastructure Blue/Green deployments.

Here is a repository with examples from codefresh.io about blue green deployment.

This repository holds a bash script that allows you to perform blue/green deployments on a Kubernetes cluster. See also the respective blog post

Prerequisites

As a convention the script expects

The name of your deployment to be $APP_NAME-$VERSION

Your deployment should have a label that shows it version

Your service should point to the deployment by using a version selector, pointing to the corresponding label in the deployment

Notice that the new color deployment created by the script will follow the same conventions. This way each subsequent pipeline you run will work in the same manner.

You can see examples of the tags with the sample application:

service

deployment

You might be also interested in Canary deployment:

Another deployment strategy is using Canaries (a.k.a. incremental rollouts). With canaries, the new version of the application is gradually deployed to the Kubernetes cluster while getting a very small amount of live traffic (i.e. a subset of live users are connecting to the new version while the rest are still using the previous version). ...

The small subset of live traffic to the new version acts as an early warning for potential problems that might be present in the new code. As our confidence increases, more canaries are created and more users are now connecting to the updated version. In the end, all live traffic goes to canaries, and thus the canary version becomes the new “production version”.

EDIT

Questions:

What's reason of that behavior?

When new deployment is being applied old pods are being removed and new ones are being scheduled. This is being done by Control Plan

For example, when you use the Kubernetes API to create a Deployment, you provide a new desired state for the system. The Kubernetes Control Plane records that object creation, and carries out your instructions by starting the required applications and scheduling them to cluster nodes–thus making the cluster’s actual state match the desired state.

You have only setup a readinessProbe, which tells your service if it should send traffic to the pod or not. This is not a good solution as like you can see in your example if you have 10 pods and remove one or two there is a gap and you receive socket error.

How to fix it?

You have to understand this is not broken so it doesn't need a fix.

This might be mitigated by implementing a check in your application to make sure it's sending request to working address or utilize other features like load balancing like ingress.

Also when you are updating deployment you can do checks before deleting the pod to check if it does have any traffic incoming/outgoing and roll the update to only not used pods.

Does Kubernetes support green-blue deployment?

First option (slow endpoint) flow:

Second option (fast endpoint) flow:

Edit 1.

1 Answers

Prerequisites