How Canary Deployments Work in Kubernetes, Istio and Linkerd

This is the first of a two-part series on canary deployments. In this post, we cover the developer pattern and how it is supported in Kubernetes, Linkerd and Istio. In part two, we’ll explore the operational pattern, how it is supported in Glasnostic, a comparison of the various implementations and, finally, the pros and cons of canary deployments.

A canary deployment (or canary release) is a microservices pattern that should be part of every continuous delivery strategy. This pattern helps organizations deploy new releases to production gradually, to a subset of users at first, before making the changes available to all users. In the unfortunate event that things go sideways in the push to prod, canary deployments help minimize the resulting downtime, contain the negative effects to a small number of users, and make it easier to initiate a rollback if necessary. In a nutshell, think of a canary deployment as a phased or incremental rollout.

What are Canary Deployments?

The canary deployment pattern takes its name from the now-defunct practice of coal miners bringing canary birds into the mines with them to alert them when toxic gases reached dangerous levels. As you might imagine, as long as the canary sang, the air was safe to breathe. If the canary died, it was time to evacuate! What does this have to do with software development? Think of the canary as a small set of end users who are exposed to new services or new capabilities before the majority of users are. The advantage of this type of rollout is that if the deployment breaks in unacceptable ways, the release can be rolled back and the adverse effects contained to just a small set of users.

Figure 1: Diagram of a typical canary deployment. Initially, client traffic to a service is routed to the existing production cluster (blue). To test a new version of the service, a canary cluster is deployed and the governing gateway or load balancer is instructed to divert a small amount of traffic to it (green). Often, this traffic is simply a small percentage of all requests for the service in question. At times, though, operators may prefer to only route a specific segment of traffic to the canary cluster, such as requests from a particular set of users or requests from a specific geography. If the canary cluster behaves as intended, the deployment is rolled out in full and the old production cluster is removed. Sometimes, a separate cluster serves as a baseline in canary analysis (hatched blue).

What about Canary Analysis?

A refinement of the canary pattern called canary analysis runs an additional baseline cluster with the current production version alongside the canary cluster, routing equal amounts of traffic to both. Comparing the canary against this freshly deployed baseline, rather than against the production cluster itself, eliminates any peculiarities that are due to the production cluster’s long-running nature.

Canary Deployment Pattern Implementations

Kubernetes

Support for canary deployments in Kubernetes is relatively limited. The approach typically taken is to deploy canary instances in the desired proportion alongside production instances and then let the load balancer distribute load across all instances as evenly as possible. Deploying a canary is somewhat easier if the governing load balancer is an ingress controller, because ingress rules can be based on a request’s host, its path or a combination of both, which offers more criteria for how traffic can be split.
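As a rough illustration of host-based routing (the resource names, hostnames and API version are assumptions for this sketch, not taken from the article), the Ingress below serves the regular hostname from the production Service while exposing a dedicated canary hostname that is backed by the canary Service:

apiVersion: networking.k8s.io/v1beta1   # newer clusters use networking.k8s.io/v1 with a slightly different backend schema
kind: Ingress
metadata:
  name: users-ingress
spec:
  rules:
  - host: users.example.com             # regular traffic stays on production
    http:
      paths:
      - path: /
        backend:
          serviceName: users-prod
          servicePort: 80
  - host: canary.users.example.com      # early adopters or internal testers hit the canary
    http:
      paths:
      - path: /
        backend:
          serviceName: users-canary
          servicePort: 80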

However, in the majority of cases, the only way to adjust the relative traffic volumes between canaries and production versions is to tinker with instance scaling. (See e.g. this post for a complete example of how this can be done.) In other words, if 10% of traffic should be routed to the canary, it will have to be deployed alongside nine instances of its production version. To make matters worse, this linear relationship only holds true for evenly distributed load balancing strategies such as round-robin balancing. Dynamic strategies such as least-connection balancing make specific ratios difficult to maintain.
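To make the replica-ratio approach concrete, here is a minimal sketch (all names, labels and images are illustrative assumptions): both Deployments carry the same app: users label, so the Service below selects all ten pods and, under round-robin balancing, the single canary pod receives roughly 10% of requests.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: users-prod
spec:
  replicas: 9                        # nine production instances
  selector:
    matchLabels: { app: users, track: stable }
  template:
    metadata:
      labels: { app: users, track: stable }
    spec:
      containers:
      - name: users
        image: example/users:1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: users-canary
spec:
  replicas: 1                        # one canary instance, roughly 10% of traffic
  selector:
    matchLabels: { app: users, track: canary }
  template:
    metadata:
      labels: { app: users, track: canary }
    spec:
      containers:
      - name: users
        image: example/users:2.0
---
apiVersion: v1
kind: Service
metadata:
  name: users
spec:
  selector:
    app: users                       # matches both stable and canary pods
  ports:
  - port: 80
    targetPort: 8080

Shifting the canary’s share of traffic then means rescaling one or both Deployments, which is exactly the scaling workaround described above.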

On the whole, Kubernetes is not particularly well suited to canary deployments. It does not support canary routing based on request source criteria such as geography or demographics. Kubernetes also tends to waste resources when specific canary routing ratios are required.

Linkerd

Because Linkerd is based on Twitter’s Finagle library, Buoyant’s original Linkerd, now commonly referred to as Linkerd 1.x, provides extensive support for more generic, dynamic request routing, which operators can use to implement what Buoyant calls “traffic shifting.” Dynamic request routing is based on sets of routing rules called “delegation tables” (dtabs for short) that are stored globally in namerd and can be changed at runtime without restarting linkerd proxies. As a service mesh, Linkerd 1.x can apply routing rules to any traffic, north-south or east-west, not just ingress traffic.

Linkerd 1.x’s support for routing is extensive. When Linkerd 1.x initially accepts a request, the request is assigned a logical “destination path.” For instance, a request to http://users-service/lookup might be assigned /svc/users as its destination path. This path may then undergo a series of rule-based transformations. For example, the dtab rule

/svc => /env/prod

would rewrite the previous destination path /svc/users to /env/prod/users.

Routing rules can be quite expressive. To implement a canary pattern, for instance, operators could specify a rule like

/svc/users => 99 * /env/prod/users & 1 * /env/prod/users-v2

to divert 1% of traffic to a users-v2 canary. In addition, routing rules may be overridden on a per-request basis via the Linkerd-specific l5d-dtab HTTP header. This allows canaries to be tested by explicitly requesting them.
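For example, a request sent with the header below (the override value is illustrative, not from the article) would be routed to the users-v2 canary for that request only, regardless of the globally configured weights:

l5d-dtab: /svc/users => /env/prod/users-v2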

The more recent Linkerd 2.x is a rewrite of Linkerd in Go and Rust and thus does not include the rich Finagle-based routing capabilities. As a result, canary deployments are not supported out of the box, leaving Linkerd 2.x users to rely on Kubernetes’ limited support for routing. The feature request for routing support in Linkerd 2.x is being tracked here.

Istio

As a service mesh, Istio allows routing rules to be applied to all services in the mesh, not just to ingress traffic. Similar to Linkerd 1.x, these routing rules allow for a fair amount of control over how traffic is directed. Unlike in Kubernetes, canary deployments in Istio can be implemented without deploying a specific number of instances to achieve a desired traffic split.

Canary deployments in Istio are configured in two steps. First, a destination rule is created to define subsets of the target service based on version labels (Figure 2). A virtual service rule is then used to specify relative weights between these subsets (Figure 3).

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: users-destinations
spec:
  host: users.prod.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
Figure 2: Istio destination rule defining a “v1” and a “v2” subset of a users.prod.svc.cluster.local service based on the version label of the service’s instances.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: users-route
spec:
  hosts:
  - users.prod.svc.cluster.local
  http:
  - route:
    - destination:
        host: users.prod.svc.cluster.local
        subset: v1
      weight: 95
    - destination:
        host: users.prod.svc.cluster.local
        subset: v2
      weight: 5
Figure 3: Istio virtual service rule specifying that 95% of traffic to users.prod.svc.cluster.local should be routed to its “v1” subset and 5% to its “v2” subset.

Once these rules are applied (via kubectl apply), the canary deployment takes immediate effect.
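For example, assuming the two manifests above were saved to files named as follows (the file names are placeholders):

kubectl apply -f users-destination-rule.yaml -f users-virtual-service.yaml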

The example given above merely scratches the surface of what Istio’s routing rules can do. For instance, the virtual service definition could include a regular-expression match against a user’s cookie to implement source routing rules, among others, as sketched below.
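A minimal sketch of such a rule, assuming a hypothetical group=beta-testers cookie (the cookie name and value are illustrative, not from the article): requests whose cookie matches the regular expression are routed to the “v2” subset, while all remaining traffic stays on “v1”.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: users-route
spec:
  hosts:
  - users.prod.svc.cluster.local
  http:
  - match:
    - headers:
        cookie:
          regex: "^(.*?;)?(group=beta-testers)(;.*)?$"   # hypothetical cookie marking beta testers
    route:
    - destination:
        host: users.prod.svc.cluster.local
        subset: v2
  - route:                                                # everyone else keeps getting v1
    - destination:
        host: users.prod.svc.cluster.local
        subset: v1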

Read part two of this post here.

