Canary deployment of Lambdas using CDK Pipelines

In this post, we shall perform a Canary deployment of our Lambdas. We will be using CDK pipelines for automated deployments.

Canary is a deployment strategy that releases an application incrementally to a subset of users. This is done to limit the blast radius and for an easy rollback in case of a failure.

What we would like to achieve is:

Automate our API deployment using CDK Pipelines
Use CodeDeploy's Deployment Groups to perform a Canary deployment using a Lambda alias. This will perform weighted routing between the current function version and the newly deployed one.
Create an alarm to check for any errors in the Lambda and roll back if there are any.
Perform a simple load test to check if everything works.

Here's the repo for those who want to dive right in or follow along with the walkthrough.

ryands17 / lambda-canary-deployments

API Gateway and Lambda with weighted routing to the latest function deployed

Prerequisites

CDK prerequisites like bootstrapping and setting up the AWS CLI with the default profile are assumed.
As our repository will be on GitHub, we need to create an access token for CodePipeline to fetch our repository. To do this, create a Personal Access Token on GitHub with the repo and admin:repo_hook options checked.

Then, create a Secrets Manager secret that stores this token. We will fetch it later in our CdkPipeline construct, where the code reads it under the name github-token.

We're done with the prerequisites. Let's move on to creating the API.

API Stack

This stack will contain an API Gateway REST API with a Lambda Proxy integration. We will also add a CodeDeploy Deployment Group that will perform the required traffic shifting from the current to the latest deployed version.

If our deployment is erroneous for some reason, CodeDeploy should roll back to the current version. For this, we will use a CloudWatch alarm that checks whether our Lambda reports any errors. If the alarm enters the ALARM state, CodeDeploy will roll back.

Let's start with the Lambda:

// lib/api-stack.ts

const aliasName = 'stage'

const handler = new Lambda(this, 'apiHandler')
const stage = new lambda.Alias(this, 'apiHandlerStage', {
  aliasName,
  version: handler.currentVersion,
})


We create a Lambda function named apiHandler and an alias named apiHandlerStage which we will point to the current version. When we deploy a new version, CodeDeploy will perform a weighted routing using the alias that will point both to the current version and the latest deployed version.
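To make the idea of weighted routing concrete, here is a minimal sketch of how an alias could split requests between two versions. It is only a mental model, not how Lambda implements routing internally. The names AliasRouting and pickVersion are hypothetical and exist only for this illustration.

```typescript
// Hypothetical model of an alias that splits traffic between two versions.
interface AliasRouting {
  currentVersion: string // version that receives the remaining traffic
  newVersion: string // version that receives `newVersionWeight` of the traffic
  newVersionWeight: number // fraction between 0 and 1
}

// Pick the version that serves one request. `roll` is a number in [0, 1),
// passed in explicitly so the choice is deterministic when needed.
function pickVersion(routing: AliasRouting, roll: number = Math.random()): string {
  return roll < routing.newVersionWeight ? routing.newVersion : routing.currentVersion
}

const canaryRouting: AliasRouting = {
  currentVersion: '1',
  newVersion: '2',
  newVersionWeight: 0.1,
}

// Send 1000 evenly spaced rolls through the router and count the outcomes.
let servedByCurrent = 0
let servedByNew = 0
for (let i = 0; i < 1000; i++) {
  if (pickVersion(canaryRouting, i / 1000) === canaryRouting.newVersion) {
    servedByNew += 1
  } else {
    servedByCurrent += 1
  }
}

console.log({ servedByCurrent, servedByNew }) // { servedByCurrent: 900, servedByNew: 100 }
```

With a weight of 0.1, roughly one request in ten lands on the new version, which is what keeps the blast radius small.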

Next, we will create the REST API.

// lib/api-stack.ts

const api = new apiGw.LambdaRestApi(this, 'restApi', {
  handler: stage,
  deployOptions: { stageName: 'staging' },
})


CDK provides us with a neat construct named LambdaRestApi that uses Lambda Proxy integration to route every incoming request to the Lambda we specify. Here we have passed stage, which is the alias rather than the function itself.

Moving on to the important step, i.e. configuring an alarm for rollbacks in case of errors.

// lib/api-stack.ts

const failureAlarm = new cw.Alarm(this, 'lambdaFailure', {
  alarmDescription: 'The latest deployment errors > 0',
  metric: new cw.Metric({
    metricName: 'Errors',
    namespace: 'AWS/Lambda',
    statistic: 'sum',
    dimensionsMap: {
      Resource: `${handler.functionName}:${aliasName}`,
      FunctionName: handler.functionName,
    },
    period: cdk.Duration.minutes(1),
  }),
  threshold: 1,
  evaluationPeriods: 1,
})


Let's break this down. First, we give the alarm (created with the ID lambdaFailure) a description.

We then specify the metric that we want the alarm to react to. The metric here is an AWS-provided metric named Errors under the AWS/Lambda namespace.

We want to observe the total number of errors, so we specify sum as the statistic. The time period over which this statistic is applied is specified in period, which we set to 1 minute.

The dimensions that we need to specify are FunctionName, i.e. our Lambda function name, and Resource, which identifies our Lambda alias in this case. The Resource value for an alias always takes the form functionName:aliasName. This way we watch the Errors metric of this alias specifically.

We then specify the threshold, which in simple terms is the number of errors that must occur before the alarm goes into the ALARM state. In this case, we would like to trigger the alarm even if we encounter a single error.

Finally, we specify evaluationPeriods which is the number of periods over which the statistic is compared to the threshold. We have set this to 1 because what we want is to trigger the alarm in a period of 1 minute if the Lambda errors 1 or more times.
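The interplay of statistic, period, threshold, and evaluationPeriods can be sketched as a small function. This is a simplified model that assumes a "greater than or equal to threshold" comparison and ignores details such as missing data handling. The function name isInAlarm is hypothetical.

```typescript
// Simplified model of the alarm: each array entry is the summed Errors count
// for one period (1 minute in our case), oldest first.
function isInAlarm(
  errorSumsPerPeriod: number[],
  threshold: number,
  evaluationPeriods: number
): boolean {
  if (evaluationPeriods < 1 || errorSumsPerPeriod.length < evaluationPeriods) {
    return false
  }
  // Only the most recent `evaluationPeriods` periods are compared to the threshold.
  const recent = errorSumsPerPeriod.slice(-evaluationPeriods)
  return recent.every((sum) => sum >= threshold)
}

console.log(isInAlarm([0, 0, 0], 1, 1)) // false: no errors at all
console.log(isInAlarm([0, 0, 2], 1, 1)) // true: the latest minute had 2 errors
console.log(isInAlarm([0, 3, 0], 1, 1)) // false: the latest minute is clean again
```

With threshold 1 and evaluationPeriods 1, a single failing minute is enough to flip the alarm, which is exactly the sensitivity we want during a canary.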

We have created the alarm. Now let's use it in our Deployment Group.

// lib/api-stack.ts

new cd.LambdaDeploymentGroup(this, 'canaryDeployment', {
  alias: stage,
  deploymentConfig: cd.LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES,
  alarms: [failureAlarm],
})


We create a CodeDeploy Deployment Group specifying our Lambda alias and a Canary deployment of 10% in 5 minutes.

So for the first 5 minutes, 90% of the traffic is served by the current Lambda version and 10% by the newly deployed version. After 5 minutes, all traffic is shifted over to the newly deployed version, which then becomes the current version. We also pass the alarm we created in alarms. Note that we can specify more than one alarm.
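As a rough sketch of the schedule described above, the traffic split over time can be written as a function of the minutes elapsed since the deployment started. It assumes the alarm never fires, since a rollback would send all traffic back to the current version. The names TrafficSplit and canarySplit are made up for this example.

```typescript
interface TrafficSplit {
  currentVersionPercent: number
  newVersionPercent: number
}

// Model of a "10 percent for 5 minutes" canary: the new version gets
// `canaryPercent` of the traffic until `canaryMinutes` have passed,
// then it receives everything.
function canarySplit(
  minutesSinceStart: number,
  canaryPercent = 10,
  canaryMinutes = 5
): TrafficSplit {
  const newVersionPercent = minutesSinceStart < canaryMinutes ? canaryPercent : 100
  return { currentVersionPercent: 100 - newVersionPercent, newVersionPercent }
}

console.log(canarySplit(2)) // { currentVersionPercent: 90, newVersionPercent: 10 }
console.log(canarySplit(5)) // { currentVersionPercent: 0, newVersionPercent: 100 }
```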

Finally, let's look at our Lambda function:

// functions/apiHandler.ts

import { ProxyHandler } from 'aws-lambda'

export const handler: ProxyHandler = async (event) => {
  return {
    body: JSON.stringify({
      message: 'API version 1 has been deployed!',
      path: event.path,
    }),
    headers: { 'Content-Type': 'application/json' },
    statusCode: 200,
  }
}


This is a simple Lambda function that returns a 200 with a message. Now let's look at creating a Stage for our pipeline that will deploy our API.
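Before wiring the handler into a pipeline, it can be handy to call it locally with a hand-written event. The sketch below repeats the handler logic against minimal stand-in types so that it runs without the aws-lambda type package. FakeProxyEvent, ProxyResult, and localHandler are hypothetical names that only exist in this example.

```typescript
// Minimal stand-ins for the aws-lambda types (assumption: only the fields used here).
interface FakeProxyEvent {
  path: string
}
interface ProxyResult {
  statusCode: number
  headers: Record<string, string>
  body: string
}

// Same logic as functions/apiHandler.ts, typed against the stand-ins above.
const localHandler = async (evt: FakeProxyEvent): Promise<ProxyResult> => {
  return {
    body: JSON.stringify({
      message: 'API version 1 has been deployed!',
      path: evt.path,
    }),
    headers: { 'Content-Type': 'application/json' },
    statusCode: 200,
  }
}

// Invoke the handler with a fake event and print the status code and parsed body.
localHandler({ path: '/hello' }).then((res) => {
  console.log(res.statusCode, JSON.parse(res.body))
})
```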

Stage Stack

We need to define an application stage for our pipeline. A pipeline can have multiple stages like dev, staging, and production. In this case, we will define a staging stage.

// lib/stages.ts

import * as cdk from '@aws-cdk/core'
import { ApiStack } from './api-stack'

export class StagingStage extends cdk.Stage {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StageProps) {
    super(scope, id, props)

    new ApiStack(this, 'ApiStackStaging')
  }
}


We create a new stage named StagingStage and create an instance of the ApiStack in it. This stage will deploy our API and Lambda function, and we will use it in our pipeline.

CDK Pipelines

Let's start by creating our CDK Pipeline that will contain values for our repo, artifacts, and our synth step.

// lib/pipeline-stack.ts

const sourceArtifact = new codepipeline.Artifact()
const cloudAssemblyArtifact = new codepipeline.Artifact()

const pipeline = new pipelines.CdkPipeline(this, 'deployApi', {
  cloudAssemblyArtifact,
  sourceAction: new codepipelineActions.GitHubSourceAction({
    actionName: 'GH',
    output: sourceArtifact,
    oauthToken: cdk.SecretValue.secretsManager('github-token'),
    owner: 'ryands17',
    repo: 'lambda-canary-deployments',
    branch: 'main',
  }),
  synthAction: pipelines.SimpleSynthAction.standardYarnSynth({
    cloudAssemblyArtifact,
    sourceArtifact,
  }),
})


Let's break this down:

First we have our artifacts that will be stored in S3.
Then we specify a GitHubSourceAction with the sourceArtifact created above, the oauthToken that we stored in Secrets Manager as a prerequisite, and the repo, owner, and branch that CodePipeline will pull from.
Finally, we specify a synth action and here CDK automatically provides us with a standardYarnSynth that installs the dependencies and runs the synth command to create the corresponding CloudFormation template. If you're using NPM, you need to use standardNpmSynth.

Moving on, let's add the Staging stage to this pipeline.

// lib/pipeline-stack.ts

const stagingStage = new StagingStage(this, 'staging', {
  env: { region: process.env.region || 'us-east-2' },
})

pipeline.addApplicationStage(stagingStage)


We create an instance of our StagingStage and add it to our pipeline using the addApplicationStage method. This will deploy our REST API (ApiStack) that we created in the StagingStage.

Deploying the app

We're done with the constructs. Now let's deploy the app using yarn cdk deploy.

Note: If you're using your own repository instead of mine, first push this code to your repo and then run yarn cdk deploy. Otherwise the pipeline won't find your repository.

After deploying, we can see the pipeline being run for the first time.

After this is completed, head over to CloudFormation and fetch the API Gateway URL from the Outputs section of your stack.

On opening this, we see the message we sent from our Lambda successfully.

Let's change the message in our Lambda to API version 2. On performing a commit and push, we can see that CodePipeline automatically fetches the source and continues with the pipeline.

On checking our Lambda function, we can see that the alias is performing a weighted routing to our current version and the newly deployed one. If you try the API URL in your browser, you will see both messages, API version 1 and API version 2 on refreshing multiple times.

Here version 1 is our current version (API version 1) and version 2 is our newly deployed version (API version 2).

We can see that our Deployment Group shifted the traffic successfully after 5 minutes as there were no errors.

Finally, let's simulate an error by adding an explicit error to the function in the hopes of triggering our CloudWatch alarm:

// functions/apiHandler.ts
import { ProxyHandler } from 'aws-lambda'

export const handler: ProxyHandler = async (event) => {
  if (Math.random() > 0.5) throw Error('an unexpected error occurred!')

  return {
    body: JSON.stringify({
      message: 'API version 2 has been deployed!',
      path: event.path,
    }),
    headers: { 'Content-Type': 'application/json' },
    statusCode: 200,
  }
}


On pushing this code, we can see that the pipeline is triggered, and now we shall load test our API using a tool called artillery.

artillery quick -c 30 -n 100 -d 10 $API_URL
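As a back-of-the-envelope estimate of what this run should produce, we can multiply the number of requests by the share of traffic hitting the new version and by the chance that the new version throws. The sketch assumes that -c 30 and -n 100 mean 30 virtual users sending 100 requests each, and that the whole test runs while the canary is still at 10%. The function name expectedFailures is hypothetical.

```typescript
// Expected number of failed requests while the canary is active.
function expectedFailures(
  totalRequests: number,
  newVersionWeight: number, // share of traffic routed to the new version
  errorProbability: number // chance that the new version throws per request
): number {
  return totalRequests * newVersionWeight * errorProbability
}

// Assumption: 30 virtual users x 100 requests each.
const plannedRequests = 30 * 100

// 10% of traffic reaches the new version, which fails about half the time.
const estimate = expectedFailures(plannedRequests, 0.1, 0.5)
console.log(`roughly ${Math.round(estimate)} of ${plannedRequests} requests should fail`)
```

Even a small share of failing requests is far above the alarm threshold of 1, so the alarm should fire within the first minute or so of the test.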


As you can see, the API returns a lot of 502 responses. Let's check on our alarm now.

Voila! The alarm is triggered because the Lambda is erroring out. On checking CodePipeline, we can see that the deployment failed and CodeDeploy rolled back to our original API version 2. Let's run artillery again to see if our API works.

And we get all 200s! Let's fix the nasty error and commit, updating the message to API version 3. This will again run the pipeline, and the message API version 3 will be displayed after a successful deployment.

When not to Canary

I had a discussion with Sheen Brisals about a point where Canary deployments are not recommended and that is when you're updating Lambda permissions.

We don't want to end up in a state with a permission mismatch, because the resulting errors will always trigger the alarm and cause a rollback.

For such changes, it would be better to replace Canary with All at Once in your Deployment Group as follows:

new cd.LambdaDeploymentGroup(this, 'canaryDeployment', {
  alias: stage,
  // deploymentConfig: cd.LambdaDeploymentConfig.CANARY_10PERCENT_5MINUTES,

  deploymentConfig: cd.LambdaDeploymentConfig.ALL_AT_ONCE,
  alarms: [failureAlarm],
})


So whenever there's a configuration change, e.g. a change in IAM permissions, you perform an ALL_AT_ONCE deployment and switch back to Canary for the next deployment.
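One way to avoid editing the stack by hand each time is to derive the config name from a flag. This is only a sketch: the permissionsChanged flag and the chooseDeploymentConfig helper are hypothetical, and how you set the flag (an environment variable, a CDK context value, and so on) is up to you.

```typescript
// The two deployment configs used in this post, referred to by name.
type DeploymentConfigName = 'ALL_AT_ONCE' | 'CANARY_10PERCENT_5MINUTES'

// Pick the config for the next deployment based on a hypothetical flag
// that says whether IAM permissions changed in this release.
function chooseDeploymentConfig(permissionsChanged: boolean): DeploymentConfigName {
  return permissionsChanged ? 'ALL_AT_ONCE' : 'CANARY_10PERCENT_5MINUTES'
}

// In the stack, the returned name could be used to look up the config,
// e.g. cd.LambdaDeploymentConfig[chooseDeploymentConfig(flag)].
console.log(chooseDeploymentConfig(true)) // ALL_AT_ONCE
console.log(chooseDeploymentConfig(false)) // CANARY_10PERCENT_5MINUTES
```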

Conclusion

Here's the repo again for those who haven't checked it out yet.

ryands17 / lambda-canary-deployments

API Gateway and Lambda with weighted routing to the latest function deployed

Don't forget to destroy the stack using yarn cdk destroy, and also delete the StagingStack from the CloudFormation console so that you don't incur extra charges.

And we're done! Thanks for reading this and I would love to hear your thoughts on this in the comments! If you liked this post, do give it a like and share, and follow me on Twitter. Until next time!
