Plaid.com Cuts Their Deployment Times on Amazon ECS With Custom Process Relaunch...

Plaid's engineering team cut their deployment times on AWS ECS by 95% with a custom wrapper that relaunches their Node.js processes without recreating the containers.

Plaid.com, a financial technology company that enables applications to connect with users' bank accounts, has integrations with over 9600 financial institutions, from which it pulls and processes data for later analysis. Plaid runs over 20 internal services, with 50+ code commits per day to its core services. The bank integration service, which runs as Node.js processes in containers on ECS, suffered from slow deployment startup times, which in turn slowed overall code ship time. Multiple environments in the pipeline added to the slowdown. The long-term plan was to move to Kubernetes. As a short-term solution, the team wrote a custom process wrapper that relaunches the application inside the same container, avoiding container recreation.

Plaid runs 4000 Node.js processes in containers. A profiling exercise by the team exposed some possible areas for optimization in the deployment process, specifically during application startup. ECS health checks, which are similar to Kubernetes liveness checks, were tweaked, but to little avail. Reducing the number of containers was another option, but it would have required re-architecting the service. Spinning up more instances was not cost-effective. These approaches shaved a few minutes off the deployment time. InfoQ got in touch with Evan Limanto, Engineer at Plaid, to learn more about the internals.

The team came up with a hot reloading technique built around a process wrapper. Internally called Bootloader, the wrapper runs in the containers and launches the actual application as a sub-process. Bootloader also traps and forwards signals, and handles logging output. The application was modified to listen on a gRPC endpoint for a message sent from the Jenkins deployment pipeline. Limanto says that "each container advertises its own address on a Redis set with an application level heartbeat." This Redis set is used to keep track of all healthy containers at any given time.
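The heartbeat bookkeeping might look roughly like the sketch below. It is written in Python and uses an in-memory dictionary as a stand-in for the Redis set, so it illustrates the idea rather than Plaid's implementation. The class name, the expiry window, and the addresses are all hypothetical; the article does not say how stale entries are expired.

```python
import time


class HeartbeatRegistry:
    """In-memory stand-in for the Redis set of container addresses (assumption)."""

    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # hypothetical expiry window
        self._clock = clock
        self._last_seen = {}  # address -> time of the last heartbeat

    def beat(self, address):
        """Record a heartbeat from the container at `address`."""
        self._last_seen[address] = self._clock()

    def healthy(self):
        """Return addresses whose last heartbeat is within the TTL, sorted."""
        now = self._clock()
        return sorted(
            addr for addr, seen in self._last_seen.items()
            if now - seen <= self.ttl_seconds
        )
```

Each container would call `beat()` with its own address on an interval, and anything that needs the list of live containers would call `healthy()`.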

The gRPC message carries the commit hash in its payload, so it is possible to perform a rollback by sending an older hash, explains Limanto. The message triggers a download of the application code from AWS S3, and the app then exits with a special status code. Bootloader traps this code and relaunches the app, thus loading the new code into memory. How does the reload happen across all containers? Limanto explains:
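A wrapper of this kind can be sketched in a few lines. The Python version below is only an illustration of the relaunch-on-exit-code pattern, not Plaid's Bootloader: the exit code value `42`, the function names, and the `on_launch` hook are assumptions made for the example, and logging is left out.

```python
import signal
import subprocess

RELOAD_EXIT_CODE = 42  # hypothetical; the article does not name the real code


def run_supervised(command, reload_exit_code=RELOAD_EXIT_CODE, on_launch=None):
    """Run `command`, relaunching it whenever it exits with the reload code.

    Returns the exit code of the first run that ends any other way.
    """
    while True:
        child = subprocess.Popen(command)
        if on_launch is not None:
            on_launch(child)  # lets the caller track the current child
        code = child.wait()
        if code != reload_exit_code:
            return code


def main(command):
    """Supervise `command` and forward SIGTERM/SIGINT to the running child."""
    current = {}

    def forward(signum, _frame):
        child = current.get("child")
        if child is not None and child.poll() is None:
            child.send_signal(signum)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)
    return run_supervised(command, on_launch=lambda c: current.update(child=c))
```

A container entrypoint could then call something like `main(["node", "server.js"])`, where the command is again only a placeholder. Because the wrapper stays alive across relaunches, the container itself is never recreated.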

The reload happens in a phased manner according to a simple formula:
Reload the current container if the hash of its address is less than `min(TargetPercentage, MaxUnhealthyPercentage + % of containers on new commit)`. Some background job runs this reloading logic on an interval.
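One way to read this formula is sketched below in Python. It assumes the hash of the address is normalized to a percentage in the range [0, 100) so that it can be compared with the percentage threshold; the use of SHA-256, the function names, and the parameter names are assumptions for illustration only.

```python
import hashlib


def address_bucket(address):
    """Map a container address to a stable value in [0, 100)."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    # Take 8 bytes of the digest and scale to two decimal places of percent.
    return int.from_bytes(digest[:8], "big") % 10000 / 100.0


def should_reload(address, target_pct, max_unhealthy_pct, pct_on_new_commit):
    """Apply the quoted rule: reload if the address hash is below the threshold."""
    threshold = min(target_pct, max_unhealthy_pct + pct_on_new_commit)
    return address_bucket(address) < threshold
```

Under this reading, the threshold rises as more containers report the new commit, so each run of the background job lets roughly another `max_unhealthy_pct` of containers reload until `target_pct` is reached. Because the bucket for a given address never changes, a container that has already qualified stays qualified.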

It is possible that a reload can be triggered for a process while it is processing requests. How is this handled? Opinions on this differ. While Plaid keeps track of requests being processed and exits only after they are all done, another view endorses writing the app so that it can recover from abrupt shutdowns.
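The first approach, waiting for in-flight requests before exiting, can be sketched as a small counter guarded by a condition variable. This Python example is an assumption about how such tracking could work, not a description of Plaid's code; in particular, refusing new requests once a reload is pending is a choice made here for the sketch, and the class and method names are hypothetical.

```python
import threading


class RequestTracker:
    """Counts in-flight requests so a reload can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def start(self):
        """Count a new request; return False if a reload is already pending."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def finish(self):
        """Mark one request as done and wake any waiting drain() call."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Stop accepting work; return True once nothing is in flight."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

On receiving the reload message, the application would call `drain()` and exit with the special status code only after it returns `True`. An application designed to recover from abrupt shutdowns would not need this step.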
