
Plaid.com Cuts Their Deployment Times on Amazon ECS With Custom Process Relaunch...

 4 years ago
source link: https://www.tuicool.com/articles/qQzYJzM

Plaid's engineering team cut its deployment times on AWS ECS by 95% with a custom wrapper that relaunches its Node.js processes without recreating the containers.

Plaid.com, a financial technology company that enables applications to connect with users' bank accounts, has integrations with over 9600 different financial institutions, from which it pulls and processes data for later analysis. Plaid runs over 20 internal services, with 50+ code commits per day for its core services. The bank integration service, which runs as Node.js processes in containers on ECS, faced slow deployment startup times, which in turn slowed overall code ship time. Multiple environments in the pipeline added to the slowdown. The long-term plan was to move to Kubernetes. As a short-term solution, the team wrote a custom process wrapper that relaunches the application in the same container and thus avoids recreating containers.

Plaid runs 4000 Node.js processes in containers. A profiling exercise by the team exposed some possible areas for optimization in the deployment process, during application startup. ECS health checks, similar to Kubernetes liveness checks, were tweaked, but to little avail. Reducing the number of containers was another option, but it required re-architecting the service. Spinning up more instances was not a cost-effective approach. They managed to shave off a few minutes with these approaches. InfoQ got in touch with Evan Limanto, Engineer at Plaid, to learn more about the internals.

The team came up with a hot reloading technique by writing a process wrapper. Internally called Bootloader, the wrapper runs in the containers and launches the actual application as a sub-process. Bootloader also traps and forwards signals, and handles logging output. The application was modified to listen on a gRPC endpoint for a message sent from the Jenkins deployment pipeline. Limanto says that "each container advertises its own address on a Redis set with an application level heartbeat." This Redis set is used to keep track of all healthy containers at any given time.
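To make the wrapper idea concrete, here is a minimal Python sketch of a supervisor in the spirit of Bootloader. It is not Plaid's implementation: the function name `run_supervised` and the exit code `42` are assumptions made for illustration (the article does not say which status code Plaid uses), and logging is left out. The sketch launches the application as a child process, forwards `SIGTERM` and `SIGINT` to it, and starts it again whenever it exits with the designated reload code.

```python
import signal
import subprocess
import threading

# Assumption: the real status code used by Plaid is not given in the article.
RELOAD_EXIT_CODE = 42


def run_supervised(argv, reload_exit_code=RELOAD_EXIT_CODE):
    """Run argv as a child process and relaunch it whenever it exits with
    reload_exit_code. Returns (final_exit_code, number_of_launches)."""
    launches = 0
    while True:
        child = subprocess.Popen(argv)
        launches += 1

        def forward(signum, _frame, child=child):
            # Pass the signal through to the application process.
            child.send_signal(signum)

        previous = {}
        # Python only allows installing signal handlers from the main thread.
        if threading.current_thread() is threading.main_thread():
            for sig in (signal.SIGTERM, signal.SIGINT):
                previous[sig] = signal.signal(sig, forward)
        try:
            code = child.wait()
        finally:
            # Restore whatever handlers were installed before this launch.
            for sig, handler in previous.items():
                signal.signal(sig, handler)

        if code != reload_exit_code:
            # Any other exit status ends supervision and is reported as-is.
            return code, launches
        # Reload requested: loop and start a fresh process in the same container.
```

A container entrypoint could call `run_supervised(["node", "server.js"])` (a hypothetical command) and exit with the returned code, so that the container itself keeps running across application reloads.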

The gRPC message has the commit hash in its payload, so it's possible to perform a rollback by sending an older hash, explains Limanto. The message triggers a download of application code from AWS S3, and the app exits with a special status code. Bootloader traps this code and relaunches the app, thus loading the new code in memory. How does the reload happen across all containers? Limanto explains:

The reload happens in a phased manner according to a simple formula:
Reload the current container if the hash of its address is less than `min(TargetPercentage, MaxUnhealthyPercentage + % of containers on new commit)`. Some background job runs this reloading logic on an interval.
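The formula quoted above can be sketched in Python as follows. This is one possible reading, not Plaid's code: the article does not say how the hash of an address is turned into a percentage, so the SHA-256 based `address_bucket` helper, the function names, and the example address are all assumptions.

```python
import hashlib


def address_bucket(address: str) -> float:
    """Map a container address to a stable value in [0, 100).

    Assumption: the article does not specify the hash function; SHA-256
    is used here only because it is deterministic across processes.
    """
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return (int.from_bytes(digest[:4], "big") % 10000) / 100


def should_reload(address: str, target_pct: float,
                  max_unhealthy_pct: float, pct_on_new_commit: float) -> bool:
    """Apply the quoted rule: reload when the address's bucket falls below
    min(TargetPercentage, MaxUnhealthyPercentage + % on new commit)."""
    threshold = min(target_pct, max_unhealthy_pct + pct_on_new_commit)
    return address_bucket(address) < threshold


# Each container evaluates the rule for its own address on an interval.
print(should_reload("10.0.0.7:50051", target_pct=100,
                    max_unhealthy_pct=10, pct_on_new_commit=0))
```

Under this reading, the threshold rises as more containers report the new commit, so each pass of the background job admits a further slice of containers until the target percentage is reached, while the number of containers restarting at any moment stays bounded by the unhealthy allowance.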

It is possible that a reload can be triggered for a process while it is processing requests. How is this handled? Opinions on this differ. While Plaid keeps track of requests being processed and exits only after they are all done, another view endorses writing the app so that it can recover from abrupt shutdowns.
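The first approach, waiting for in-flight requests before exiting, can be illustrated with a small Python sketch. The `RequestTracker` class and its method names are hypothetical, and refusing new requests once draining starts is an assumption; the article only says that Plaid tracks requests being processed and exits after they are done.

```python
import threading


class RequestTracker:
    """Counts in-flight requests so a reload can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def begin(self) -> bool:
        """Register a request. Returns False once draining has started
        (assumption: new work is refused while a reload is pending)."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self) -> None:
        """Mark a request as finished and wake any waiting drain() call."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None) -> bool:
        """Stop accepting work and wait for in-flight requests to finish.
        Returns True if the process is now safe to exit, False on timeout."""
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A reload handler would call `drain()` and then exit with the reload status code once it returns `True`. The alternative view mentioned above skips this bookkeeping and relies on the application tolerating abrupt termination.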

