
Optimizing 700 CPUs Away With Rust

Source: https://medium.com/tenable-techblog/optimizing-700-cpus-away-with-rust-dc7a000dbdb2

In Tenable.io, we are heavy users of Datadog custom metrics. Millions of metrics are sent through DogStatsD, providing deep insights into the complex platform. As the platform grew, we found that a significant number of metrics sent by legacy apps were obsolete. We tried to hunt down these obsolete metrics in the codebase, but modifying legacy applications was extremely time-consuming and risky.

To address this, we deployed a StatsD filter as a Datadog agent sidecar to filter out unnecessary metrics. The filter is a simple UDP datagram forwarder written in Node.js (sample, not actual code). We chose Node.js because, among the languages we could have taken to production just as quickly, it had the best network performance in our environment. We were able to implement, test and deploy this change across the whole Tenable.io platform within a week.
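Conceptually, the filter's job is small: receive a StatsD datagram, look at the metric name, and forward it to DogStatsD only if it is not on a blocklist. The sketch below illustrates that idea in Rust, the language the rest of this post lands on; the ports, prefixes and single-threaded blocking loop are hypothetical, not the production code (the original filter was the Node.js forwarder described above).

```rust
use std::net::UdpSocket;

fn main() -> std::io::Result<()> {
    // Listen on the port the applications send StatsD metrics to, and forward
    // the metrics we keep to the local DogStatsD endpoint. Ports and prefixes
    // here are made up for illustration.
    let inbound = UdpSocket::bind("0.0.0.0:8125")?;
    let outbound = UdpSocket::bind("0.0.0.0:0")?;
    let blocklist = ["legacy_app.", "deprecated."];

    let mut buf = [0u8; 8192];
    loop {
        let (len, _src) = inbound.recv_from(&mut buf)?;
        let datagram = &buf[..len];

        // A StatsD line looks like "metric.name:1|c"; the metric name is
        // everything before the first ':'.
        let name = datagram.split(|&b| b == b':').next().unwrap_or(datagram);

        // Drop metrics whose name starts with a blocked prefix, forward the rest.
        if !blocklist.iter().any(|prefix| name.starts_with(prefix.as_bytes())) {
            outbound.send_to(datagram, "127.0.0.1:8126")?;
        }
    }
}
```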

statsd-filter-proxy is deployed as a sidecar to datadog-agent, filtering all StatsD traffic before it reaches DogStatsD.

While this worked for many months, performance issues began to crop up. As the platform continued to scale up, we were sending more and more metrics through the filter. During the first quarter of 2021, we added over 1.4 million new metrics in an effort to improve our observability, and the filters needed more CPU resources to keep up. At this scale, even a minor inefficiency can lead to significant waste. Over time, we were consuming over 1,000 CPUs and 400GB of memory on these filters. The overhead had become unacceptable.

We analyzed the performance metrics and decided to rewrite the filter in a more efficient language. We chose Rust for its high performance and safety characteristics (see our other post on Rust evaluations). The source code of the new Rust-based filter is available on GitHub (linked at the end of this post).

The Rust-based filter is much more efficient than the original implementation. Because Rust gives us full control over heap allocations, the memory allocated to handle each datagram is kept to a minimum, and the Rust-based filter needs only a few MB of memory to operate. As a result, we saw a 75% reduction in CPU usage and a 95% reduction in memory usage in production.
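As an illustration of that point (a sketch, not the actual proxy code), the hot path can work directly on the bytes of the received datagram, so deciding whether to drop or forward a metric never touches the heap:

```rust
/// Hypothetical hot-path check: operates on the raw &[u8] datagram held in a
/// reusable receive buffer, so no String or Vec is allocated per packet.
fn is_blocked(datagram: &[u8], blocklist: &[&str]) -> bool {
    // The metric name is everything before the first ':' in a StatsD line.
    let name_end = datagram
        .iter()
        .position(|&b| b == b':')
        .unwrap_or(datagram.len());
    let name = &datagram[..name_end];
    blocklist.iter().any(|prefix| name.starts_with(prefix.as_bytes()))
}
```

A more naive approach would decode each datagram into a String and split it, paying at least one heap allocation per packet; working on slices avoids exactly that per-packet cost.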

Per pod, average CPU usage dropped from 800m to 200m of a core.
Per pod, average memory usage dropped from 70MB to 5MB.

In addition to reclaiming compute resources, per-packet latency also dropped by over 50%. While latency isn't a key performance indicator for this application, it is rewarding to see that we are running twice as fast on a fraction of the resources.

With this small change, we were able to optimize away over 700 CPUs and 300GB of memory. This was all implemented, tested and deployed in a single sprint (two weeks). Once the new filter was deployed, we were able to confirm the resource reduction in Datadog metrics.

CPU and memory usage dropped drastically following the deployment.

Source: https://github.com/tenable/statsd-filter-proxy-rs

TL;DR:

  • Replaced the Node.js-based StatsD filter with a Rust implementation and saw huge performance improvements.
  • At scale, even a small optimization can have a huge impact.
