
How to reduce your JVM app memory footprint in Docker and Kubernetes

source link: https://medium.com/wix-engineering/how-to-reduce-your-jvm-app-memory-footprint-in-docker-and-kubernetes-d6e030d21298


Photo by Franck V. on Unsplash

Recently, I managed to dramatically reduce the memory usage of a widely used JVM app container on Kubernetes and save a lot of money. I figured out which JVM flags matter most, how to set them correctly, and how to easily measure the impact of my changes on various parts of the app's memory usage. Here are my Cliff's Notes.

The story starts with a use case. I work at Wix, as part of the data-streams team, which is in charge of all our Kafka infrastructure. Recently I was tasked with creating a Kafka client proxy for our Node.js services.

Use Case: Kafka client sidecar with a wasteful memory usage

The idea was to delegate all Kafka-related actions (e.g. produce and consume) from a Node.js app to a separate JVM app. The motivation behind this is that a lot of our own infrastructure for Kafka clients is written in Scala (it's called Greyhound; the open source version can be found here).
With a sidecar, the Scala code doesn't need to be duplicated in other languages. Only a thin wrapper is needed.


Once we deployed the sidecar in production, we noticed that it consumes quite a lot of memory.


Metric used — container_memory_working_set_bytes

As you can see from the table above, the memory footprint of the sidecar alone (running OpenJDK 8) is 4–5 times bigger than that of the node-app container back when it still included the Kafka library.

I had to understand why and how to reduce it considerably.

Experimenting with production data

I set out to create a test-app that mimics the sidecar of this particular node app in order to be able to freely experiment on it without affecting production. The app contained all the consumers from the production app for the same production topics.

As a way to monitor memory consumption, I used metrics such as heapMemoryUsed and nonHeapMemoryUsed, exposed from MXBeans inside my application to Prometheus/Grafana, but you can also use jconsole or jvisualvm (both come bundled with JDK 8 and above).
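For reference, here is a minimal sketch (in Scala) of how these values can be read from the JVM's standard MemoryMXBean. The object and method names are mine for illustration, not the sidecar's actual code, and the Prometheus wiring is omitted:

    import java.lang.management.ManagementFactory

    object MemoryMetrics {
      private val memoryBean = ManagementFactory.getMemoryMXBean

      // Bytes currently used by the heap (young + old generations)
      def heapMemoryUsed: Long = memoryBean.getHeapMemoryUsage.getUsed

      // Bytes used outside the heap (Metaspace, code cache, etc.)
      def nonHeapMemoryUsed: Long = memoryBean.getNonHeapMemoryUsage.getUsed

      def main(args: Array[String]): Unit =
        // In the real app these values are scraped by Prometheus and graphed in Grafana;
        // here we simply print them.
        println(s"heapMemoryUsed=$heapMemoryUsed nonHeapMemoryUsed=$nonHeapMemoryUsed")
    }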

First, I tried to understand the impact of each consumer and producer, and of the gRPC client (that calls the node app), and I came to the conclusion that having one more consumer (or one less) does not affect the memory footprint in a meaningful way.

JVM Heap Flags

Then, I turned my attention to heap allocation.
There are two important JVM flags related to heap allocation: -Xms (initial heap size on startup) and -Xmx (maximum heap size).
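To make this concrete, here is a small sketch: the flags are passed on the java command line, and you can verify from inside the container that they took effect via the Runtime API. The flag values below are example numbers only, not my final settings:

    object HeapSettings {
      def main(args: Array[String]): Unit = {
        // Example launch command (illustrative values):
        //   java -Xms512m -Xmx512m -jar kafka-sidecar.jar
        val mb      = 1024 * 1024
        val runtime = Runtime.getRuntime
        // maxMemory roughly corresponds to -Xmx; totalMemory is the currently
        // committed heap, which starts around -Xms
        println(s"Max heap (~ -Xmx):     ${runtime.maxMemory / mb} MB")
        println(s"Committed heap now:    ${runtime.totalMemory / mb} MB")
        println(s"Free within committed: ${runtime.freeMemory / mb} MB")
      }
    }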

I’ve played around with many different combinations of the two and recorded the resulting container memory usage:


Container overall used memory with different heap flags

The first conclusion I drew from the heap flag variations was that if Xmx is higher than Xms and the app is under high memory pressure, the heap allocation will almost certainly keep growing up to the Xmx limit, causing the container's overall memory usage to grow as well (see the comparison in the charts below).


Xmx >> Xms

But if Xmx is the same as Xms, you can have much more control over the overall memory usage, as the heap will not gradually increase over time (see comparison below).


Xmx = Xms

The second conclusion I drew from the heap flags data was that you can lower Xmx dramatically as long as you don't see significant JVM pause durations due to Garbage Collection (GC), meaning more than 500ms for an extended period of time. I again used Grafana for monitoring GC, but you can also use visualgc or gceasy.io.


benign JVM pause times due to GC
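If you don't have Grafana dashboards handy, the same pause data can be pulled from the standard GarbageCollectorMXBeans. Here is a small sketch; the collector names you'll see depend on which GC the JVM runs (e.g. CMS vs G1):

    import java.lang.management.ManagementFactory
    import scala.jdk.CollectionConverters._

    object GcStats {
      def main(args: Array[String]): Unit =
        // One bean per collector, e.g. "ParNew"/"ConcurrentMarkSweep" with CMS,
        // or "G1 Young Generation"/"G1 Old Generation" with G1
        ManagementFactory.getGarbageCollectorMXBeans.asScala.foreach { gc =>
          println(s"${gc.getName}: collections=${gc.getCollectionCount}, totalPauseMs=${gc.getCollectionTime}")
        }
    }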

Please be careful with the number you set for Xmx: if your application has high variation in message-consuming throughput, it will be more susceptible to GC storms when it experiences a big burst of incoming messages.

Kafka-related tune-up

Our Greyhound (Kafka) consumer has an internal message buffer that can hold as many as 200 messages. When I reduced the maximum allowed size to 20, I noticed that heap memory usage oscillates in a much narrower band than with size=200 (and overall usage is also considerably lower):


Heap memory usage pattern when bufferMax=200


Heap memory usage pattern when bufferMax=20

Of course, reducing the buffer size means the app will not handle bursts as well, so this does not work for high-throughput applications. To mitigate this, I doubled the level of parallelism of the Greyhound consumer handlers per pod, i.e. I increased the number of threads that process Kafka messages from 3 to 6. In outlier cases either the app will require more pods, or the max buffer configuration will have to be altered.

Reducing the Kafka consumer's fetch.max.bytes from 50M to 5M (to reduce the total size of polled messages) did not have a noticeable effect on the memory footprint. Neither did extracting the Greyhound producer out of the sidecar app (it can reside in a DaemonSet so that it runs once per K8s node).
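For context, fetch.max.bytes is a standard Kafka consumer setting (Greyhound wraps the regular Kafka client). A sketch of the change on a plain KafkaConsumer, with placeholder broker and group values, looks like this:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.serialization.StringDeserializer

    object FetchSizeExample {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sidecar-test-group")      // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
        // Default is ~50MB; this caps a single fetch response at 5MB. In my tests the
        // change did not noticeably reduce the container's memory footprint.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, (5 * 1024 * 1024).toString)

        val consumer = new KafkaConsumer[String, String](props)
        // ... subscribe and poll as usual ...
        consumer.close()
      }
    }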

Summary — What helped with reducing memory usage

The optimizations I made reduced the container memory usage from 1000M to around 550–600M. Here are the changes that contributed to the lower footprint:

  • Maintain a consistent heap size allocation
    Make -Xms equal to -Xmx
  • Reduce the amount of discarded objects (garbage)
    E.g. buffer fewer Kafka messages
  • A little bit of GC goes a long way
    Keep lowering Xmx as long as GC (New Gen + Old Gen) doesn't take up a considerable percentage of CPU time (around 0.25%)

What didn’t help (substantially)

  • Reducing KafkaConsumer’s fetch.max.bytes
  • Removing Kafka producer
  • Switching from gRPC client to Wix’s custom json-RPC client

Future Work

  • Explore if GraalVM native image can help
  • Compare different GC implementations. (I’ve used CMS, but there’s G1)
  • Reduce the number of threads we use when consuming from Kafka by switching to our open-sourced ZIO-based version of Greyhound.
  • Reduce the memory allocated for each thread's stack (by default each thread is assigned 1MB), as sketched below
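On that last point, here is a hedged sketch of two possible routes. The values are illustrative, and the JVM is free to round or ignore the per-thread stack size hint:

    object ThreadStackSketch {
      // Route 1 (illustrative): lower the default stack size for all threads at launch:
      //   java -Xss512k -jar kafka-sidecar.jar

      // Route 2 (illustrative): request a smaller stack for specific threads via the
      // Thread constructor that accepts an explicit stackSize hint.
      def smallStackThread(name: String)(body: => Unit): Thread =
        new Thread(null, () => body, name, 256 * 1024)

      def main(args: Array[String]): Unit = {
        val t = smallStackThread("kafka-handler-1") {
          println("processing messages on a thread with a smaller stack")
        }
        t.start()
        t.join()
      }
    }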

More improvements (and a second blog post) are sure to come.

More information

Docker memory resource limits and a heap of Java — blog post

Memory Footprint of a Java Process (video from the GeekOUT conference)

