

Allocatable memory and CPU in Kubernetes Nodes
source link: https://learnk8s.io/allocatable-resources

Published in May 2020
TL;DR: Not all CPU and memory in your Kubernetes nodes can be used to run Pods.
The infographic below summarises how memory and CPU are allocated in Google Kubernetes Engine (GKE), Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).

How resources are allocated in cluster nodes
Pods deployed in your Kubernetes cluster consume resources such as memory, CPU and storage.
However, not all resources in a Node can be used to run Pods.
The operating system and the kubelet require memory and CPU too, and you should cater for those extra resources.
If you look closely at a single Node, you can divide the available resources into:
- Resources needed to run the operating system and system daemons such as SSH, systemd, etc.
- Resources necessary to run Kubernetes agents such as the kubelet, the container runtime, the node problem detector, etc.
- Resources available to Pods
- Resources reserved for the eviction threshold
As you can guess, all of those quotas are customisable.
But please note that reserving 100MB of memory for the operating system doesn't mean that the OS is limited to using only that amount.
It could use more (or less); you're just allocating and estimating memory and CPU usage to the best of your ability.
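In other words, what the kubelet advertises as allocatable is whatever is left once those reservations are subtracted from the Node's capacity. Here's a minimal sketch of that bookkeeping, with purely illustrative names and numbers (on a real cluster the reservations are configured through kubelet settings such as --kube-reserved, --system-reserved and --eviction-hard):

```go
package main

import "fmt"

// nodeBudget captures the four buckets described above, all in GB.
// The field names and values are illustrative, not any provider's actual figures.
type nodeBudget struct {
	capacity          float64 // total memory installed on the Node
	systemReserved    float64 // operating system and system daemons (sshd, systemd, ...)
	kubeReserved      float64 // kubelet, container runtime, node problem detector, ...
	evictionThreshold float64 // memory kept free so the kubelet can evict Pods before the OOM killer steps in
}

// allocatable is what remains for Pods.
func (b nodeBudget) allocatable() float64 {
	return b.capacity - b.systemReserved - b.kubeReserved - b.evictionThreshold
}

func main() {
	// A hypothetical 8GB Node with made-up reservations.
	node := nodeBudget{capacity: 8, systemReserved: 0.1, kubeReserved: 1.8, evictionThreshold: 0.1}
	fmt.Printf("Allocatable memory: %.1fGB\n", node.allocatable()) // 6.0GB
}
```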
But how do you decide how to assign resources?
Unfortunately, there isn't a fixed answer as it depends on your cluster.
However, there's broad consensus among the major managed Kubernetes services, namely Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS) and Elastic Kubernetes Service (EKS), and it's worth discussing how they partition the available resources.
Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) has a well-defined list of rules to assign memory and CPU to a Node.
For memory resources, GKE reserves the following:
- 255 MiB of memory for machines with less than 1 GB of memory
- 25% of the first 4GB of memory
- 20% of the next 4GB of memory (up to 8GB)
- 10% of the next 8GB of memory (up to 16GB)
- 6% of the next 112GB of memory (up to 128GB)
- 2% of any memory above 128GB
For CPU resources, GKE reserves the following:
- 6% of the first core
- 1% of the next core (up to 2 cores)
- 0.5% of the next 2 cores (up to 4 cores)
- 0.25% of any cores above 4 cores
Let's look at an example.
A virtual machine of type n1-standard-2 has 2 vCPU and 7.5GB of memory.
According to the above rules, the CPU reserved is:
Reserved CPU = 0.06 * 1 (first core) + 0.01 * 1 (second core)
That totals 70 millicores, or 3.5% of the 2 vCPU, a modest amount.
The reserved memory is more interesting:
Reserved memory = 0.25 * 4 (first 4GB) + 0.2 * 3.5 (remaining 3.5GB)
The total is 1.7GB of memory reserved for the kubelet.
At this point, you might think that the remaining memory, 7.5GB - 1.7GB = 5.8GB, is something that you can use for your Pods.
Not really.
The kubelet reserves an extra 100 millicores of CPU and 100MB of memory for the operating system, plus another 100MB for the eviction threshold.
The total CPU reserved is 170 millicores (or about 8.5% of the 2 vCPU).
On the memory side, you started with 7.5GB, but only 5.6GB is left for your Pods.
That's roughly 75% of the overall capacity.
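Those tiered rules translate into a few lines of code. Here's a rough sketch (not GKE's actual implementation) that reproduces the n1-standard-2 numbers above:

```go
package main

import (
	"fmt"
	"math"
)

// tier describes a reservation bracket: the fraction reserved for
// everything between the previous bound and upTo.
type tier struct {
	upTo float64
	rate float64
}

// reserve walks the brackets and sums up the reserved amount.
func reserve(capacity float64, tiers []tier) float64 {
	reserved, prev := 0.0, 0.0
	for _, t := range tiers {
		if capacity <= prev {
			break
		}
		reserved += (math.Min(capacity, t.upTo) - prev) * t.rate
		prev = t.upTo
	}
	return reserved
}

// gkeReservedMemoryGB applies the memory percentages listed above.
func gkeReservedMemoryGB(gb float64) float64 {
	if gb < 1 {
		return 0.255 // flat 255MiB for very small machines
	}
	return reserve(gb, []tier{{4, 0.25}, {8, 0.20}, {16, 0.10}, {128, 0.06}, {math.MaxFloat64, 0.02}})
}

// gkeReservedCPUMillicores applies the CPU percentages listed above.
func gkeReservedCPUMillicores(cores float64) float64 {
	return 1000 * reserve(cores, []tier{{1, 0.06}, {2, 0.01}, {4, 0.005}, {math.MaxFloat64, 0.0025}})
}

func main() {
	// n1-standard-2: 2 vCPU and 7.5GB of memory.
	cpu := gkeReservedCPUMillicores(2) + 100    // plus 100 millicores for the operating system
	mem := gkeReservedMemoryGB(7.5) + 0.1 + 0.1 // plus the OS reservation and the eviction threshold
	fmt.Printf("reserved: %.0fm CPU, %.1fGB memory\n", cpu, mem) // 170m, 1.9GB
	fmt.Printf("allocatable memory: %.1fGB\n", 7.5-mem)          // 5.6GB
}
```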
You can be more efficient if you decide to use larger instances.
The instance type n1-standard-96 has 96 vCPU and 360GB of memory.
If you do the maths, that amounts to:
- 405 millicores reserved for the kubelet and the operating system
- 14.16GB of memory reserved for the operating system, the Kubernetes agents and the eviction threshold
In this extreme case, only 4% of memory is not allocatable.
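If you want to double-check the memory figure, here is the arithmetic spelled out as code, with the tiers unrolled by hand:

```go
package main

import "fmt"

func main() {
	// n1-standard-96: 360GB of memory, reservation tiers applied by hand.
	reserved := 0.25*4 + 0.20*4 + 0.10*8 + 0.06*112 + 0.02*(360-128)
	reserved += 0.1 + 0.1 // OS reservation and eviction threshold, as in the previous example
	fmt.Printf("reserved: %.2fGB of 360GB (%.1f%% of capacity)\n", reserved, 100*reserved/360)
	// Output: reserved: 14.16GB of 360GB (3.9% of capacity)
}
```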
Elastic Kubernetes Service (EKS)
Let's explore how Elastic Kubernetes Service (EKS) allocates resources.
Unfortunately, EKS doesn't document its allocatable resources, but you can extract the values from its code implementation.
EKS reserves the following memory for each Node:
Reserved memory = 255MiB + 11MiB * MAX_POD_PER_INSTANCE
What's MAX_POD_PER_INSTANCE?
In Amazon Web Services, each instance type has a different upper limit on how many Pods it can run.
For example, an m5.large instance can only run 29 Pods, but an m5.4xlarge can run up to 234.
You can view the full list here.
If you were to select an m5.large, the memory reserved for the kubelet and agents is:
Reserved memory = 255MiB + 11MiB * 29 = 574MiB
For CPU resources, EKS copies the GKE implementation and reserves:
- 6% of the first core
- 1% of the next core (up to 2 cores)
- 0.5% of the next 2 cores (up to 4 cores)
- 0.25% of any cores above 4 cores
Let's have a look at an example.
An m5.large instance has 2 vCPU and 8GiB of memory.
It's interesting to note that, in this case, roughly 90% of the memory is allocatable to Pods.
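As a quick sanity check, here's the same formula as a sketch. It assumes the ~100MiB hard eviction threshold that is discussed a bit further down:

```go
package main

import "fmt"

// eksReservedMemoryMiB applies the formula above: a fixed 255MiB
// plus 11MiB for every Pod the instance type can host.
func eksReservedMemoryMiB(maxPods int) int {
	return 255 + 11*maxPods
}

func main() {
	// m5.large: 8GiB of memory and at most 29 Pods.
	const capacityMiB = 8 * 1024
	reserved := eksReservedMemoryMiB(29)        // 574MiB
	allocatable := capacityMiB - reserved - 100 // minus the ~100MiB hard eviction threshold
	fmt.Printf("reserved: %dMiB, allocatable: %dMiB (about %d%% of capacity)\n",
		reserved, allocatable, 100*allocatable/capacityMiB)
}
```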
Azure Kubernetes Service
Azure offers a detailed explanation of their resource allocations.
The memory reserved for the Kubelet is:
- 255 MiB of memory for machines with less than 1 GB of memory
- 25% of the first 4GB of memory
- 20% of the next 4GB of memory (up to 8GB)
- 10% of the next 8GB of memory (up to 16GB)
- 6% of the next 112GB of memory (up to 128GB)
- 2% of any memory above 128GB
Notice how the allocation is the same as Google Kubernetes Engine (GKE).
The CPU reserved for the kubelet follows this table:
- 1 CPU core: 60 millicores reserved
- 2 CPU cores: 100 millicores reserved
- 4 CPU cores: 140 millicores reserved
- 8 CPU cores: 180 millicores reserved
- 16 CPU cores: 260 millicores reserved
- 32 CPU cores: 420 millicores reserved
- 64 CPU cores: 740 millicores reserved
The values are slightly higher than their GKE counterparts but still modest.
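If you want to plug these numbers into the earlier sketches, the table is just a lookup. The snippet below is illustrative, not an Azure API:

```go
package main

import "fmt"

// aksReservedCPUMillicores encodes the table above: CPU reserved for the
// kubelet (in millicores), keyed by the number of cores on the node.
var aksReservedCPUMillicores = map[int]int{
	1: 60, 2: 100, 4: 140, 8: 180, 16: 260, 32: 420, 64: 740,
}

func main() {
	// For example, a 2-core node gives up 100 millicores,
	// compared to 70 millicores under the GKE rules.
	fmt.Printf("reserved CPU on a 2-core node: %dm\n", aksReservedCPUMillicores[2])
}
```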
Overall, CPU and memory reserved for AKS are remarkably similar to Google Kubernetes Engine (GKE).
There's one departure, though.
The hard eviction threshold in Google's and Amazon's offerings is 100MB, but it's a staggering 750MiB in AKS.
Let's have a look at a D3 v2 instance that has 8GiB of memory and 2 vCPU.
In this scenario, only 55% of the available memory is allocatable to Pods.
Summary
You might be tempted to conclude that larger instances are the way to go, as they maximise the memory and CPU allocatable to Pods.
Unfortunately, cost is only one factor when designing your cluster.
If you're running large nodes you should also consider:
- The overhead on the Kubernetes agents that run on the node — such as the container runtime (e.g. Docker), the kubelet, and cAdvisor.
- Your high-availability (HA) strategy. Replicas of your Pods can only be spread across the Nodes available, and fewer (larger) Nodes means fewer options.
- Blast radius. If you have only a few nodes, then the impact of a failing node is bigger than if you have many nodes.
- Autoscaling is less cost-effective as the next increment is a (very) large Node.
Smaller nodes aren't a silver bullet either.
So you should architect your cluster for the type of workloads that you run rather than following the most common option.
If you wish to explore the pros and cons of different instance types, you should check out the sister blog post Architecting Kubernetes clusters — choosing a worker node size.