

Allocatable memory and CPU in Kubernetes Nodes
source link: https://learnk8s.io/allocatable-resources

Published in May 2020
TL;DR: Not all CPU and memory in your Kubernetes nodes can be used to run Pods.
The infographic below summarises how memory and CPU are allocated in Google Kubernetes Engine (GKE), Elastic Kubernetes Service (EKS) and Azure Kubernetes Service (AKS).

How resources are allocated in cluster nodes
Pods deployed in your Kubernetes cluster consume resources such as memory, CPU and storage.
However, not all resources in a Node can be used to run Pods.
The operating system and the kubelet require memory and CPU too, and you should cater for those extra resources.
If you look closely at a single Node, you can divide the available resources into:
- Resources needed to run the operating system and system daemons such as SSH, systemd, etc.
- Resources necessary to run Kubernetes agents such as the kubelet, the container runtime, the node problem detector, etc.
- Resources available to Pods
- Resources reserved for the eviction threshold
As you can guess, all of those quotas are customisable.
But please note that reserving 100MB of memory for the operating system doesn't mean that the OS is limited to using only that amount.
It could use more (or less); you're just allocating and estimating memory and CPU usage to the best of your ability.
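In other words, what the kubelet advertises as allocatable is whatever is left once those reservations are subtracted from the Node's capacity. Here's a minimal sketch of that bookkeeping, with purely illustrative names and numbers (on a real cluster the reservations are configured through kubelet settings such as --kube-reserved, --system-reserved and --eviction-hard):

```go
package main

import "fmt"

// nodeBudget captures the four buckets described above, all in GB.
// The field names and values are illustrative, not any provider's actual figures.
type nodeBudget struct {
	capacity          float64 // total memory installed on the Node
	systemReserved    float64 // operating system and system daemons (sshd, systemd, ...)
	kubeReserved      float64 // kubelet, container runtime, node problem detector, ...
	evictionThreshold float64 // memory kept free so the kubelet can evict Pods before the OOM killer steps in
}

// allocatable is what remains for Pods.
func (b nodeBudget) allocatable() float64 {
	return b.capacity - b.systemReserved - b.kubeReserved - b.evictionThreshold
}

func main() {
	// A hypothetical 8GB Node with made-up reservations.
	node := nodeBudget{capacity: 8, systemReserved: 0.1, kubeReserved: 1.8, evictionThreshold: 0.1}
	fmt.Printf("Allocatable memory: %.1fGB\n", node.allocatable()) // 6.0GB
}
```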
But how do you decide how to assign resources?
Unfortunately, there isn't a fixed answer as it depends on your cluster.
However, there's broad consensus among the major managed Kubernetes services, namely Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS) and Elastic Kubernetes Service (EKS), and it's worth discussing how they partition the available resources.
Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) has a well-defined list of rules to assign memory and CPU to a Node.
For memory resources, GKE reserves the following:
- 255 MiB of memory for machines with less than 1 GB of memory
- 25% of the first 4GB of memory
- 20% of the next 4GB of memory (up to 8GB)
- 10% of the next 8GB of memory (up to 16GB)
- 6% of the next 112GB of memory (up to 128GB)
- 2% of any memory above 128GB
For CPU resources, GKE reserves the following:
- 6% of the first core
- 1% of the next core (up to 2 cores)
- 0.5% of the next 2 cores (up to 4 cores)
- 0.25% of any cores above 4 cores
Let's look at an example.
A virtual machine of type n1-standard-2 has 2 vCPU and 7.5GB of memory.
According to the above rules, the CPU reserved is:
Reserved CPU = 0.06 * 1 (first core) + 0.01 * 1 (second core)
That totals 70 millicores, or 3.5% of the 2 vCPU, a modest amount.
The reserved memory is more interesting:
Reserved memory = 0.25 * 4 (first 4GB) + 0.2 * 3.5 (remaining 3.5GB)
The total is 1.7GB of memory reserved for the kubelet.
At this point, you might think that the remaining memory, 7.5GB - 1.7GB = 5.8GB, is something that you can use for your Pods.
Not really.
The kubelet reserves an extra 100 millicores of CPU and 100MB of memory for the operating system, plus another 100MB for the eviction threshold.
The total CPU reserved is 170 millicores (or about 8.5% of the 2 vCPU).
On the memory side, you started with 7.5GB, but only 5.6GB is left for your Pods.
That's roughly 75% of the overall capacity.
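Those tiered rules translate into a few lines of code. Here's a rough sketch (not GKE's actual implementation) that reproduces the n1-standard-2 numbers above:

```go
package main

import (
	"fmt"
	"math"
)

// tier describes a reservation bracket: the fraction reserved for
// everything between the previous bound and upTo.
type tier struct {
	upTo float64
	rate float64
}

// reserve walks the brackets and sums up the reserved amount.
func reserve(capacity float64, tiers []tier) float64 {
	reserved, prev := 0.0, 0.0
	for _, t := range tiers {
		if capacity <= prev {
			break
		}
		reserved += (math.Min(capacity, t.upTo) - prev) * t.rate
		prev = t.upTo
	}
	return reserved
}

// gkeReservedMemoryGB applies the memory percentages listed above.
func gkeReservedMemoryGB(gb float64) float64 {
	if gb < 1 {
		return 0.255 // flat 255MiB for very small machines
	}
	return reserve(gb, []tier{{4, 0.25}, {8, 0.20}, {16, 0.10}, {128, 0.06}, {math.MaxFloat64, 0.02}})
}

// gkeReservedCPUMillicores applies the CPU percentages listed above.
func gkeReservedCPUMillicores(cores float64) float64 {
	return 1000 * reserve(cores, []tier{{1, 0.06}, {2, 0.01}, {4, 0.005}, {math.MaxFloat64, 0.0025}})
}

func main() {
	// n1-standard-2: 2 vCPU and 7.5GB of memory.
	cpu := gkeReservedCPUMillicores(2) + 100    // plus 100 millicores for the operating system
	mem := gkeReservedMemoryGB(7.5) + 0.1 + 0.1 // plus the OS reservation and the eviction threshold
	fmt.Printf("reserved: %.0fm CPU, %.1fGB memory\n", cpu, mem) // 170m, 1.9GB
	fmt.Printf("allocatable memory: %.1fGB\n", 7.5-mem)          // 5.6GB
}
```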
You can be more efficient if you decide to use larger instances.
The instance type n1-standard-96 has 96 vCPU and 360GB of memory.
If you do the maths, that amounts to:
- 405 millicores reserved for the kubelet and the operating system
- 14.16GB of memory reserved for the operating system, the Kubernetes agents and the eviction threshold
In this extreme case, only 4% of memory is not allocatable.
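If you want to double-check the memory figure, here is the arithmetic spelled out as code, with the tiers unrolled by hand:

```go
package main

import "fmt"

func main() {
	// n1-standard-96: 360GB of memory, reservation tiers applied by hand.
	reserved := 0.25*4 + 0.20*4 + 0.10*8 + 0.06*112 + 0.02*(360-128)
	reserved += 0.1 + 0.1 // OS reservation and eviction threshold, as in the previous example
	fmt.Printf("reserved: %.2fGB of 360GB (%.1f%% of capacity)\n", reserved, 100*reserved/360)
	// Output: reserved: 14.16GB of 360GB (3.9% of capacity)
}
```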
Elastic Kubernetes Service (EKS)
Let's explore how Elastic Kubernetes Service (EKS) allocates resources.
Unfortunately, EKS doesn't document its allocatable resources, but you can extract the values from its code implementation.
EKS reserves the following memory for each Node:
Reserved memory = 255MiB + 11MiB * MAX_POD_PER_INSTANCE
What's MAX_POD_PER_INSTANCE?
In Amazon Web Services, each instance type has a different upper limit on how many Pods it can run.
For example, an m5.large instance can only run 29 Pods, but an m5.4xlarge can run up to 234.
You can view the full list here.
If you were to select an m5.large, the memory reserved for the kubelet and agents is:
Reserved memory = 255MiB + 11MiB * 29 = 574MiB
For CPU resources, EKS copies the GKE implementation and reserves:
- 6% of the first core
- 1% of the next core (up to 2 cores)
- 0.5% of the next 2 cores (up to 4 cores)
- 0.25% of any cores above 4 cores
Let's have a look at an example.
An m5.large instance has 2 vCPU and 8GiB of memory.
It's interesting to note that, in this case, roughly 90% of the memory is allocatable to Pods.
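As a quick sanity check, here's the same formula as a sketch. It assumes the ~100MiB hard eviction threshold that is discussed a bit further down:

```go
package main

import "fmt"

// eksReservedMemoryMiB applies the formula above: a fixed 255MiB
// plus 11MiB for every Pod the instance type can host.
func eksReservedMemoryMiB(maxPods int) int {
	return 255 + 11*maxPods
}

func main() {
	// m5.large: 8GiB of memory and at most 29 Pods.
	const capacityMiB = 8 * 1024
	reserved := eksReservedMemoryMiB(29)        // 574MiB
	allocatable := capacityMiB - reserved - 100 // minus the ~100MiB hard eviction threshold
	fmt.Printf("reserved: %dMiB, allocatable: %dMiB (about %d%% of capacity)\n",
		reserved, allocatable, 100*allocatable/capacityMiB)
}
```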
Azure Kubernetes Service
Azure offers a detailed explanation of their resource allocations.
The memory reserved for the Kubelet is:
- 255 MiB of memory for machines with less than 1 GB of memory
- 25% of the first 4GB of memory
- 20% of the next 4GB of memory (up to 8GB)
- 10% of the next 8GB of memory (up to 16GB)
- 6% of the next 112GB of memory (up to 128GB)
- 2% of any memory above 128GB
Notice how the allocation is the same as Google Kubernetes Engine (GKE).
The CPU reserved for the kubelet follows this table:
- 1 CPU core: 60 millicores reserved
- 2 CPU cores: 100 millicores reserved
- 4 CPU cores: 140 millicores reserved
- 8 CPU cores: 180 millicores reserved
- 16 CPU cores: 260 millicores reserved
- 32 CPU cores: 420 millicores reserved
- 64 CPU cores: 740 millicores reserved
The values are slightly higher than their GKE counterparts but still modest.
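If you want to plug these numbers into the earlier sketches, the table is just a lookup. The snippet below is illustrative, not an Azure API:

```go
package main

import "fmt"

// aksReservedCPUMillicores encodes the table above: CPU reserved for the
// kubelet (in millicores), keyed by the number of cores on the node.
var aksReservedCPUMillicores = map[int]int{
	1: 60, 2: 100, 4: 140, 8: 180, 16: 260, 32: 420, 64: 740,
}

func main() {
	// For example, a 2-core node gives up 100 millicores,
	// compared to 70 millicores under the GKE rules.
	fmt.Printf("reserved CPU on a 2-core node: %dm\n", aksReservedCPUMillicores[2])
}
```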
Overall, CPU and memory reserved for AKS are remarkably similar to Google Kubernetes Engine (GKE).
There's one departure, though.
The hard eviction threshold in Google's and Amazon's offerings is 100MB, but it's a staggering 750MiB in AKS.
Let's have a look at a D3 v2 instance that has 8GiB of memory and 2 vCPU.
In this scenario, only 55% of the available memory is allocatable to Pods.
Summary
You might be tempted to conclude that larger instances are the way to go, as they maximise the memory and CPU allocatable to Pods.
Unfortunately, cost is only one factor when designing your cluster.
If you're running large nodes you should also consider:
- The overhead on the Kubernetes agents that run on the node — such as the container runtime (e.g. Docker), the kubelet, and cAdvisor.
- Your high-availability (HA) strategy. Replicas of your Pods can only be spread across the Nodes available, and fewer (larger) Nodes means fewer options.
- Blast radius. If you have only a few nodes, then the impact of a failing node is bigger than if you have many nodes.
- Autoscaling is less cost-effective as the next increment is a (very) large Node.
Smaller nodes aren't a silver bullet either.
So you should architect your cluster for the type of workloads that you run rather than following the most common option.
If you wish to explore the pros and cons of different instance types, you should check out the sister blog post Architecting Kubernetes clusters — choosing a worker node size.