Kubernetes 1.21 Brings a New Memory Manager, More Flexible Scheduling

The open source Kubernetes orchestration software is always evolving. And with each new iteration, new features and improvements arrive to make container management easier and more flexible. With the release of the latest version of Kubernetes, cloud native developers and administrators will find just that. In fact, version 1.21 of Kubernetes, due to be released next week, brings 50 enhancements (up from 43 in 1.20 and 34 in 1.19), so there should be something useful for nearly any container user.
But what are these new enhancements? Let’s dig in and find out.
A New Memory Manager
Your container deployments depend on memory, and they must use it wisely; otherwise, they can wind up draining your cluster of precious resources and your business of money (remember, on cloud-hosted accounts, you pay for what you use).
The Memory Manager is a new kubelet feature that enables guaranteed memory allocation for pods in the Guaranteed QoS class. It offers two allocation strategies:
- single-NUMA is intended for high-performance and performance-sensitive applications.
- multi-NUMA overcomes situations that cannot be managed with the single-NUMA strategy (such as when the amount of memory a pod demands exceeds the single-NUMA node capacity).
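For context, guaranteed allocation only applies to pods in the Guaranteed QoS class, which a pod reaches when every container's CPU and memory requests exactly equal its limits. A minimal sketch (the pod name, container name and image are placeholders) might look like this:

```yaml
# Minimal sketch: a pod lands in the Guaranteed QoS class when every
# container's requests exactly equal its limits for both CPU and memory.
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example   # hypothetical name
spec:
  containers:
  - name: app                # hypothetical container
    image: nginx:1.20
    resources:
      requests:
        cpu: "2"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi
```

On a node whose kubelet runs the Memory Manager, such a pod's memory is pinned to one NUMA node (or a group of NUMA nodes) according to the chosen strategy.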
The Memory Manager initializes a Memory Table collection for each NUMA node (and its respective memory types), resulting in a Memory Map object. The Memory Table and Memory Map are constructed like so:
```go
type MemoryTable struct {
    TotalMemSize   uint64 `json:"total"`
    SystemReserved uint64 `json:"systemReserved"`
    Allocatable    uint64 `json:"allocatable"`
    Reserved       uint64 `json:"reserved"`
    Free           uint64 `json:"free"`
}

type NodeState struct {
    NumberOfAssignments int                              `json:"numberOfAssignments"`
    MemoryMap           map[v1.ResourceName]*MemoryTable `json:"memoryMap"`
    Nodes               []int                            `json:"nodes"`
}

type NodeMap map[int]*NodeState
```
Find out more about the new Memory Manager from its original README.md.
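The Memory Manager is switched on from the kubelet side. As a rough sketch only, assuming the alpha MemoryManager feature gate and the Static policy (the exact reservedMemory requirements are spelled out in the README linked above), a kubelet configuration fragment might look like:

```yaml
# Sketch of a KubeletConfiguration fragment enabling the Memory Manager.
# The reservation values are illustrative and must be kept consistent
# with the node's system/kube reservations, as described in the README.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryManager: true
memoryManagerPolicy: Static
systemReserved:
  memory: 1Gi
reservedMemory:
- numaNode: 0
  limits:
    memory: 1Gi
```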
A More Flexible Scheduler
One thing the developers of Kubernetes understand is that no two workloads are the same. With the release of 1.21, the scheduler receives two new features:
- Nominated nodes allow cloud native developers to suggest a preferred node through the .status.nominatedNodeName field on a Pod. If the scheduler fails to fit an incoming pod onto its nominated node, it will attempt to preempt lower-priority pods to make room.
- Pod affinity selector allows developers to define pod affinity rules (including an optional namespace selector) in a Deployment. This ability allows you to constrain which nodes pods will be scheduled on, based on the pods already running there.
Pod affinity is defined like so:
```yaml
apiVersion: apps/v1
kind: Deployment
…
spec:
  …
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: example-label
            operator: In
            values:
            - label-value
        topologyKey: kubernetes.io/hostname  # the topology domain the rule applies to
        namespaces: […]
        namespaceSelector: …
```
ReplicaSet Downscaling
Anyone who manages a Kubernetes deployment understands that autoscaling is one of its most crucial features. The one issue that has plagued Kubernetes autoscaling is downscaling after a load spike passes.
With the release of 1.21, there are now two new downscale strategies, which means you will no longer have to manually check when it comes time to downscale a deployment. Those strategies are:
- Random Pod selection on ReplicaSet downscale uses the LogarithmicScaleDown feature gate to semi-randomly select pods for removal, based on logarithmic bucketing of pod timestamps.
- ReplicaSet deletion cost makes it possible for you to annotate Pods with controller.kubernetes.io/pod-deletion-cost=X, where X is an integer. Pods with a lower deletion cost value will be removed first (a sample annotated pod follows this list).
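As a sketch of the deletion-cost strategy (the pod name and image are placeholders, and in 1.21 the annotation sits behind the alpha PodDeletionCost feature gate), an annotated pod might look like this:

```yaml
# Sketch: a pod owned by a ReplicaSet, annotated with a deletion cost.
# When the ReplicaSet scales down, pods with lower deletion-cost values
# are preferred for removal over this one.
apiVersion: v1
kind: Pod
metadata:
  name: web-7f9c          # hypothetical name
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "5"
spec:
  containers:
  - name: web
    image: nginx:1.20
```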
Indexed Job
With an Indexed Job, the job controller creates Pods with an associated completion index (added as an annotation), running from 0 to .spec.completions-1. The job controller will create Pods for the lowest indexes that don't already have active or succeeded pods. If there is more than one pod for an index, the controller removes all but one. Active pods that do not have an index are removed, and finished pods that don't have an index won't count towards failures or successes (and are not removed).
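A minimal Indexed Job might look like the following sketch (the job name, image and command are placeholders; in 1.21 the Indexed completion mode sits behind the alpha IndexedJob feature gate). The completion-index annotation is surfaced to the container here through the downward API:

```yaml
# Sketch: a Job with Indexed completion mode. Each pod gets a
# batch.kubernetes.io/job-completion-index annotation, exposed here
# as an environment variable via the downward API.
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-demo          # hypothetical name
spec:
  completions: 5
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:1.33
        command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:
              fieldPath: "metadata.annotations['batch.kubernetes.io/job-completion-index']"
```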
Network Policy Port Ranges
Before Kubernetes 1.21, a network policy could only target individual ports, so you had to write a separate rule for each port. Now you can write a single rule that covers an entire range of ports, which means less work and smaller policy files for your deployments. With the NetworkPolicyEndPort feature gate enabled, you can define a range of ports like so:
```yaml
spec:
  egress:
  - ports:
    - protocol: TCP
      port: 32000
      endPort: 32768
```
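In context, a complete policy using that range might look something like this sketch (the policy name and pod label are placeholders):

```yaml
# Sketch: an egress NetworkPolicy that allows TCP traffic to any port
# in the 32000-32768 range from pods labeled app: example-app.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-port-range   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: example-app            # hypothetical label
  policyTypes:
  - Egress
  egress:
  - ports:
    - protocol: TCP
      port: 32000
      endPort: 32768
```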
Conclusion
And there you have it, a few features coming out with Kubernetes 1.21 that should get you excited about this new release. To find out more of what’s coming (and going) with Kubernetes 1.21, make sure to check out the full changelog.