Strategies for Kubernetes Pod Placement and Scheduling

17 Jan 2020 9:36am

Kubernetes has one of the most sophisticated schedulers, and it handles the pod placement strategy. Based on the resource requests specified in the pod spec, the Kubernetes scheduler automatically chooses the most appropriate node to run the pod.

But there are scenarios where we may have to intervene in the scheduling process to enable matchmaking between a pod and a node, or between two specific pods. Kubernetes offers a powerful mechanism to take control of the pod placement logic.

Let’s explore the key techniques that influence the default scheduling decisions in Kubernetes.

Node Affinity/Anti-Affinity

Since its inception, Kubernetes has relied on labels and selectors to group resources. For example, a service uses a selector to filter pods with specific labels, and only those pods receive its traffic. Labels and selectors originally supported simple equality-based conditions (= and !=) to evaluate a rule. The same technique was extended to nodes through the nodeSelector feature, which forces a pod to be scheduled on a node carrying a specific label.

Eventually, labels and selectors gained support for set-based querying, which brought advanced filtering based on the in, notin, and exists operators. Combined with equality-based requirements, set-based requirements offer powerful techniques to filter resources in Kubernetes.

Node affinity/anti-affinity uses these expressive, set-based filtering techniques on node labels to define the placement logic of pods on specific nodes. Note that annotations offer additional metadata that is not exposed to selectors, which means annotation keys cannot be used in querying and filtering resources — affinity expressions match on labels only. Anti-affinity ensures that a pod doesn't get scheduled on a node that matches the rule.

Apart from the ability to use complex logic in the queries, node affinity/anti-affinity can impose hard and soft rules for the placement logic. A hard rule enforces a strict policy: a pod will not be placed on any node that doesn't match the criteria, and it stays pending until one does. A soft rule first checks whether any nodes match the specified condition; if none do, it falls back to the default scheduling behavior to place the pod. The expressions requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution enforce hard and soft rules, respectively.

Below are examples of using node affinity/anti-affinity with hard and soft rules.
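A minimal pod spec sketching the soft rule (the pod name, container image, and the zone label key failure-domain.beta.kubernetes.io/zone are illustrative assumptions; newer clusters expose topology.kubernetes.io/zone instead):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-preferred-zone
spec:
  affinity:
    nodeAffinity:
      # Soft rule: prefer, but do not require, nodes in asia-south1-a.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - asia-south1-a
  containers:
  - name: nginx
    image: nginx
```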

The above rule instructs the Kubernetes scheduler to try to place the pod on a node running in the asia-south1-a zone of a GKE cluster. If no such nodes are available, the scheduler is free to apply its standard placement logic.
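A hard variant of the rule can be sketched with the NotIn operator (again, the pod name, image, and zone label key are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-not-in-zone
spec:
  affinity:
    nodeAffinity:
      # Hard rule: never schedule on nodes in asia-south1-a.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: NotIn
            values:
            - asia-south1-a
  containers:
  - name: nginx
    image: nginx
```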

The above rule imposes anti-affinity by using the NotIn operator. This is a hard rule, which ensures that no pod is placed on a GKE node running in the asia-south1-a zone.

Pod Affinity/Anti-Affinity

While node affinity/anti-affinity tackles the matchmaking between pods and nodes, there are scenarios where we need to ensure that pods are co-located, or that no two pods of a kind run on the same node. Pod affinity/anti-affinity helps us apply rules that enforce more granular placement logic.

Similar to the expressions in node affinity/anti-affinity, pod affinity/anti-affinity can impose hard and soft rules through requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. It is also possible to mix and match node affinity with pod affinity to define complex placement logic.

To understand the concept better, imagine we have web and cache deployments, with three replicas each, running in a three-node cluster. To ensure low latency between the web and cache pods, we want to run them on the same nodes. At the same time, we want to avoid running more than one cache pod on the same node. The resulting strategy: each node runs one, and only one, cache pod, alongside the web pods.

We start by deploying cache with an anti-affinity rule that prevents more than one cache pod from running on a node.
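A sketch of such a cache deployment follows (the redis image and the app: cache label are assumptions for illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: no two pods labeled app=cache may share a node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - cache
            topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis
```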

The topologyKey uses the default kubernetes.io/hostname label attached to every node, effectively treating each node as its own topology domain. Notice how we are using a podAntiAffinity expression along with the In operator to apply the rule.

Assuming the three cache pods are scheduled on separate nodes of the cluster, we now want to deploy the web pods on the same nodes that have a cache pod. We will use podAffinity to enforce this logic.
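A corresponding web deployment could be sketched like this (the nginx image and labels are illustrative; we assume the cache pods carry an app: cache label):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAffinity:
          # Hard rule: only schedule on nodes already running a cache pod.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - cache
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: nginx
```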

The above clause instructs the Kubernetes scheduler to find the nodes that run a cache pod and place the web pods there.

Apart from node and pod affinity/anti-affinity, we can also use taints and tolerations to define custom placement logic, and we can even write custom schedulers that take over the scheduling logic from the default scheduler. We will explore these in future articles of this series.

In the next part of this series, I will walk you through an end-to-end tutorial on using node and pod affinity in production scenarios. Stay tuned!

Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.

The Cloud Native Computing Foundation, which manages Kubernetes, is a sponsor of The New Stack.

Feature image by analogicus from Pixabay.

This post is part of a larger story we're telling about Kubernetes.
