Implement Node and Pod Affinity/Anti-Affinity in Kubernetes: A Practical Example

24 Jan 2020 7:00am

I introduced the concept of node and pod affinity/anti-affinity in last week’s tutorial. We will explore the idea further through a real-world scenario.


We are going to deploy three microservices (MySQL, Redis, and a Python/Flask web app) in a four-node Kubernetes cluster. Since one of the nodes has an SSD disk attached, we want to ensure that the MySQL Pod is scheduled on that node. Redis caches database queries to accelerate application performance, and because it is used purely as a cache, there is no point in running more than one Redis Pod per node. The next goal is to place each web Pod on the same node as a Redis Pod, which ensures low latency between the web and cache layers. Even if we scale the number of replicas of the web Pod, it will never be placed on a node that doesn't have a Redis Pod.

Setting up a GKE Cluster and Adding an SSD Disk

Let’s launch a GKE cluster, add an SSD persistent disk to one of the nodes, and label the node.
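A command along these lines creates the cluster. The cluster name tns matches the node names shown later in the article, but the zone and machine type are placeholders you should adapt to your own project:

```shell
# Create a 4-node GKE cluster named "tns"
# (zone and machine type are assumptions; adjust as needed)
gcloud container clusters create tns \
  --zone us-central1-a \
  --num-nodes 4 \
  --machine-type n1-standard-2
```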

This will result in a 4-node GKE cluster.

Let’s create a GCE Persistent Disk and attach it to the first node of the GKE cluster.
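The sketch below uses a hypothetical disk name, mysql-disk, and assumes the same zone as the cluster; the node name is taken from the article's later output, so substitute the first node of your own cluster:

```shell
# Create an SSD persistent disk ("mysql-disk" is a placeholder name)
gcloud compute disks create mysql-disk \
  --type pd-ssd \
  --size 10GB \
  --zone us-central1-a

# Attach the disk to the first node of the cluster
gcloud compute instances attach-disk gke-tns-default-pool-b11f5e68-2h4f \
  --disk mysql-disk \
  --zone us-central1-a
```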

We need to mount the disk within the node to make it accessible to the applications.

Once you SSH into the GKE node, run the below commands to mount the disk.
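The commands below follow the standard GCE procedure for formatting and mounting a persistent disk. The device name /dev/sdb is an assumption; confirm it with lsblk before formatting:

```shell
# Format the newly attached disk (assumes it shows up as /dev/sdb)
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Create the mount point and mount the disk
sudo mkdir -p /mnt/data
sudo mount -o discard,defaults /dev/sdb /mnt/data
```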

Running the lsblk command confirms that the disk is mounted at /mnt/data.

Exit the shell and run the below command to label the node as disktype=ssd.
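Using the node name from the article's cluster (substitute your own):

```shell
kubectl label node gke-tns-default-pool-b11f5e68-2h4f disktype=ssd
```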

Let’s verify that the node is indeed labeled.
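Listing the nodes filtered by the label should return exactly one node:

```shell
kubectl get nodes -l disktype=ssd
```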

Deploying the Database Pod

Let’s go ahead and deploy a MySQL Pod targeting the above node. Use the below YAML specification to create the database Pod and expose it as a ClusterIP-based Service.
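A spec along the following lines matches what the article describes. The MySQL image tag, root password, and the file/Service names are assumptions; the hostPath of /mnt/data matches the mount point created earlier:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
  containers:
  - name: mysql
    image: mysql:5.7            # image tag is an assumption
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"         # placeholder; use a Secret in practice
    ports:
    - containerPort: 3306
    volumeMounts:
    - name: data
      mountPath: /var/lib/mysql
  volumes:
  - name: data
    hostPath:
      path: /mnt/data           # mount point of the attached SSD
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  type: ClusterIP
  selector:
    app: mysql
  ports:
  - port: 3306
```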

There are a few things to note from the above Pod spec. We first implement node affinity by including the below clause in the spec:
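```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
```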

This will ensure that the Pod is scheduled on the node carrying the label disktype=ssd. Since we know the Pod always lands on the same node, we use the hostPath primitive to create the Persistent Volume. The hostPath volume points to the mount point of the SSD disk that we attached in the previous step.

Let's submit the Pod spec to Kubernetes and verify that it is indeed scheduled on the node that matches the label.
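Assuming the spec is saved as db.yaml (a hypothetical file name):

```shell
kubectl apply -f db.yaml

# The NODE column shows where the Pod landed
kubectl get pods -o wide
```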

It's evident that the Pod is scheduled on the node that matches the affinity rule.

Deploying the Cache Pod

It’s time to deploy the Redis Pod that acts as the cache layer. We want to make sure that no two Redis Pods run on the same node. For that, we will define an anti-affinity rule.

The below specification creates a Redis Deployment with 3 Pods and exposes them as a ClusterIP.
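A Deployment along these lines matches the description; the Redis image tag and Service name are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis
            topologyKey: kubernetes.io/hostname
      containers:
      - name: redis
        image: redis:5          # image tag is an assumption
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  type: ClusterIP
  selector:
    app: redis
  ports:
  - port: 6379
```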

The below clause ensures that no node runs more than one Redis Pod.
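```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - redis
    topologyKey: kubernetes.io/hostname
```

The topologyKey of kubernetes.io/hostname makes the node the unit of spreading: no two Pods matching the label app=redis may share a hostname.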

Submit the Deployment spec and inspect the distribution of the pods.
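Assuming the spec is saved as redis.yaml (a hypothetical file name):

```shell
kubectl apply -f redis.yaml

# Each Redis Pod should show a different node in the NODE column
kubectl get pods -l app=redis -o wide
```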

It’s clear that the Redis Pods have been placed on unique nodes.

Deploying the Web Pod

Finally, we want to place a web Pod on the same node as the Redis Pod.
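The spec below is a sketch consistent with the behavior described later in the article: a podAffinity rule co-locates each web Pod with a Redis Pod, while a podAntiAffinity rule keeps web Pods on separate nodes. The container image is a placeholder, since the article's actual image is not shown here:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis
            topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: your-registry/web-app:latest  # placeholder; the article's image is not shown
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```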

Submit the Deployment spec to create 3 Pods of the web app and expose them through a Load Balancer.
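Assuming the spec is saved as web.yaml (a hypothetical file name):

```shell
kubectl apply -f web.yaml

# Wait for the LoadBalancer to get an EXTERNAL-IP
kubectl get service web
```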

The container image used in the web app does nothing more than access rows in the database, checking first whether they are available in the cache.

Let’s list all the Pods along with the Node names that they are scheduled in.
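```shell
# The NODE column shows which node each Pod is scheduled on
kubectl get pods -o wide
```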

We can see that the node gke-tns-default-pool-b11f5e68-2h4f runs three Pods – MySQL, Redis, and Web. The other two nodes run one Pod each for Redis and Web which are co-located for low latency.

Let's have some fun with the affinity rules. Remember, we are running 4 nodes in the cluster. One of the nodes is not running any Pods because the Kubernetes scheduler obeys the rule of co-locating the web Pods with Redis Pods.

What happens when we scale the number of replicas of the web Deployment? Its anti-affinity rule dictates that no two web Pods can run on the same node, while its affinity rule requires every web Pod to be paired with a Redis Pod. Since each node running Redis already hosts a web Pod, the scheduler cannot place the new Pod, and it remains in the Pending state indefinitely. This is despite the fact that there is an available node with no Pods running on it.
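Scaling the Deployment demonstrates this:

```shell
kubectl scale deployment web --replicas=4

# The fourth Pod shows STATUS Pending and never gets scheduled
kubectl get pods -l app=web
```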

Remove the anti-affinity rule from the web Deployment and try scaling the replicas again. Now Kubernetes can schedule the web Pods on any node that runs a Redis Pod. This makes the Deployment less restrictive, allowing any number of web Pods to run on any node, provided that node also runs a Redis Pod.
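Assuming the same hypothetical web.yaml file, with the podAntiAffinity section deleted from the Pod template:

```shell
# Reapply the spec without the podAntiAffinity clause, then scale again
kubectl apply -f web.yaml
kubectl scale deployment web --replicas=4

# Pods may now double up on nodes that run Redis
kubectl get pods -l app=web -o wide
```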

From the above output, we see that the node gke-tns-default-pool-b11f5e68-cxvw runs two instances of the Web Pod.

But one of the nodes still sits idle due to the pod affinity/anti-affinity rules. If you want to utilize it, scale the Redis Deployment to run a Pod on the idle node, and then scale the web Deployment to place some Pods on it.

Continuing the theme of co-locating database and cache layers on the same node, in the next part of this series, we will explore the sidecar pattern to deploy low-latency microservices on Kubernetes.

Janakiram MSV’s Webinar series, “Machine Intelligence and Modern Infrastructure (MI2)” offers informative and insightful sessions covering cutting-edge technologies. Sign up for the upcoming MI2 webinar at http://mi2.live.

Feature image by 3D Animation Production Company from Pixabay.
