Kubernetes Health Checks Using Probes

Kubernetes is an open source container orchestration platform that significantly simplifies an application’s creation and management.
Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts, all of which must work for the system to function. If even a small part breaks, the failure needs to be detected, routed around and fixed.
These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes. In this blog, we will discuss these probes in detail. But before that, let’s first discuss health checks.
What Is a Health Check?
Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready or you should retry sending requests.
The system should be able to bring your app to a healthy state. By default, Kubernetes will start sending traffic to the pod when all the containers inside the pod have started. Kubernetes will restart containers when they crash. This default behavior should be enough to get started. Making deployments more robust becomes relatively straightforward as Kubernetes helps create custom health checks. But before we do that, let’s discuss the pod life cycle.
Pod Life Cycle
A Kubernetes pod follows a defined life cycle. These are the different phases:
- When the pod is first created, it starts in the pending phase. The scheduler tries to figure out where to place the pod. If the scheduler can’t find a node for the pod, it will remain pending. (To check why a pod is in the pending state, run the kubectl describe pod <pod name> command.)
- Once the pod is scheduled, it goes to the container creating phase, where the images required for the application are pulled and the containers start.
- Once the containers in the pod have started, it moves to the running phase, where it continues until the program completes successfully or is terminated.
To check the status of the pod, run the kubectl get pod command and check the STATUS column. As you can see, in this case all the pods are in the running state. Also, the READY column shows that each pod is ready to accept user traffic.
    # kubectl get pod
    NAME                        READY   STATUS    RESTARTS   AGE
    my-nginx-6b74b79f57-fldq6   1/1     Running   0          20s
    my-nginx-6b74b79f57-n67wp   1/1     Running   0          20s
    my-nginx-6b74b79f57-r6pcq   1/1     Running   0          20s
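If you want to check the READY column from a script rather than by eye, something like the following rough sketch could work. It is only an illustration: the helper names parse_pod_table and is_ready are our own, and it assumes the plain whitespace-separated table shown above. Newer kubectl versions may print extra text in the RESTARTS column, so for real tooling a structured format such as kubectl get pod -o json is likely the safer choice.

```python
def parse_pod_table(text):
    """Parse the default `kubectl get pod` table into a list of dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Each row is split on whitespace and paired with the header names.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def is_ready(pod):
    """READY is 'ready/total'; all containers must be ready and the pod Running."""
    ready, total = (int(n) for n in pod["READY"].split("/"))
    return total > 0 and ready == total and pod["STATUS"] == "Running"


# Sample output copied from the listing above.
SAMPLE = """\
NAME                        READY   STATUS    RESTARTS   AGE
my-nginx-6b74b79f57-fldq6   1/1     Running   0          20s
my-nginx-6b74b79f57-n67wp   1/1     Running   0          20s
my-nginx-6b74b79f57-r6pcq   1/1     Running   0          20s
"""

pods = parse_pod_table(SAMPLE)
not_ready = [pod["NAME"] for pod in pods if not is_ready(pod)]
print("pods not ready:", not_ready)
```

A READY value of 1/1 means one of one containers in the pod is ready, which is why the sketch compares the two numbers instead of looking only at STATUS.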
Different Types of Probes in Kubernetes
Kubernetes gives you the following types of health checks:
- Readiness probes: This probe will tell you when your app is ready to serve traffic. Kubernetes will ensure the readiness probe passes before allowing a service to send traffic to the pod. If the readiness probe fails, Kubernetes will not send the traffic to the pod until it passes.
- Liveness probes: Liveness probes will let Kubernetes know whether your app is healthy. If your app is healthy, Kubernetes will not interfere with pod functioning, but if it is unhealthy, Kubernetes will kill the failing container and restart it according to the pod’s restart policy.
To understand this further, let’s use a real-world scenario as an example. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn’t receive traffic until it’s fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.
Let’s take another scenario where your application hits a bug in the code (maybe an edge case), hangs indefinitely and stops serving requests. Because the process continues to run, Kubernetes will by default keep sending traffic to the broken pod. Using liveness probes, Kubernetes will detect that the app is no longer serving requests and restart the malfunctioning container.
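The difference between the two scenarios can be summarized in a few lines of Python. This is only a toy model to make the consequences concrete, not how Kubernetes is implemented; the Container class and the handle_probe_failure function are names we made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Container:
    ready: bool = False   # whether a service may route traffic to it
    restarts: int = 0     # how many times it has been restarted


def handle_probe_failure(kind, container):
    """Apply the consequence of a probe that has been marked as failed."""
    if kind == "readiness":
        # Taken out of rotation, but left running so it can recover.
        container.ready = False
    elif kind == "liveness":
        # Killed and restarted; it receives no traffic while restarting.
        container.ready = False
        container.restarts += 1
    else:
        raise ValueError(f"unknown probe kind: {kind}")
    return container


warming_up = handle_probe_failure("readiness", Container(ready=True))
hung = handle_probe_failure("liveness", Container(ready=True))
print(warming_up)
print(hung)
```

In both cases traffic stops flowing to the container, but only the liveness failure increases the restart count.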
With the theory part done, let us see how to define the probes. There are three types of probes:
- HTTP
- TCP
- Command
Note: You have an option to start by defining either the readiness or liveness probes, as the implementation for both requires a similar template. For example, if we first define livenessProbe, we can use it to define readinessProbe or vice-versa.
- HTTP probes (httpGet): This is the most common probe type. Even if your app isn’t an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes will ping a path (for example, /healthz) at a given port (8080 in this example). If it gets an HTTP response in the 200 or 300 range, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. (For more information regarding HTTP response codes, refer to this link.) Here is how you can define an HTTP livenessProbe:
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
The HTTP readiness probe is defined just like the HTTP livenessProbe; you just have to replace liveness with readiness.
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
- TCP probes (tcpSocket): With TCP probes, Kubernetes will try to establish a TCP connection on the specified port (port 8080 in the example below). If it can establish a connection, the container is considered healthy. If it can’t, the container is considered unhealthy. These probes are handy where HTTP or command probes don’t work well. For example, an FTP service can use this type of probe.
    readinessProbe:
      tcpSocket:
        port: 8080
- Command probes (exec): In the case of command probes, Kubernetes will run a command inside your container. If the command returns exit code zero, the container will be marked as healthy. Otherwise, it will be marked as unhealthy. This type of probe is useful when you can’t or don’t want to run an HTTP server, but you can run a command that checks whether your app is healthy. In the example below, we check whether the file /tmp/healthy exists by running cat on it; cat exits with code zero only if it can read the file.
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
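To make the pass/fail rules of the three probe types more tangible, here is a minimal Python sketch that mimics each check from the caller’s side. It is an illustration under our own assumptions, not Kubernetes code: the function names http_check, tcp_check and exec_check and the tiny HealthHandler server are hypothetical, and the 5-second timeout is chosen only to keep the demo robust (the Kubernetes default is 1 second).

```python
import http.client
import socket
import subprocess
import sys
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def http_check(host, port, path, timeout=5.0):
    """Pass when the HTTP response status is in the 200-399 range."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def tcp_check(host, port, timeout=5.0):
    """Pass when a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command, timeout=5.0):
    """Pass when the command exits with status zero."""
    try:
        done = subprocess.run(command, capture_output=True, timeout=timeout)
        return done.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


class HealthHandler(BaseHTTPRequestHandler):
    """Stand-in for an app exposing a lightweight /healthz endpoint."""

    def do_GET(self):
        self.send_response(200 if self.path == "/healthz" else 404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 picks a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

results = {
    "http /healthz": http_check("127.0.0.1", port, "/healthz"),
    "http /missing": http_check("127.0.0.1", port, "/missing"),
    "tcp": tcp_check("127.0.0.1", port),
    "exec exit 0": exec_check([sys.executable, "-c", "raise SystemExit(0)"]),
    "exec exit 1": exec_check([sys.executable, "-c", "raise SystemExit(1)"]),
}
server.shutdown()
server.server_close()
print(results)
```

Note how the TCP check passes as soon as the port accepts connections, even though one of the HTTP paths returns 404. This is why an HTTP probe is usually the stronger signal when your app can serve one.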
Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.
- initialDelaySeconds (default value 0): If you know your application needs n seconds (for example, 30 seconds) to warm up, you can use initialDelaySeconds to delay the first check by that many seconds.
- periodSeconds (default value 10): This specifies how often, in seconds, the check is executed.
- timeoutSeconds (default value 1): This defines the maximum number of seconds to wait before the probe operation times out.
- successThreshold (default value 1): This is the minimum number of consecutive successes required for the probe to be considered successful after it has failed.
- failureThreshold (default value 3): This is the number of consecutive failures after which Kubernetes marks the probe as failed.
Note: By default, Kubernetes marks a probe as failed after three consecutive failed attempts. In the case of a liveness probe, it will restart the container. In the case of a readiness probe, it will mark the pod as not ready.
For more information about probe configuration, refer to this link.
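To get a feel for how these settings interact, the following Python sketch simulates a probe loop on a simple timeline. It is a simplified model under our own assumptions: simulate_probe is a hypothetical helper, time advances in whole seconds, and timeouts, scheduling jitter and the restart that follows a liveness failure are all ignored.

```python
def simulate_probe(is_healthy, initial_delay=0, period=10,
                   success_threshold=1, failure_threshold=3, horizon=120):
    """Return a list of (second, state) transitions for a simplified probe.

    is_healthy(t) tells whether a check performed at second t would pass.
    """
    state = None                  # no verdict before the first threshold is met
    successes = failures = 0
    transitions = []
    t = initial_delay             # first check after initialDelaySeconds
    while t <= horizon:
        if is_healthy(t):
            successes, failures = successes + 1, 0
            if state != "passing" and successes >= success_threshold:
                state = "passing"
                transitions.append((t, state))
        else:
            failures, successes = failures + 1, 0
            if state != "failing" and failures >= failure_threshold:
                state = "failing"
                transitions.append((t, state))
        t += period               # next check after periodSeconds
    return transitions


# An app that is healthy only for its first 20 seconds.
breaks_later = simulate_probe(lambda t: t < 20, initial_delay=5, period=5, horizon=60)
print(breaks_later)   # [(5, 'passing'), (30, 'failing')]

# An app that needs 20 seconds to warm up.
slow_start = simulate_probe(lambda t: t >= 20, initial_delay=15, period=5, horizon=60)
print(slow_start)     # [(20, 'passing')]
```

In the first case the checks at 20, 25 and 30 seconds fail, so with the default failureThreshold of 3 the probe is only marked as failed at 30 seconds, about 10 seconds after the app actually broke. Lowering periodSeconds or failureThreshold shortens that window at the cost of more frequent checks and less tolerance for a single slow response.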
Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated every 5 seconds.
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: nginx
      name: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
- To create a pod, use the kubectl create command and specify the YAML manifest file with the -f flag. You can give any name to the file, but it should end with a .yaml extension.
    kubectl create -f readinessprobe.yaml
    pod/nginx created
- If you check the pod’s status now, it should show Running under the STATUS column. But if you check the READY column, it will still show 0/1, which means the pod is not yet ready to accept new connections.
    kubectl get pod
    NAME    READY   STATUS    RESTARTS   AGE
    nginx   0/1     Running   0          16s
- Verify the status again after a few seconds, as we set an initial delay of 10 seconds. By now, the pod should be ready.
    kubectl get pod
    NAME    READY   STATUS    RESTARTS   AGE
    nginx   1/1     Running   0          28s
- To check the detailed status of all the parameters (for example, initialDelaySeconds, periodSeconds, etc.) used when defining the readiness probe, run the kubectl describe command.
    kubectl describe pod nginx | grep -i readiness
    Readiness:      http-get http://:80/ delay=10s timeout=1s period=5s #success=1 #failure=3
Let’s further reinforce the concept of liveness and readiness probes with the help of an example. First, let’s start with a liveness probe. In the example below, we execute the command ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’.
This command creates a file named “healthy” using the touch command. The file exists in the container for the first 20 seconds and is then removed using the rm -rf command. Lastly, the container sleeps for 600 seconds.
Then we define the liveness probe. It checks whether the file exists using the cat healthy command, starting after an initial delay of five seconds. We further define the parameter periodSeconds so that the liveness probe runs every five seconds. Once the file is deleted after 20 seconds, the probe starts failing.
    apiVersion: v1
    kind: Pod
    metadata:
      name: liveness-probe-exec
    spec:
      containers:
      - name: liveness-probe
        image: busybox
        args:
        - /bin/sh
        - -c
        - touch healthy; sleep 20; rm -rf healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - healthy
          initialDelaySeconds: 5
          periodSeconds: 5
- To create a pod, store the above code in a file that ends with .yaml (for example, liveness-probe.yaml) and execute the kubectl create command with -f <file name>, which will create the pod.
    # kubectl create -f liveness-probe.yaml
    pod/liveness-probe-exec created
- Run the kubectl get events command, and you will see that the liveness probe has failed, and the container has been killed and restarted.
    54s   Normal    Scheduled   pod/liveness-probe-exec   Successfully assigned default/liveness-probe-exec to controlplane
    53s   Normal    Pulling     pod/liveness-probe-exec   Pulling image "busybox"
    52s   Normal    Pulled      pod/liveness-probe-exec   Successfully pulled image "busybox" in 384.330188ms
    52s   Normal    Created     pod/liveness-probe-exec   Created container liveness-probe
    52s   Normal    Started     pod/liveness-probe-exec   Started container liveness-probe
    18s   Warning   Unhealthy   pod/liveness-probe-exec   Liveness probe failed: cat: can't open 'healthy': No such file or directory
    18s   Normal    Killing     pod/liveness-probe-exec   Container liveness-probe failed liveness probe, will be restarted
- You can also verify it by using the kubectl get pods command. As you can see in the RESTARTS column, the container has been restarted once.
    # kubectl get pods
    NAME                  READY   STATUS    RESTARTS   AGE
    liveness-probe-exec   1/1     Running   1          24s
Now that you understand how the liveness probe works, let’s see how the readiness probe works by tweaking the above example to define it as a readiness probe. In the example below, we execute a command inside the container (sleep 20; touch healthy; sleep 600), which first sleeps for 20 seconds, then creates a file and finally sleeps for 600 seconds. As the initial delay is set to 15 seconds, the first check is executed 15 seconds after the container starts, and it fails because the file does not exist yet.
    apiVersion: v1
    kind: Pod
    metadata:
      name: readiness-probe-exec
    spec:
      containers:
      - name: readiness-probe
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 20; touch healthy; sleep 600
        readinessProbe:
          exec:
            command:
            - cat
            - healthy
          initialDelaySeconds: 15
          periodSeconds: 5
- To create a pod, store the above code in a file that ends with .yaml, and execute the kubectl create command, which will create the pod.
    # kubectl create -f readiness-probe.yaml
    pod/readiness-probe-exec created
- If you execute kubectl get events here, you will see that the probe failed as the file is not present.
    63s   Normal    Scheduled   pod/readiness-probe-exec   Successfully assigned default/readiness-probe-exec to controlplane
    62s   Normal    Pulling     pod/readiness-probe-exec   Pulling image "busybox"
    62s   Normal    Pulled      pod/readiness-probe-exec   Successfully pulled image "busybox" in 156.57701ms
    61s   Normal    Created     pod/readiness-probe-exec   Created container readiness-probe
    61s   Normal    Started     pod/readiness-probe-exec   Started container readiness-probe
    42s   Warning   Unhealthy   pod/readiness-probe-exec   Readiness probe failed: cat: can't open 'healthy': No such file or directory
- If you check the status of the container initially, it is not in a ready state.
    # kubectl get pods
    NAME                   READY   STATUS    RESTARTS   AGE
    readiness-probe-exec   0/1     Running   0          5s
- But if you check it after 20 seconds, it should be in the ready state.
    # kubectl get pods
    NAME                   READY   STATUS    RESTARTS   AGE
    readiness-probe-exec   1/1     Running   0          27s
Conclusion
Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability and higher uptime.
Plug: Use K8s With Squadcast for Faster Resolution
Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. You also can work in collaboration using virtual incident war rooms and use automation to eliminate toil.