Kubernetes is an open-source container orchestration platform that significantly simplifies an application's creation and management. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. Even if a small part breaks, it needs to be detected, routed and fixed. These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes. In this blog, we will discuss these probes in detail. But before that, let’s first discuss health checks.
Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready, or you should retry sending requests.
The system should be able to bring your app to a healthy state. By default, Kubernetes will start sending traffic to the pod when all the containers inside the pod have started. Kubernetes will restart containers when they crash. This default behavior should be enough to get started. Making deployments more robust becomes relatively straightforward as Kubernetes helps create custom health checks. But before we do that, let's discuss the pod life cycle.
A Kubernetes pod follows a defined life cycle. These are the different phases:
To check the status of the pod, run ‘kubectl get pod’ command and check the STATUS column. As you can see, in this case, all the pods are in a running state. Also, the READY column states the pod is ready to accept user traffic
To understand this further, let's take an example of a real-world scenario. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn't receive traffic until it's fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.
Let's take another scenario where your application crashes due to a bug in code (maybe an edge case), and it hangs indefinitely and stops serving requests. Because your process continues to run by default, Kubernetes will send traffic to the broken pod. Using the liveness probes, Kubernetes will detect the app is no longer serving requests and restart the malfunctioning pod by default.
With the theory part done, let us see how to define the probes. There are three types of probes:
Note: You have an option to start by defining either the readiness or liveness probes, as the implementation for both requires a similar template. For example, if we first define livenessProbe, we can use it to define readinessProbe or vice-versa.
HTTP readiness probe is defined just like the HTTP livelinessProbe, you just have to replace liveness with readiness.
Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.
Note: By default, the probe will stop if the application is not ready after three attempts. In case of a liveness probe, it will restart the container. In the case of a readiness probe, it will mark pods as unhealthy.
For more information about probe configuration, refer to this link.
Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated after every 5 seconds.
Let's further reinforce the concept of liveness and readiness probe with the help of an example. First, let's start with a liveness probe. In the below example, we are executing a command, ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’.
Above, we have used touch command to create a file named ‘healthy’. This file will exist in the container for the first 20 seconds, then it will be removed by using the ‘rm -rf’ command. Lastly, the container will sleep for 600 seconds.
Then we defined the liveness probe. It first checks whether the file exists using the ‘cat healthy’ command. It does that with an initial delay of 5 seconds. We further define the parameter 'periodSeconds' which performs a liveness probe every 5 seconds. Once we delete the file, after 20 seconds the probe will be in a failed state.
Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability, and higher uptime.
Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications and integrate with popular ChatOps tools. You also can work in collaboration using virtual incident war rooms and use automation to eliminate toil.