Kubernetes Health Check Using Probes

Mar 2, 2022
Last Updated: Mar 2, 2022

    Introduction

    Kubernetes is an open-source container orchestration platform that significantly simplifies deploying and managing containerized applications. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts, all of which must work for the system to function. When even a small part breaks, the failure needs to be detected, routed around, and fixed, and these actions need to be automated. Kubernetes lets us do that with the help of readiness and liveness probes. In this blog, we will discuss these probes in detail. But before that, let’s first discuss health checks.

    What is a Health Check?

    Health checks are a simple way to let the system know whether an instance of your app is working. If the instance of your app is not working, the other services should not access it or send requests to it. Instead, requests should be sent to another instance that is ready, or you should retry sending requests.

    The system should also be able to bring your app back to a healthy state. By default, Kubernetes starts sending traffic to a pod once all of its containers have started, and it restarts containers when they crash. This default behavior is enough to get started, and custom health checks make deployments more robust with relatively little effort. But before we define them, let's discuss the pod life cycle.

    Pod lifecycle

    A Kubernetes pod follows a defined life cycle. These are the different phases:

    • When the pod is first created, it starts in the Pending phase. The scheduler tries to figure out where to place the pod. If the scheduler can’t find a node to place the pod on, the pod remains Pending (to check why a pod is pending, run the ‘kubectl describe pod <pod name>’ command).
    • Once the pod is scheduled, the images required for the application are pulled and the containers start. During this step, ‘kubectl get pod’ shows the status ContainerCreating, while the pod's phase is still Pending.
    • Once the containers have been created and at least one is running, the pod moves to the Running phase, where it stays until its containers either complete successfully (Succeeded) or terminate with a failure (Failed).

    To check the status of the pod, run the ‘kubectl get pod’ command and check the STATUS column. As you can see, in this case, all the pods are in a running state. The READY column shows how many of each pod's containers are ready, which tells you whether the pod is ready to accept user traffic.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/6ad51ce6cd771253b64877f3c25fde8b.js</p>

    Different Types of Probes in Kubernetes

    Kubernetes gives you the following types of health checks or probes (it also offers a startup probe, which this post does not cover):

    • Readiness probes: This probe tells Kubernetes when your app is ready to serve traffic. Kubernetes makes sure the readiness probe passes before allowing a service to send traffic to the pod. If the readiness probe fails, Kubernetes removes the pod from the service's endpoints and does not send it traffic until the probe passes again.
    • Liveness probes: Liveness probes let Kubernetes know whether your app is healthy. If your app is healthy, Kubernetes does not interfere with the pod, but if it is unhealthy, Kubernetes kills the container and restarts it according to the pod's restart policy.

    To understand this further, let's take an example of a real-world scenario. You have an application that needs some time to warm up or download the application content from some external source like GitHub. Your application shouldn't receive traffic until it's fully ready. By default, Kubernetes will start sending traffic as soon as the process inside the container starts. Using the readiness probe, Kubernetes will wait until the app has fully started before it allows the service to send traffic to the new copy.

    Let's take another scenario where your application hits a bug (maybe an edge case), hangs indefinitely, and stops serving requests. Because the process is still running, Kubernetes will by default keep sending traffic to the broken pod. With a liveness probe, Kubernetes detects that the app is no longer serving requests and restarts the malfunctioning container.

    With the theory part done, let us see how to define the probes. This post covers three ways a probe can check a container:

    • HTTP
    • TCP
    • Command

    Note: You can start by defining either the readiness or the liveness probe, as both use the same template. For example, once we have defined a livenessProbe, we can reuse the definition for a readinessProbe, or vice versa.

    • HTTP probes (httpGet): This is the most common probe type. Even if your app isn’t an HTTP server, you can usually create a lightweight HTTP server inside your app to respond to the liveness probe. Kubernetes sends an HTTP GET request to a path (for example /healthz) at a given port (8080 in this example). If it gets a response with a status code of at least 200 and below 400, the container is marked as healthy (for more information regarding HTTP response codes, refer to this link). Otherwise, it is marked as unhealthy. Here is how you can define an HTTP livenessProbe:

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/f7f04af12f29a6522a2a6a104a698f7f.js</p>
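
    The HTTP liveness probe described above could be sketched roughly as follows. The pod name and image are hypothetical placeholders, and the sketch assumes the application answers GET requests on /healthz at port 8080, as in the text.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: http-liveness-demo                   # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0 # placeholder image, assumed to serve /healthz on 8080
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz                     # path the kubelet requests
          port: 8080                         # port the kubelet connects to
```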

    An HTTP readinessProbe is defined just like the HTTP livenessProbe; you just have to replace liveness with readiness.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/7b292738c1d6bca6774254974c438448.js</p>

    • TCP probes (tcpSocket): With TCP probes, Kubernetes tries to establish a TCP connection on the specified port (port 8080 in the example below). If it can establish a connection, the container is considered healthy. If it can't, the probe is considered failed. These probes are handy where HTTP or command probes don't work well. For example, an FTP service can use this type of probe.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/75715409ccbd4a3f303e4cfdb0ff8f87.js</p>
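
    A TCP probe along these lines might look like the sketch below. The pod name and image are hypothetical, and the container is assumed to listen on port 8080. The same tcpSocket block could be placed under readinessProbe instead.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tcp-probe-demo                       # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0 # placeholder image, assumed to listen on 8080
      ports:
        - containerPort: 8080
      livenessProbe:
        tcpSocket:
          port: 8080                         # probe succeeds if a TCP connection can be opened
```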

    • Command probes (exec): With command probes, Kubernetes runs a command inside your container. If the command exits with code zero, the container is marked as healthy. Otherwise, it is marked as unhealthy. This type of probe is useful when you can’t or don’t want to run an HTTP server, but you can run a command that checks whether your app is healthy. The example below checks whether the file /tmp/healthy exists.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/430a0cbd9c721b63037425c1ea9f7b4f.js</p>
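
    As a minimal sketch of a command probe, the manifest below checks for /tmp/healthy with cat, which exits with code zero only if the file can be read. The pod name, the busybox image, and the container's startup command are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: exec-probe-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox               # assumed image; any image with a shell and cat would do
      args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 3600   # create the file the probe looks for
      livenessProbe:
        exec:
          command:                 # exit code 0 means healthy
            - cat
            - /tmp/healthy
```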

    Probes can be configured in many ways based on how often they need to run, the success and failure thresholds, and how long to wait for responses.

    • initialDelaySeconds (default value 0): If you know your application needs n seconds (for example, 30 seconds) to warm up, use initialDelaySeconds to delay the first check by that many seconds after the container starts.
    • periodSeconds (default value 10): This defines how frequently, in seconds, the check is executed.
    • timeoutSeconds (default value 1): This defines the maximum number of seconds to wait for a response before the probe attempt times out.
    • successThreshold (default value 1): This is the minimum number of consecutive successes required for the probe to be considered successful again after it has failed. It must be 1 for liveness probes.
    • failureThreshold (default value 3): This is the number of consecutive failed attempts after which Kubernetes considers the probe to have failed.

    Note: By default, Kubernetes gives up after three consecutive failed attempts. In the case of a liveness probe, it restarts the container. In the case of a readiness probe, it marks the pod as not ready, so the pod stops receiving traffic.
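
    To see how these settings fit together, here is a sketch of a liveness probe with every parameter written out. The pod name, image, path, and port are hypothetical. The 30-second delay follows the warm-up example above, and the remaining values are simply the defaults made explicit.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-tuning-demo                    # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/my-app:1.0 # placeholder image
      livenessProbe:
        httpGet:
          path: /healthz                     # assumed health endpoint
          port: 8080
        initialDelaySeconds: 30              # wait 30 s after the container starts before the first check
        periodSeconds: 10                    # then check every 10 s
        timeoutSeconds: 1                    # each attempt fails if no response arrives within 1 s
        successThreshold: 1                  # one success marks the probe as passing (must be 1 for liveness)
        failureThreshold: 3                  # three consecutive failures trigger a restart
```

    With these values, a container that stops responding is restarted after about failureThreshold × periodSeconds, which is roughly 30 seconds here.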

    For more information about probe configuration, refer to this link.

    Let’s combine everything we have discussed so far. The key thing to note here is the use of readinessProbe with httpGet. The first check will be executed after 10 seconds, and then it will be repeated after every 5 seconds.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/dc0496b6eece05961e7acc685c7e96cc.js</p>
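
    A manifest along these lines might look like the following sketch. The timing values come from the text above, while the pod name, the nginx image, and the probed path and port are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http-demo        # hypothetical name
spec:
  containers:
    - name: web
      image: nginx                 # assumed image; serves HTTP on port 80
      ports:
        - containerPort: 80
      readinessProbe:
        httpGet:
          path: /                  # assumed path
          port: 80
        initialDelaySeconds: 10    # first check after 10 seconds
        periodSeconds: 5           # repeat every 5 seconds
```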

    • Use the ‘kubectl create’ command to create a pod, passing the YAML manifest file with the ‘-f’ flag. You can give the file any name, but by convention it ends with a ‘.yaml’ extension.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/02bd8d06868bbece1574c42409215fdc.js</p>

    • If you check the pod's status now, it should show Running under the STATUS column, but the READY column will still show 0/1, which means the pod is not yet ready to accept connections.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/a78a9cf2fe0b6141f9da85e8722bd466.js</p>

    • Verify the status again after a few seconds, as we set an initial delay of 10 seconds. By now the READY column should show 1/1.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/9a2601ff135cc16f0f922de4cb586a9c.js</p>

    • To check the detailed status of all the parameters (for example, initialDelaySeconds, periodSeconds, etc.) used when defining readiness probe, run the ‘kubectl describe’ command.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/7e08fa28d9ecc3f92a0a9da0c1968b79.js</p>

    Let's further reinforce the concept of liveness and readiness probes with the help of an example. First, let's start with a liveness probe. In the example below, the container runs the command ‘touch healthy; sleep 20; rm -rf healthy; sleep 600’.

    The touch command creates a file named ‘healthy’. This file exists in the container for the first 20 seconds, after which the ‘rm -rf’ command removes it. Lastly, the container sleeps for 600 seconds.

    Then we define the liveness probe. It checks whether the file exists using the ‘cat healthy’ command, starting after an initial delay of 5 seconds. The 'periodSeconds' parameter makes the liveness probe run every 5 seconds. Once the file is deleted after 20 seconds, the probe starts to fail.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/318497d6c38b39832a3164163f71bcb3.js</p>
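
    The setup described above can be sketched as the manifest below. The command, the probe command, and the two timing values are taken from the text. The pod name, container name, and busybox image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-probe-demo        # hypothetical name
spec:
  containers:
    - name: liveness
      image: busybox               # assumed image with a shell, touch, rm, and cat
      args:
        - /bin/sh
        - -c
        - touch healthy; sleep 20; rm -rf healthy; sleep 600
      livenessProbe:
        exec:
          command:                 # succeeds only while the file exists
            - cat
            - healthy
        initialDelaySeconds: 5     # first check 5 seconds after start
        periodSeconds: 5           # then every 5 seconds
```

    Since failureThreshold is left at its default of 3, the container is restarted only after three consecutive failed checks, not on the first failure.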

    • To create a pod, store the above code in a file that ends with ‘.yaml’ (for example, ‘liveness-probe.yaml’) and execute the ‘kubectl create’ command with ‘-f <file name>’, which will create the pod.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/7c82652b8603695760b2c224148636a4.js</p>

    • Run the ‘kubectl get events’ command, and you will see that the liveness probe has failed, and the container has been killed and restarted.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/940c96434da3e5da32d5d55b4d626207.js</p>

    • You can also verify it with the ‘kubectl get pods’ command. As you can see in the RESTARTS column, the container has been restarted once.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/e9a9dea88bcb809a527ae93944cb3312.js</p>

    Now that you understand how the liveness probe works, let's see how the readiness probe works by tweaking the above example to define a readiness probe. In the example below, the container runs the command ‘sleep 20; touch healthy; sleep 600’, which first sleeps for 20 seconds, then creates a file, and finally sleeps for 600 seconds. As the initial delay is set to 15 seconds, the first check is executed 15 seconds after the container starts, before the file exists, so it fails.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/474a1292ccae51849759b21a3be6d01e.js</p>
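
    A readiness variant of the earlier example might look like this sketch. The command and the 15-second initial delay come from the text. The pod name, container name, busybox image, and the 5-second period are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readiness-probe-demo       # hypothetical name
spec:
  containers:
    - name: readiness
      image: busybox               # assumed image with a shell, touch, and cat
      args:
        - /bin/sh
        - -c
        - sleep 20; touch healthy; sleep 600
      readinessProbe:
        exec:
          command:                 # fails until the file has been created
            - cat
            - healthy
        initialDelaySeconds: 15    # first check runs before the file exists, so it fails
        periodSeconds: 5           # assumed value; later checks pass once the file is there
```

    Unlike a failed liveness probe, a failed readiness probe does not restart the container. The pod simply stays out of service until a check succeeds.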

    • To create a pod, store the above code in a file that ends with ‘.yaml’, and execute the ‘kubectl create’ command, which will create the pod.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/7d2dcbed7578381950db17bfca3e9433.js</p>

    • If you run ‘kubectl get events’ here, you can see that the probe failed, as the file is not present yet.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/74dd0b252b0612f788f78d29db046cb0.js</p>

    • If you check the status of the container initially, it is not in a ready state.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/deb58616db90aa018752ea1341f05982.js</p>

    • But if you check it after 20 seconds, once the file has been created, the READY column should show that the container is ready.

    <p>CODE: https://gist.github.com/ShubhanjanMedhi-dev/26145b92ef3277b45174eebf288b39e8.js</p>

    Conclusion

    Health checks are required for any distributed system, and Kubernetes is no exception. Using health checks gives your Kubernetes services a solid foundation, better reliability, and higher uptime.

    Plug: Use K8s with Squadcast for Faster Resolution

    Squadcast is an incident management tool that’s purpose-built for site reliability engineering. It allows you to get rid of unwanted alerts, receive relevant notifications, and integrate with popular ChatOps tools. You can also collaborate in virtual incident war rooms and use automation to eliminate toil.

    Copyright © Squadcast Inc. 2017-2024