Smart healthchecks with Kubernetes and Spring Boot Actuator
source link: https://arnoldgalovics.com/smart-healthchecks-with-kubernetes-and-spring-boot-actuator/
I’ve seen quite a few projects in the past that use various orchestration tools for deploying applications. Probably the most popular one nowadays is Kubernetes (K8S). Even though these tools offer a vast amount of functionality to help applications run in a scalable and resilient manner, I keep noticing that engineers don’t take advantage of these features.
One example I often see is missing or misused healthchecks. The world would be a great place if the orchestration tool could just figure out whether the application is healthy and take the necessary actions if it’s not. Fortunately, it’s 2020 and such tools are already available (and have been for a long time :)).
Today I’m going to focus on Kubernetes and show you how to set up proper healthchecks to monitor a Spring Boot application that has Actuator set up.
Healthcheck in Kubernetes
Let’s begin with a little bit of introduction into the healthcheck mechanism of Kubernetes.
The probe actions
All healthchecks are managed by so-called “probes” in the K8S ecosystem. Think of a probe as a process that periodically does something to determine the health of the application. There are 3 actions a probe can perform.
Executing a command
I’ll cover this one only briefly. You can execute a command inside the container. If the command exits with status code 0, the application is considered healthy. If the exit code is anything other than 0, it’s unhealthy and needs action.
Opening a TCP socket
With this type of probe, Kubernetes will attempt to open a TCP socket on a specified port. If the connection is established successfully, the container is considered healthy. Otherwise, it is considered unhealthy.
Executing an HTTP GET
This is the most sophisticated one. The system will execute an HTTP GET request against a specific endpoint. If the API returns a status code between 200 and 399, the container is considered healthy. If it returns any other status code or the request cannot be executed, the container is unhealthy. You can also provide custom headers to be passed with the healthcheck request in case you have a special case.
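The success rule of the HTTP GET action fits in a few lines. The following is only a rough Java sketch of that rule, not Kubernetes code; the names HttpProbeRule and isHealthy are made up for illustration.

```java
// Sketch of the httpGet success rule: status codes from 200 up to 399 count as healthy.
// HttpProbeRule and isHealthy are hypothetical names, not part of any Kubernetes API.
final class HttpProbeRule {
    private HttpProbeRule() {}

    /** Returns true when the status code falls into the range the probe treats as success. */
    static boolean isHealthy(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }

    public static void main(String[] args) {
        int[] samples = {200, 302, 399, 404, 503};
        for (int status : samples) {
            System.out.println(status + " -> " + (isHealthy(status) ? "healthy" : "unhealthy"));
        }
    }
}
```

Note that a redirect such as 302 still counts as a success under this rule, while 404 and 503 do not.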
Liveness probe
Kubernetes provides 3 different types of probes. Each one is suitable for a different use case.
- Liveness probe
- Readiness probe
- Startup probe
In this article, I’m going to cover only the first one – liveness probe.
The purpose of this type is to detect when an application gets into a state it cannot recover from. Imagine a container running for days or weeks that suddenly stops serving requests. The only way to resolve the problem is to restart it. Of course, I know there must be an underlying issue in the application that needs resolution, but for now let’s not go in that direction.
As soon as the liveness probe detects that the application is not passing the healthcheck, Kubernetes will restart the container. Note that in this case the pod itself is not recreated, only the unhealthy container inside it is restarted.
There are 5 configuration parameters for a probe:
- initialDelaySeconds
  - The number of seconds to wait after the container starts before the probe is initiated. If you know your app takes at least 10 seconds to start, set this to 10 so the liveness probe won’t count the startup as a failure.
- periodSeconds
  - Defines how often the probe performs the healthcheck, in seconds.
- timeoutSeconds
  - The number of seconds after which the probe times out. For an HTTP GET probe with a 1 second timeout, for example, a response that doesn’t arrive within 1 second (because the application is slow) counts as a failure.
- successThreshold
  - The minimum number of consecutive healthcheck successes before the container is considered healthy after being unhealthy.
- failureThreshold
  - The number of consecutive healthcheck failures after which the container is considered unhealthy and is restarted.
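To get a feel for how the two thresholds interact, here is a simplified Java model of the bookkeeping. It is only a sketch under assumptions: ProbeTracker is a hypothetical name, the container is assumed to start out healthy, and the real kubelet logic may differ in its details.

```java
// Simplified model of how successThreshold and failureThreshold interact.
// ProbeTracker is a hypothetical name; this is not the kubelet's implementation.
final class ProbeTracker {
    private final int successThreshold;
    private final int failureThreshold;
    private int consecutiveSuccesses = 0;
    private int consecutiveFailures = 0;
    private boolean healthy = true; // assumption: the container starts out healthy

    ProbeTracker(int successThreshold, int failureThreshold) {
        if (successThreshold < 1 || failureThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be at least 1");
        }
        this.successThreshold = successThreshold;
        this.failureThreshold = failureThreshold;
    }

    /** Records the outcome of one probe run and returns the current verdict. */
    boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0;
            consecutiveSuccesses++;
            if (consecutiveSuccesses >= successThreshold) {
                healthy = true;
            }
        } else {
            consecutiveSuccesses = 0;
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        ProbeTracker tracker = new ProbeTracker(1, 2);
        boolean[] results = {true, false, true, false, false};
        for (boolean result : results) {
            System.out.println((result ? "success" : "failure") + " -> healthy=" + tracker.record(result));
        }
    }
}
```

With a failureThreshold of 2, a single failure followed by a success does not flip the verdict; only the two failures in a row at the end do. For liveness probes, Kubernetes requires successThreshold to be 1.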
An example TCP socket based liveness probe configuration looks like the following, just to give you a feel:
Spring Boot Actuator health API
Alright, we’ve covered the basic idea of probes in Kubernetes. Let’s look at Spring Boot Actuator.
Spring Boot Actuator is an extension module for Spring Boot to monitor and manage the application through JMX or HTTP. The marketing slogan is: enhancing the application with production-ready features. There are lots and lots of features available in the module. I’m not going to cover all of them in this article, but if you are interested, you can find more info in the Spring Boot Actuator documentation.
There is one interesting feature though, the health API. There is a single HTTP endpoint you can call: /actuator/health. The default behavior is simple: if the application is healthy, it responds with HTTP 200 and the following JSON: {"status":"UP"}
If the application is unhealthy, it will respond with HTTP 503 and the following JSON: {"status":"DOWN"}
Customizing the health indicator
I don’t want to go into deep detail about how Actuator works under the hood, but there is an interface called HealthIndicator that contributes to the overall system health. There are more than a dozen of them already auto-configured for you. An example is DiskSpaceHealthIndicator, which checks for low disk space.
Writing a custom HealthIndicator is quite easy. Simply create a new class that implements the HealthIndicator interface and mark it as a @Component. Spring will pick it up automatically.
For the sake of testing, I’ll show you a very simple HealthIndicator that can be switched UP/DOWN with a simple HTTP call.
I’m starting off from a project generated on start.spring.io: a Gradle project with the Actuator and Web dependencies. As a first step, let’s create a Spring bean that will hold the health state (healthy/unhealthy):
Nothing special, just a state holder class with a single boolean value that represents the health of the system.
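A state holder of this kind could look roughly like the plain-Java sketch below. It deliberately leaves out Spring so it runs on its own; in the actual application the class would additionally carry the @Component annotation. The method names isHealthy and switchHealth, and the assumption that the application starts out healthy, are mine for illustration.

```java
// Plain-Java sketch of the state holder. In the Spring application the class
// would additionally be annotated with @Component so it is registered as a bean.
// The method names are assumptions made for illustration.
final class ManualHealthHolder {
    private boolean healthy = true; // assumption: the application starts out healthy

    /** Returns the current value of the health flag. */
    synchronized boolean isHealthy() {
        return healthy;
    }

    /** Flips the flag and returns the new value. */
    synchronized boolean switchHealth() {
        healthy = !healthy;
        return healthy;
    }

    public static void main(String[] args) {
        ManualHealthHolder holder = new ManualHealthHolder();
        System.out.println("initially healthy: " + holder.isHealthy());
        holder.switchHealth();
        System.out.println("after switch: " + holder.isHealthy());
    }
}
```

The methods are synchronized because the flag is read by the healthcheck and written by the HTTP endpoint, which may run on different threads.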
The HealthIndicator is also very simple:
There is a single method on the HealthIndicator interface that needs to be implemented: HealthIndicator#health. You can do very complicated things like introducing new health states to the system, but we’ll go with the existing UP and DOWN states. In this particular example, the health is decided by the ManualHealthHolder bean. If it says healthy, the state will be UP. If it says unhealthy, the state will be DOWN.
The next and last step is to create an HTTP endpoint for changing the state.
Very minimal again. There is a single HTTP GET mapping for switching the status: /switch.
Testing time. Start the application with ./gradlew clean build bootRun.
If you cURL localhost:8080/actuator/health, you’ll get the UP response (I’ve used jq here to format the response nicely).
To simulate downtime of the application, we can call the localhost:8080/switch API. It will switch the healthy flag internally, and querying the /actuator/health endpoint now returns the DOWN state.
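The contract between the two endpoints can also be tried out without Spring. The following is a JDK-only stand-in, not the Spring Boot implementation: it uses the JDK’s built-in HttpServer to serve /actuator/health and /switch with the same status codes, then calls them with the JDK HttpClient. The class name HealthEndpointStandIn, the "switched" response body and the choice of a random free port are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// JDK-only stand-in for the two endpoints; this is NOT the Spring Boot implementation.
final class HealthEndpointStandIn {
    private final HttpServer server;
    private boolean healthy = true; // assumption: starts out healthy

    private HealthEndpointStandIn(HttpServer server) {
        this.server = server;
    }

    /** Starts the stand-in on a free local port chosen by the OS. */
    static HealthEndpointStandIn start() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            HealthEndpointStandIn app = new HealthEndpointStandIn(server);
            server.createContext("/actuator/health", app::handleHealth);
            server.createContext("/switch", app::handleSwitch);
            server.start();
            return app;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    private synchronized boolean isHealthy() { return healthy; }

    private synchronized void flip() { healthy = !healthy; }

    // Mirrors the health contract: 200 with UP, 503 with DOWN.
    private void handleHealth(HttpExchange exchange) throws IOException {
        boolean up = isHealthy();
        respond(exchange, up ? 200 : 503, up ? "{\"status\":\"UP\"}" : "{\"status\":\"DOWN\"}");
    }

    private void handleSwitch(HttpExchange exchange) throws IOException {
        flip();
        respond(exchange, 200, "switched");
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    /** Sends a GET request to the stand-in and returns the response. */
    static HttpResponse<String> get(int port, String path) {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:" + port + path)).GET().build();
        try {
            return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HealthEndpointStandIn app = start();
        try {
            HttpResponse<String> before = get(app.port(), "/actuator/health");
            System.out.println(before.statusCode() + " " + before.body());
            get(app.port(), "/switch");
            HttpResponse<String> after = get(app.port(), "/actuator/health");
            System.out.println(after.statusCode() + " " + after.body());
        } finally {
            app.stop();
        }
    }
}
```

The point of the sketch is the status code: a probe only looks at whether the response is in the success range, so the switch from 200 to 503 is what matters, not the JSON body.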
Liveness probe with Actuator
Now that we know the building blocks, let’s integrate Actuator and Kubernetes. Kubernetes works with containers, so we need to create a container image from the Spring application. A simple Dockerfile looks like the following:
Alright, after building the application, we can execute the docker build command to build the docker image.
I’m using minikube here for testing so let me add a few more steps to properly create the image so we are able to deploy it to the actual Kubernetes cluster.
Before creating the image, you should execute the following command to change your docker context to the Kubernetes cluster.
Now that you have the docker context set up, execute the command to build the docker image. You can verify with the docker images command whether the image was successfully created.
The next thing we’re looking at is the deployment to the cluster. The initial deployment file looks like the following:
It’s a very basic deployment descriptor. One important point is to set the imagePullPolicy to IfNotPresent or Never so Kubernetes will not try to download the docker image.
Adding the liveness probe:
So the trick here is to use the httpGet action on the probe and bind it to the /actuator/health endpoint. As I said earlier, with the httpGet action the probe will consider the application healthy when the status code is between 200 and 399. Guess what, the /actuator/health API fulfills that contract. It responds with 200 when the application reports a healthy state and with 503 when it’s down.
The rest of the configuration tells Kubernetes to wait 5 seconds before the probe is initiated, to execute the GET request against the endpoint every 10 seconds, and to consider the container unhealthy if 2 consecutive healthchecks have failed.
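These numbers also give a rough idea of how long a broken container may keep running before the restart kicks in. The Java sketch below is a back-of-the-envelope estimate only: LivenessTiming is a hypothetical helper, it ignores scheduling jitter in Kubernetes, and the 1 second timeout is an assumption based on the Kubernetes default for timeoutSeconds.

```java
// Back-of-the-envelope estimate of how long a failing container may keep running
// before a restart is triggered. Hypothetical helper that ignores scheduling jitter.
final class LivenessTiming {
    private LivenessTiming() {}

    /** Best case: the app breaks right before a probe run and the probes fail immediately. */
    static int bestCaseSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * (failureThreshold - 1);
    }

    /** Worst case: the app breaks right after a successful probe and the last failing probe runs into the timeout. */
    static int worstCaseSeconds(int periodSeconds, int timeoutSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold + timeoutSeconds;
    }

    public static void main(String[] args) {
        int periodSeconds = 10;   // from the probe configuration described above
        int failureThreshold = 2; // from the probe configuration described above
        int timeoutSeconds = 1;   // assumption: the Kubernetes default
        System.out.println("best case:  " + bestCaseSeconds(periodSeconds, failureThreshold) + "s");
        System.out.println("worst case: " + worstCaseSeconds(periodSeconds, timeoutSeconds, failureThreshold) + "s");
    }
}
```

Under these assumptions the unhealthy state is detected after roughly 10 to 21 seconds. Lowering periodSeconds or failureThreshold shortens that window at the cost of more probe traffic and a higher chance of restarting on a transient hiccup.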
Putting it all together
That’s it. Testing time. If you’ve read this far, I assume the docker image is already built, so we’re going from there.
The only thing we need to do is to deploy the application. With a little kubectl command you can do it:
The output should be:
So now if we take a look at the pods we have:
Everything looks good. The next phase of the testing is to flip the healthy flag in the application so we can see the liveness probe keeping the container in a healthy state.
Accessing the switch API for testing
There are 2 options for this. One is to open a terminal inside the container so we can locally trigger the /switch endpoint. The other one is to proxy the pod traffic to the local machine.
To get a shell inside the container, execute the following command (of course change the pod name to yours):
From then on, calling the /switch endpoint on localhost:8080 from inside the container will switch the flag. If you query the /actuator/health API the same way, it’s going to say DOWN.
The other option to trigger the /switch API is to forward requests from your local machine directly to the pod with kubectl. However, it needs some preparation so the pod is accessible. We need to expose the pod’s port as a Service. To do that, let’s extend the descriptor we created:
So the full k8s-deployment.yaml file looks like the following:
Now that everything is in place, redeploy the stack with
To access the API, only a port-forward is needed:
The command binds the local 9876 port to the pod’s 8080 port. So from now on, you can access the API from your local machine through localhost:9876.
Observing the liveness probe
The application is deployed. We can access the API. Everything is ready to see the liveness probe in action. First of all, let’s verify that the pod is alive and the /actuator/health API is returning the UP status.
Looks good so far. Switching the health with /switch.
From the pod’s perspective, everything looks good. However, Actuator is saying the service is DOWN. Observing the pod events will clearly indicate that there was in fact a container restart because of it.
You can see the message Liveness probe failed: HTTP probe failed with statuscode: 503. It happened 2 times, so the container was considered unhealthy and was restarted.
When the container restart is done, you can query the Actuator health and it will respond with the UP status, as the container has been restarted.
Conclusion
I hope you see how easy it is to set up a proper healthcheck (at least liveness) with Kubernetes and Spring Boot. It’s definitely something I recommend doing to create a more resilient system and react to problems automatically.
The code can be found on GitHub.