
Setting up a Smooth [Prometheus] Operator

source link: https://devblog.songkick.com/setting-up-a-smooth-prometheus-operator-c7fdc30176d1


Photo by Ayoola Salako on Unsplash

Introduction

In a previous post we discussed how, as part of moving our apps to Kubernetes, we needed to get some basic foundations in place first. In that case, we looked at how we imported our Kubernetes logs into BigQuery. Now, it’s time to look at the work we embarked upon to get our metrics and alerts running from the new Kubernetes clusters!

For this piece of work, we wanted to deploy a full Prometheus stack to Kubernetes that would monitor applications that have been migrated to Kubernetes as well as the applications currently running in our (old!) Google Compute Engine instance groups. The aim was to accomplish a couple of goals: first, to have visibility into the apps migrated to Kubernetes while also keeping an eye on those waiting to be migrated. Second, to simplify things (no more Cortex and Cassandra!), plus save money as well, since we could deprecate the old, more resource-heavy Prometheus cluster.

Metrics are an integral part of keeping the service level smooth and top-notch, as music fans all over the world demand. With technical metrics like Requests per Second (RPS), CPU and memory usage, and so forth, we make sure our infrastructure is able to sustain the current user activity without disruption. With higher-level metrics, like emails and push notifications sent, we keep a finger on the pulse of our products’ health.

Alerting rules are the way we define what conditions have to be met for us to consider something is not quite right. Alerting routes ensure the right people are made aware in time — and via the right tool — to implement corrections and avert any user-facing service disruptions.

One of the many cool things you can do with Kubernetes is use a tool like Helm to install third party applications with extreme ease. Packages are called Charts in Helm’s world, and each instance of a given chart running in your cluster is called a release. Additionally, Helm allows us to very easily create our own charts by leveraging its powerful templating feature (based on Go templates), which we’ll discuss later on.

Operator

To get our metrics and alerts kick-started, we installed the Prometheus Operator by way of the kube-prometheus-stack chart, which sets up Prometheus, Grafana, Alertmanager and several other components in a matter of seconds. You can learn more about the Operator Pattern here, but in essence it’s a way to automate the deployment, configuration and maintenance of software packages in a Kubernetes cluster.

As previously mentioned in other posts, at Songkick we use Terraform to handle all our infrastructure-as-code needs, so installing the Prometheus Operator was as simple as:

Installing the Prometheus Stack using Helm & Terraform
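In Terraform this boils down to a single helm_release resource along these lines (the release name, namespace and chart version here are illustrative rather than our exact values):

```hcl
# Minimal sketch of installing kube-prometheus-stack via the Terraform Helm
# provider; name, namespace and version are illustrative.
resource "helm_release" "prometheus_operator" {
  name             = "prometheus"
  namespace        = "prometheus"
  create_namespace = true

  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  version    = "16.0.0" # pin to whatever version you have tested
}
```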

And mic drop! 🤜 🎤

OK, it’s not just that; as always, the plot needs a little thickening to keep life somewhat interesting! You do need to follow that up with at least a few set statement blocks to customise the operator to your needs.

One of the main benefits of the Operator Pattern is that it leverages the experiences of many teams and comes with loads of configuration decisions already made for you in the form of default values. Still, you need to customize a few things here and there to settings that make sense for your particular use case. Another strength of operators in general is that they extend the Kubernetes API; the Prometheus Operator does this by adding a couple of Prometheus-specific Custom Resource Definitions (CRDs to friends and family). CRDs allow you to create Kubernetes resources that are not part of the standard API.

A good example of one of these is the PrometheusRule CRD. It lets us create Prometheus rule groups as a Kubernetes resource. These rules can be very low-level, like sk-prometheus-error-rates, which keeps an eye on the number of errors in a given time period and raises an alert depending on their number and duration. A higher-level example is sk-prometheus-daily-digest, which checks the health of daily email sending so we can be made aware if users are not getting notified about upcoming events.

The fact that these rules are created as cluster resources means they can very easily be created, updated, or deleted by standard Kubernetes actions and the Operator picks up any changes and keeps the Prometheus pod up to date with the new configuration.

Here’s how one of them looks:

Prometheus Alert Group as a Kubernetes Custom Resource
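A minimal sketch of the shape these take (the group name, expression and thresholds below are illustrative, not our production rules):

```yaml
# Illustrative PrometheusRule; rule names, expression and thresholds are
# examples rather than Songkick's actual configuration.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sk-prometheus-error-rates
  namespace: prometheus
  labels:
    release: prometheus   # must match the ruleSelector on the Prometheus resource
spec:
  groups:
    - name: error-rates
      rules:
        - alert: HighErrorRate
          expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "More than 5% of requests are failing"
```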

Another custom resource is AlertmanagerConfig. This one gave us a bit of an achy-breaky heart (I did dance to the music video during the writing of this post 🤠). An AlertmanagerConfig allows you to define two very crucial bits of information for Alertmanager: which recipients are available for incoming alerts (for example Slack, email, or PagerDuty) and which routes those alerts should follow to end up in the hands of one, several, or none of these recipients. An added benefit of deploying your configuration via this CRD is that the operator checks your config against the CRD schema, which is always nice.
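For reference, a bare-bones AlertmanagerConfig looks something like this (the receiver name, channel and referenced Secret are placeholders):

```yaml
# Illustrative AlertmanagerConfig; receiver names, channel and the Secret
# holding the Slack webhook are placeholders.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: sk-alerting
  namespace: prometheus
spec:
  route:
    receiver: slack-ops
    groupBy: ["alertname"]
  receivers:
    - name: slack-ops
      slackConfigs:
        - channel: "#ops-alerts"
          apiURL:
            name: slack-webhook   # Secret containing the webhook URL
            key: url
```

Worth noting: the operator scopes the routes defined in an AlertmanagerConfig to the namespace the resource lives in, which plays a part in the behaviour described below.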

Sadly, it gave us two weird scenarios in which either most messages would end up in a predefined null recipient or only alerts coming from the prometheus namespace would be delivered. The good thing is that the operator configuration allowed us to override the use of this CRD and supply our own alerting configuration via a standard Kubernetes Secret. Have you been able to get this particular component working as expected? Please let us know in the comments! (Also click that bell icon - oh hold on…).

Instead of using the AlertmanagerConfig custom resource, we went with the other option that the Prometheus Operator offered: using a standard Kubernetes Secret to submit our routes and recipients. First, we defined a values file with all our route settings and a Helm template that interpolates them:

A snippet of our alertmanager config template
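A simplified sketch of that template (the value keys under .Values.alertmanager are illustrative names for our own values structure):

```yaml
# Illustrative template (templates/alertmanager-config.yaml): interpolates
# the routes and receivers from our values file into an Alertmanager
# configuration, delivered as the alertmanager-config Secret.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
stringData:
  alertmanager.yaml: |
    route:
      receiver: {{ .Values.alertmanager.defaultReceiver }}
      routes:
      {{- range .Values.alertmanager.routes }}
        - receiver: {{ .receiver }}
          matchers:
            - severity = "{{ .severity }}"
      {{- end }}
    receivers:
    {{- range .Values.alertmanager.receivers }}
      - name: {{ .name }}
        slack_configs:
          - channel: "{{ .channel }}"
    {{- end }}
```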

This process generates a Kubernetes Secret called alertmanager-config, and all that’s left to do is tell the operator to use it. This is very easy to configure using Terraform:

Telling Alertmanager to get its config from a Kubernetes Secret
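In practice this is a single extra set block on the helm_release from earlier (the value name comes from the kube-prometheus-stack chart and may differ between chart versions):

```hcl
# Illustrative: extends the helm_release shown earlier to load the
# Alertmanager configuration from our alertmanager-config Secret.
resource "helm_release" "prometheus_operator" {
  # ...name, chart, repository and version as before...

  set {
    name  = "alertmanager.alertmanagerSpec.configSecret"
    value = "alertmanager-config"
  }
}
```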

Enter Helm

As mentioned before, Helm makes it deliciously easy to install applications in Kubernetes, but it also makes for a very powerful way — once you wrap your head around Go templating — to create new Charts that add configuration and customisation to existing releases (deployed Charts in Helmspeak).

With this in mind, we restricted the use of Terraform to just installing the Operator and configuring its main settings and secrets, and created a new Helm Chart in a separate repository to take care of all the Prometheus rules, Alertmanager configuration and such. This new repo gives developers the power to add new alerting rules or custom metric scrape configurations just by editing easy-to-read YAML files, which the chart then deploys to the cluster. The Prometheus Operator keeps track of any changes made to these resources and synchronises the final Prometheus/Alertmanager configuration files accordingly.

Prometheus service discovery part 1 — current applications

Once Helm delivered our shiny brand-new Prometheus stack, we began working on telling it where to find the things it needs to scrape.

For our applications that are not yet running in Kubernetes we just pointed Prometheus to our Consul service via a consul_sd_configs configuration.

Helm template to configure Prometheus target discovery using Consul
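The relevant part of the template is a scrape config along these lines (the Consul address value and the relabelling are illustrative):

```yaml
# Illustrative Consul-based scrape config, templated from our values;
# the .Values.consul.address key is an assumed name.
- job_name: consul-services
  consul_sd_configs:
    - server: {{ .Values.consul.address }}   # e.g. consul.service.consul:8500
  relabel_configs:
    - source_labels: [__meta_consul_service]
      target_label: job
```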

This lets Prometheus discover all the services that Consul knows about; it then looks for a /metrics endpoint on each of them and starts scraping those metrics.

An additional step is needed for custom metrics we have added that live on a different endpoint, as well as for Rabbitmq services, since they serve metrics from /api/metrics instead of /metrics. For these cases, we list all the special Consul jobs in a values file as usual:

YAML description of custom Prometheus targets to scrape
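Something along these lines (the customConsulJobs and metricsPath keys are illustrative stand-ins for our actual values structure):

```yaml
# Illustrative values: Consul services whose metrics live somewhere other
# than the default /metrics path.
customConsulJobs:
  - name: rabbitmq
    metricsPath: /api/metrics
  - name: payments-worker
    metricsPath: /internal/metrics
```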

These values are then interpolated into a template like this:

Template to generate custom target discovery from YAML
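A simplified sketch of that template, assuming the values structure above:

```yaml
# Illustrative: turn each entry of customConsulJobs into its own scrape
# config with a custom metrics path.
{{- range .Values.customConsulJobs }}
- job_name: {{ .name }}
  metrics_path: {{ .metricsPath }}
  consul_sd_configs:
    - server: {{ $.Values.consul.address }}
      services: [{{ .name | quote }}]
{{- end }}
```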

All the above then comes together on a final template that delivers the completed Prometheus scrape configuration as a Kubernetes Secret:

Helm templates can have multiple layers, like Shrek
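Conceptually, the outer layer looks like this (the named helper template is a hypothetical stand-in for the partials that render the snippets above):

```yaml
# Illustrative outer template: bundle the rendered scrape configs into a
# Secret, which the operator can then be pointed at (e.g. via the chart's
# additionalScrapeConfigsSecret settings).
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-additional-scrape-configs
stringData:
  additional-scrape-configs.yaml: |-
    {{- include "sk-prometheus.scrapeConfigs" . | nindent 4 }}
```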

Prometheus service discovery part 2 — applications migrated to Kubernetes

Once migrated, our applications are deployed to Kubernetes as different releases of a single Helm chart we built called sk-application. This chart sets up all the Kubernetes resources needed to run the application, like a Deployment, Service, Jobs, and so on. The chart also creates a resource of special interest to us in this post: a PodMonitor.

A PodMonitor is a custom resource that lets you specify a group of selectors to match Pods running an application, then specify which of their containers can be scraped and how often. You can also list a group of Kubernetes labels from those Pods that can be transferred directly as Prometheus labels when scraping.

This is what one of the generated PodMonitors looks like after an application gets deployed:

Look for any pods running charts-service and scrape both the app and haproxy containers for metrics
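Roughly like this (the scrape intervals, port names and transferred labels are illustrative):

```yaml
# Illustrative PodMonitor generated by the sk-application chart; the ports
# refer to named container ports on the app and haproxy containers.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: charts-service
spec:
  selector:
    matchLabels:
      app: charts-service
  podMetricsEndpoints:
    - port: app
      path: /metrics
      interval: 30s
    - port: haproxy
      path: /metrics
      interval: 30s
  podTargetLabels:
    - team
    - environment
```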

Adding PrometheusRules

Let’s take a deeper dive into how our new Chart gets a new group of Prometheus rules into the Prometheus stack. Our values file for rules contains a single key called additionalPrometheusRulesMap which is made up of groups of rules under a common key.

For example, let’s say we want to add a group of rules that will keep an eye on Redis and let us know if it’s having a bit of a rainy day:

Sample rule to keep an eye on Redis
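The values entry would look something like this (the expression and timings are illustrative):

```yaml
# Illustrative values entry for our rules chart; expression and severity
# are examples.
additionalPrometheusRulesMap:
  redis:
    groups:
      - name: redis
        rules:
          - alert: RedisDownOnInstance
            expr: redis_up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Redis is down on one of the Redis nodes"
```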

In this example, we defined a redis key with a single group of rules, the first of which will alert us when the Redis service is not operational on one of the Redis nodes.

Next, let’s have a look at the template that our chart will use to create a PrometheusRule for each one of our rule groups:

The notoriously mysterious Kubernetes List resource type
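A simplified sketch of that template (the naming and labels are illustrative):

```yaml
# Illustrative: one PrometheusRule per key of additionalPrometheusRulesMap,
# wrapped in a Kubernetes List.
apiVersion: v1
kind: List
items:
{{- range $ruleGroupName, $ruleGroup := .Values.additionalPrometheusRulesMap }}
  - apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: {{ printf "%s-%s" $.Release.Name $ruleGroupName }}
      labels:
        release: {{ $.Release.Name }}
    spec:
      {{- toYaml $ruleGroup | nindent 6 }}
{{- end }}
```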

Let’s see step by step what’s going on here:

  • This template creates a Kubernetes List definition
  • Each item on the list is obtained by parsing additionalPrometheusRulesMap from the supplied values file and looping through each subkey. From each subkey the chart will create a PrometheusRule custom resource.
  • In our example above, we will have a PrometheusRule with a group called redis that holds all the alerts in the group, like our RedisDownOnInstance alert.
  • As soon as a new PrometheusRule resource is created the Operator picks it up and validates its rules. If all looks good, the new rules are incorporated into the Prometheus configuration.

Prometheus Adapter

The next step is making use of Prometheus metrics to have a smarter way of scaling our application pods up or down. Out of the (proverbial) box, Kubernetes allows you to scale the pods in a deployment based on resource metrics like CPU, but the real benefit comes from setting up your autoscaler — called Horizontal Pod Autoscaler or HPA — with custom metrics.

The Prometheus adapter lets you make specific Prometheus metrics available to the cluster via the Kubernetes custom metrics API. In our case, we wanted all HPAs related to our applications to use a custom Prometheus metric (worker_usage) to determine the number of pods at any given time.

Worker usage is the ratio of current Haproxy sessions to the session limit defined in the backend, i.e. the total number of workers. We could set our HPA to watch this metric and aim to keep it under 70%, for example. This will prevent requests from being queued by Haproxy and keep response times low.

The way to achieve this is relatively straightforward. All that’s needed is to create a ConfigMap Kubernetes resource that holds the new metric:

Defining a custom k8s metric from a Prometheus query
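A sketch of what that ConfigMap contains (the underlying Haproxy metric names and the exact adapter rule are illustrative):

```yaml
# Illustrative prometheus-adapter rule exposing worker_usage via the custom
# metrics API; the HAProxy metric names are examples.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter
  namespace: prometheus
data:
  config.yaml: |
    rules:
      - seriesQuery: 'haproxy_backend_current_sessions{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: { resource: "namespace" }
            pod: { resource: "pod" }
        name:
          as: "worker_usage"
        metricsQuery: >
          100 * sum(haproxy_backend_current_sessions{<<.LabelMatchers>>}) by (<<.GroupBy>>)
          / sum(haproxy_backend_limit_sessions{<<.LabelMatchers>>}) by (<<.GroupBy>>)
```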

Then all we have to do is tell the Prometheus adapter to use the prometheus-adapter configMap as the source of its rules. This is done via Terraform with:
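(The sketch below assumes the prometheus-community prometheus-adapter chart; rules.existing and prometheus.url are that chart’s value names and may differ between versions.)

```hcl
# Illustrative: install prometheus-adapter and point it at the ConfigMap
# above as the source of its rules.
resource "helm_release" "prometheus_adapter" {
  name       = "prometheus-adapter"
  namespace  = "prometheus"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "prometheus-adapter"

  set {
    name  = "prometheus.url"
    value = "http://prometheus-operated.prometheus.svc"
  }

  set {
    name  = "rules.existing"
    value = "prometheus-adapter"
  }
}
```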

Once this is applied, our new metric should be available via the custom metrics endpoint:

Getting a raw custom metric from k8s
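Querying the raw API can be done with kubectl (the namespace here is a placeholder; the metric name matches the example above):

```sh
# List the current worker_usage value for every pod in the given namespace;
# the response contains one entry per matching pod.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/worker_usage" | jq .
```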

In this case, worker usage for one of the pods is currently at 50%.

Now we can set the final piece of this auto-scaling puzzle and instruct the HPA template in our custom applications chart to use that metric to determine when to scale pods:

Each app can have its own utilizationTarget
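A simplified sketch of that HPA template (the utilizationTarget, minReplicas and maxReplicas value names, and the use of the release name for the Deployment, are assumptions for this sketch):

```yaml
# Illustrative HPA template from the application chart: scale on the
# average worker_usage across the app's pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}
  minReplicas: {{ .Values.minReplicas | default 2 }}
  maxReplicas: {{ .Values.maxReplicas | default 10 }}
  metrics:
    - type: Pods
      pods:
        metric:
          name: worker_usage
        target:
          type: AverageValue
          averageValue: {{ .Values.utilizationTarget | quote }}
```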

Wrapping things up

The final bit of configuration to get our new Prometheus stack ready for prime-time usage was securing the Prometheus web UI, Grafana, and Alertmanager behind the Google Identity Aware Proxy. This product makes web applications available to authenticated Gmail users within Songkick without the need for them to be on a VPN. In practice, this means that one of our engineers can get an alert via Slack on their phone, click through to dive deeper into the Prometheus metrics, have a look at a Grafana dashboard, and determine the next course of action without having to open their laptop.

For us, the next phase of work is now to start migrating services and frontends in anger with the confidence that any changes in quality of service will be brought to our attention in a timely fashion so our users don’t miss out on any livestreams or upcoming live music events.

Let the good times roll!
(and get an alert when they ain’t so good)

