Cloud Data: Observability Is the Forgotten Data

source link: https://dzone.com/articles/cloud-data-observability-is-the-forgotten-data

In this continuation of the cloud data series, we discuss the forgotten data that is often overlooked when planning cloud-native architectural solutions.

This article is a continuation of a series of posts exploring how the pitfalls around the collection, maintenance, and storage of your cloud data can mean the difference between failure and success in your cloud strategy. The concepts in this series stem from brainstorming with my good friend Roel Hodzelmans and are further inspired by audience reactions to a talk given previously in Dublin, Ireland.

The initial post provided an introduction to cloud and data, and what that means in a cloud-native architecture beyond just storage. In this second article, we discuss the forgotten data that is often overlooked when planning for cloud-native architectural solutions.

Observability Is the Forgotten Data

When you look at observability, you might be thinking about data generated from logs, traces, metrics, and even events across your landscape. What you probably do not realize is that many of your applications and platforms have standard installation settings that generate large amounts of observability data by default. If you are not accounting for all that data being generated when you are heading into the cloud, you are going to have a hard time meeting your budget constraints for deploying and running your production solutions.

Martin Mao stated earlier this year that the growth of observability data is out of control. He notes that organizations don't mind paying for that data if it leads to better outcomes, such as happier customers, higher availability, faster remediation, or more revenue.

"Paying more for logging/metrics/tracing doesn't equate to a positive user experience. Consider how much data can be generated and shipped. $$$. You still need good people to turn data into action. It's remarkable how common this situation is, where an organization is paying more for their observability data (typically metrics, logs, traces, and sometimes events), then they do for their production infrastructure." - Martin Mao

Let's take a look at a simple experiment presented in an article on the hidden cost of data observability, where a simple "Hello, World!" application was deployed on a four-node Kubernetes cluster on GKE (see the article for details of the setup). Scripts were used to simulate load on the application and 30 days of observability data were collected in the following categories:
  • Tracing - One trace per second over 30 days totaled 2.5M traces for a total data size of 161GB.
  • End user metrics - Each back-end call generated a user interaction, so over 30 days, that's 2.5M EUM traces for a total data size of 1GB.
  • Logs - Mileage may vary depending on the configuration of your logging, but here, it was a 30-day total data size of 3.4GB.
  • Metrics - Collected using Prometheus configured for a 10-second sample rate across the cluster, for a 30-day total data size of 285GB.

Granted, this might not be a perfect match for your own systems, but it is simple and gives an easy-to-follow result: just over 450GB of data for a single, simple application.
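To make the arithmetic easy to reproduce, here is a minimal back-of-envelope estimator in Python. The per-item sizes are assumptions reverse-engineered from the totals reported above (total bytes divided by item count), so treat this as a template to plug your own measurements into, not as a model of any particular stack.

```python
# Rough 30-day observability volume estimator based on the experiment above.
# The average per-item sizes are assumptions derived from the reported totals;
# substitute measured values from your own stack.

DAYS = 30
SECONDS = DAYS * 24 * 60 * 60  # 2,592,000 -- the "2.5M" items in the text

sources = {
    # name: (item count over 30 days, assumed average bytes per item)
    "traces":     (SECONDS, 62_000),  # one trace per second, ~62KB each
    "eum_traces": (SECONDS, 400),     # one per back-end call, ~0.4KB each
    "logs":       (1, 3.4e9),         # reported as a 3.4GB lump sum
    "metrics":    (1, 285e9),         # Prometheus at a 10s sample rate, 285GB
}

total_bytes = 0.0
for name, (count, avg_bytes) in sources.items():
    size = count * avg_bytes
    total_bytes += size
    print(f"{name:<11} {size / 1e9:7.1f} GB")

print(f"{'total':<11} {total_bytes / 1e9:7.1f} GB")  # just over 450GB
```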

If you take into account that the average retention period for audits and compliance is 13 months, you have to ask yourself how much data you are collecting, transporting, and storing across your cloud architecture(s). In modern cloud-native architectures you can be deploying multiple times a day, where a container is sometimes only around for a few minutes or hours. The observability data generated there may not need the default 13 months of retention. Setting a retention period for each data type can help rein in your generated data volume.
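As a rough illustration of why this matters, here is a short sketch comparing the steady-state storage held under a blanket 13-month policy against hypothetical per-type retention windows. The monthly rates reuse the experiment's numbers; the shorter windows are made-up examples, not recommendations.

```python
# Hypothetical comparison of steady-state storage under different retention
# policies, using the monthly volumes from the experiment above. The
# per-type windows are illustrative assumptions, not recommendations.

MONTHLY_GB = {"traces": 161, "eum": 1, "logs": 3.4, "metrics": 285}

POLICIES = {
    # retention window in months, per data type
    "13_months_all": {k: 13 for k in MONTHLY_GB},
    "per_type":      {"traces": 0.25, "eum": 1, "logs": 13, "metrics": 0.5},
}

for policy, windows in POLICIES.items():
    held = sum(MONTHLY_GB[k] * windows[k] for k in MONTHLY_GB)
    print(f"{policy:<14} ~{held:,.0f} GB held at steady state")

# 13_months_all  ~5,855 GB
# per_type         ~228 GB -- keeping only compliance logs for 13 months
```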

Also, consider the various environments that are set up and torn down weekly or bi-weekly, such as test or lab environments. These certainly don't need extensive observability data retention, if any at all.

As Martin noted, paying for more data is one thing, but people are the core of any successful use case:

"Paying more for logging/metrics/tracing doesn't equate to a positive user experience. Consider how much data can be generated and shipped. $$$. You still need good people to turn data into action."

Who Owns These Decisions?

Even once you realize how much unexpected cloud data is coming out of your architecture, there remains the issue of who owns these decisions in your organization. The observability data explosion can cause a lot of issues and costs, but the question to answer is:

Do you dare to flip the switch on a new data collection in your architecture?

The following article in this series will take a look at what the industry will be doing in the near future to ensure there is a financial owner for this data within the organization.

