Domain-Oriented Observability

Modern software systems are becoming more distributed—and running on less-reliable infrastructure—thanks to current trends like microservices and cloud. Building observability into our systems has always been necessary, but these trends are making it more critical than ever. At the same time, the DevOps movement means that the folks monitoring production are more likely than ever to have the ability to actually add custom instrumentation code within the running systems rather than having to make do with bolting observability onto the side.

But, how do we add observability to what we care about the most, our business logic, without clogging up our codebase with instrumentation details? And, if this instrumentation is important, how do we test that we've implemented it correctly? In this article, I demonstrate how a philosophy of Domain-Oriented Observability paired with an implementation pattern called Domain Probe can help, by treating business-focused observability as a first-class concept within our codebase.

What to Observe

"Observability" has a broad scope, from low-level technical metrics through to high-level business key performance indicators (KPIs). On the technical end of the spectrum, we can track things like memory and CPU utilization, network and disk I/O, thread counts, and garbage collection (GC) pauses. On the other end of the spectrum, our business/domain metrics might track things like cart abandonment rate, session duration, or payment failure rate.

Because these higher-level metrics are specific to each system, they usually require hand-rolled instrumentation logic. This is in contrast to lower-level technical instrumentation, which is more generic and often is achieved without much modification to a system's codebase beyond perhaps injecting some sort of monitoring agent at boot time.
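To make that contrast concrete, here's a minimal sketch, assuming a Node.js runtime. The technical reading comes straight from the runtime, whereas the domain metric needs code that knows what a cart and a checkout are. The CartMetrics class and its method names are hypothetical, invented purely for illustration; they aren't part of any monitoring library.

```javascript
// Technical metric: generic, and available from the Node.js runtime
// without touching any domain code.
function heapUsedBytes() {
  return process.memoryUsage().heapUsed;
}

// Domain metric: hand-rolled, because only our own code knows what a
// "cart" or a "checkout" is. CartMetrics is a hypothetical name.
class CartMetrics {
  constructor() {
    this.cartsCreated = 0;
    this.cartsCheckedOut = 0;
  }

  cartCreated() {
    this.cartsCreated += 1;
  }

  cartCheckedOut() {
    this.cartsCheckedOut += 1;
  }

  // Fraction of created carts that never reached checkout.
  abandonmentRate() {
    if (this.cartsCreated === 0) return 0;
    return (this.cartsCreated - this.cartsCheckedOut) / this.cartsCreated;
  }
}
```

Nothing in the runtime could have produced abandonmentRate for us; someone had to call cartCreated and cartCheckedOut from the right places in the domain code.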

It's also important to note that higher-level, product-oriented metrics are more valuable because, by definition, they more closely reflect whether the system is performing toward its intended business goals.

By adding instrumentation that tracks these valuable metrics we achieve Domain-Oriented Observability.

The Problem with Observability

So, Domain-Oriented Observability is valuable, but it usually requires hand-rolled instrumentation logic. That custom instrumentation lives right alongside the core domain logic of our systems, where clear, maintainable code is vital. Unfortunately, instrumentation code tends to be noisy, and if we're not careful, it can lead to a distracting mess.

Let's see an example of the kind of mess that the introduction of instrumentation code can cause. Here's a hypothetical e-commerce system's (somewhat naive) discount code logic before we add any observability:

class ShoppingCart…

applyDiscountCode(discountCode){

    let discount; 
    try {
      discount = this.discountService.lookupDiscount(discountCode);
    } catch (error) {
      return 0;
    }

    const amountDiscounted = discount.applyToCart(this);
    return amountDiscounted;
  }

I'd say we have some clearly expressed domain logic here. We look up a discount based on a discount code and then apply the discount to the cart. Finally, we return the amount that was discounted. If we fail to find a discount, we exit early and return 0.

This application of discounts to a cart is a key feature, so good observability is important here. Let's add some instrumentation:

class ShoppingCart…

applyDiscountCode(discountCode){
    this.logger.log(`attempting to apply discount code: ${discountCode}`);

    let discount; 
    try {
      discount = this.discountService.lookupDiscount(discountCode);
    } catch (error) {
      this.logger.error('discount lookup failed',error);
      this.metrics.increment(
        'discount-lookup-failure',
        {code:discountCode});
      return 0;
    }
    this.metrics.increment(
      'discount-lookup-success',
      {code:discountCode});

    const amountDiscounted = discount.applyToCart(this);

    this.logger.log(`Discount applied, of amount: ${amountDiscounted}`);
    this.analytics.track('Discount Code Applied',{
      code:discount.code, 
      discount:discount.amount, 
      amountDiscounted:amountDiscounted
    });

    return amountDiscounted;
  }

In addition to performing the actual business logic of looking up and applying a discount, we are now also calling out to various instrumentation systems. We're logging some diagnostics for developers, we're recording some metrics for the people operating this system in production, and we're also publishing an event into our analytics platform for use by product and marketing folks.

Unfortunately, adding observability has made a mess of our nice, clean domain logic. We now have only 25% of the code in our applyDiscountCode method involved in its stated purpose of looking up and applying a discount. The clean business logic that we started out with hasn't changed and remains clear and concise, but it's lost among the low-level instrumentation code that now takes up the bulk of the method. What's more, we've introduced code duplication and magic strings into the middle of our domain logic.

In short, our instrumentation code is a huge distraction to anyone trying to read this method and see what it actually does.

Cleaning Up the Mess

Let's see if we can clean up this mess by refactoring our implementation. First, let's extract that icky low-level instrumentation logic into separate methods:

…

class ShoppingCart {
    applyDiscountCode(discountCode){
      this._instrumentApplyingDiscountCode(discountCode);
  
      let discount; 
      try {
        discount = this.discountService.lookupDiscount(discountCode);
      } catch (error) {
        this._instrumentDiscountCodeLookupFailed(discountCode,error);
        return 0;
      }
      this._instrumentDiscountCodeLookupSucceeded(discountCode);
  
      const amountDiscounted = discount.applyToCart(this);
      this._instrumentDiscountApplied(discount,amountDiscounted);
      return amountDiscounted;
    }
  
    _instrumentApplyingDiscountCode(discountCode){
      this.logger.log(`attempting to apply discount code: ${discountCode}`);
    }
    _instrumentDiscountCodeLookupFailed(discountCode,error){
      this.logger.error('discount lookup failed',error);
      this.metrics.increment(
        'discount-lookup-failure',
        {code:discountCode});
    }
    _instrumentDiscountCodeLookupSucceeded(discountCode){
      this.metrics.increment(
        'discount-lookup-success',
        {code:discountCode});
    }
    _instrumentDiscountApplied(discount,amountDiscounted){
      this.logger.log(`Discount applied, of amount: ${amountDiscounted}`);
      this.analytics.track('Discount Code Applied',{
        code:discount.code, 
        discount:discount.amount, 
        amountDiscounted:amountDiscounted
      });
    }
  }

This is a good start. We extracted the instrumentation details into focused instrumentation methods, leaving our business logic with a simple method call at each instrumentation point. It's easier to read and understand applyDiscountCode now that the distracting details of the various instrumentation systems have been pushed down into those _instrument... methods.

However, it doesn't seem right that ShoppingCart now has a bunch of private methods that are entirely focused on instrumentation; that's not really ShoppingCart's responsibility. A cluster of functionality within a class that is unrelated to that class's primary responsibility is often an indication that there's a new class trying to emerge.

Let's follow that hint by gathering up those instrumentation methods and moving them out into their own DiscountInstrumentation class:

class ShoppingCart…

applyDiscountCode(discountCode){
    this.instrumentation.applyingDiscountCode(discountCode);

    let discount; 
    try {
      discount = this.discountService.lookupDiscount(discountCode);
    } catch (error) {
      this.instrumentation.discountCodeLookupFailed(discountCode,error);
      return 0;
    }
    this.instrumentation.discountCodeLookupSucceeded(discountCode);

    const amountDiscounted = discount.applyToCart(this);
    this.instrumentation.discountApplied(discount,amountDiscounted);
    return amountDiscounted;
  }

We don't change what the methods do; we just drop the _instrument prefix from their names and move them out to their own class with an appropriate constructor:

class DiscountInstrumentation {
  constructor({logger,metrics,analytics}){
    this.logger = logger;
    this.metrics = metrics;
    this.analytics = analytics;
  }

  applyingDiscountCode(discountCode){
    this.logger.log(`attempting to apply discount code: ${discountCode}`);
  }

  discountCodeLookupFailed(discountCode,error){
    this.logger.error('discount lookup failed',error);
    this.metrics.increment(
      'discount-lookup-failure',
      {code:discountCode});
  }
  
  discountCodeLookupSucceeded(discountCode){
    this.metrics.increment(
      'discount-lookup-success',
      {code:discountCode});
  }

  discountApplied(discount,amountDiscounted){
    this.logger.log(`Discount applied, of amount: ${amountDiscounted}`);
    this.analytics.track('Discount Code Applied',{
      code:discount.code, 
      discount:discount.amount, 
      amountDiscounted:amountDiscounted
    });
  }
}

We now have a nice, clear separation of responsibilities: ShoppingCart is entirely focused on domain concepts like applying discounts, whereas our new DiscountInstrumentation class encapsulates all the details of instrumenting the process of applying a discount.
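The snippets above don't show how ShoppingCart gets hold of this.instrumentation. One plausible wiring, sketched here as an assumption rather than as the system's actual setup, is to hand the probe to the cart's constructor alongside the discountService. The stub logger, metrics, analytics, and discount service below are hypothetical stand-ins that simply record what they were asked to do, and the SAVE10 code is invented for the example.

```javascript
// Condensed copy of the DiscountInstrumentation class shown above.
class DiscountInstrumentation {
  constructor({logger, metrics, analytics}) {
    this.logger = logger;
    this.metrics = metrics;
    this.analytics = analytics;
  }
  applyingDiscountCode(discountCode) {
    this.logger.log(`attempting to apply discount code: ${discountCode}`);
  }
  discountCodeLookupFailed(discountCode, error) {
    this.logger.error('discount lookup failed', error);
    this.metrics.increment('discount-lookup-failure', {code: discountCode});
  }
  discountCodeLookupSucceeded(discountCode) {
    this.metrics.increment('discount-lookup-success', {code: discountCode});
  }
  discountApplied(discount, amountDiscounted) {
    this.logger.log(`Discount applied, of amount: ${amountDiscounted}`);
    this.analytics.track('Discount Code Applied', {
      code: discount.code,
      discount: discount.amount,
      amountDiscounted: amountDiscounted
    });
  }
}

// Assumption: ShoppingCart receives its collaborators via its constructor.
class ShoppingCart {
  constructor({discountService, instrumentation}) {
    this.discountService = discountService;
    this.instrumentation = instrumentation;
  }
  applyDiscountCode(discountCode) {
    this.instrumentation.applyingDiscountCode(discountCode);
    let discount;
    try {
      discount = this.discountService.lookupDiscount(discountCode);
    } catch (error) {
      this.instrumentation.discountCodeLookupFailed(discountCode, error);
      return 0;
    }
    this.instrumentation.discountCodeLookupSucceeded(discountCode);
    const amountDiscounted = discount.applyToCart(this);
    this.instrumentation.discountApplied(discount, amountDiscounted);
    return amountDiscounted;
  }
}

// Hypothetical stand-ins for the real instrumentation systems; each one
// appends a record of the call to a shared array.
const calls = [];
const logger = {
  log: (message) => calls.push(['log', message]),
  error: (message, error) => calls.push(['error', message])
};
const metrics = {increment: (name, tags) => calls.push(['metric', name])};
const analytics = {track: (event, props) => calls.push(['analytics', event])};

// Hypothetical discount service that knows a single code.
const discountService = {
  lookupDiscount(code) {
    if (code !== 'SAVE10') throw new Error(`unknown discount code: ${code}`);
    return {code, amount: 10, applyToCart: (cart) => 10};
  }
};

const cart = new ShoppingCart({
  discountService,
  instrumentation: new DiscountInstrumentation({logger, metrics, analytics})
});

const saved = cart.applyDiscountCode('SAVE10');   // known code
const notSaved = cart.applyDiscountCode('BOGUS'); // lookup throws
```

With this wiring, ShoppingCart never sees the logger, the metrics client, or the analytics client; it only knows about the probe.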

Domain Probe


DiscountInstrumentation is an example of a pattern I call Domain Probe. A Domain Probe presents a high-level instrumentation API that is oriented around domain semantics, encapsulating the low-level instrumentation plumbing required to achieve Domain-Oriented Observability. This enables us to add observability to domain logic while still talking in the language of the domain, avoiding the distracting details of the instrumentation technology. In our preceding example, our ShoppingCart implemented observability by reporting Domain Observations (discount codes being applied and discount code lookups failing) to the DiscountInstrumentation probe rather than working directly in the technical domain of writing log entries or tracking analytics events. This might seem a subtle distinction, but keeping domain code focused on the domain pays rich dividends in terms of keeping a codebase readable, maintainable, and extensible.
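One way to see what "talking in the language of the domain" buys us: because ShoppingCart only ever calls the probe's four domain-level methods, any object exposing those methods can stand in for DiscountInstrumentation. The following is a hypothetical sketch, not something from the system above. RecordingDiscountProbe, the constructor-based wiring, and the stub discount service are all assumptions made for illustration. The probe simply records each Domain Observation as a plain object.

```javascript
// Hypothetical probe with the same four methods as DiscountInstrumentation.
// Instead of logging or publishing, it keeps a list of Domain Observations.
class RecordingDiscountProbe {
  constructor() {
    this.observations = [];
  }
  applyingDiscountCode(discountCode) {
    this.observations.push({type: 'applyingDiscountCode', discountCode});
  }
  discountCodeLookupFailed(discountCode, error) {
    this.observations.push({
      type: 'discountCodeLookupFailed',
      discountCode,
      reason: error.message
    });
  }
  discountCodeLookupSucceeded(discountCode) {
    this.observations.push({type: 'discountCodeLookupSucceeded', discountCode});
  }
  discountApplied(discount, amountDiscounted) {
    this.observations.push({
      type: 'discountApplied',
      discountCode: discount.code,
      amountDiscounted
    });
  }
}

// Same domain logic as in the article; the constructor is an assumption.
class ShoppingCart {
  constructor({discountService, instrumentation}) {
    this.discountService = discountService;
    this.instrumentation = instrumentation;
  }
  applyDiscountCode(discountCode) {
    this.instrumentation.applyingDiscountCode(discountCode);
    let discount;
    try {
      discount = this.discountService.lookupDiscount(discountCode);
    } catch (error) {
      this.instrumentation.discountCodeLookupFailed(discountCode, error);
      return 0;
    }
    this.instrumentation.discountCodeLookupSucceeded(discountCode);
    const amountDiscounted = discount.applyToCart(this);
    this.instrumentation.discountApplied(discount, amountDiscounted);
    return amountDiscounted;
  }
}

// Hypothetical discount service that knows a single, made-up code.
const stubDiscountService = {
  lookupDiscount(code) {
    if (code !== 'SAVE10') throw new Error(`unknown discount code: ${code}`);
    return {code, amount: 10, applyToCart: (cart) => 10};
  }
};

const probe = new RecordingDiscountProbe();
const recordingCart = new ShoppingCart({
  discountService: stubDiscountService,
  instrumentation: probe
});

recordingCart.applyDiscountCode('SAVE10');
recordingCart.applyDiscountCode('BOGUS');

// The recorded observations read as a domain-level narrative.
const observedTypes = probe.observations.map((o) => o.type);
```

ShoppingCart is unchanged between this sketch and the DiscountInstrumentation version; only the object behind this.instrumentation differs.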

We're releasing this article in installments. Future installments will look at how domain probes make it easier to test instrumentation logic, how to provide context to the domain probe, and alternative ways of implementing domain-oriented observability.

To find out when we publish the next installment, subscribe to the site's RSS feed, Pete's Twitter feed, or Martin's Twitter stream.
