
Improving the Vert.x Micrometer Metrics performance

Vert.x 5 is coming with many great new features and enhancements.

In this article, you’ll learn about the Vert.x Micrometer Metrics performance improvements.

Introduction

With Vert.x 4, a few users reported issues regarding the impact of monitoring with Vert.x Micrometer Metrics. In practice, they observed the problem by measuring the throughput of their application. With monitoring enabled, the throughput decreased, sometimes to a significant degree.

Such problems can’t be analyzed and addressed without:

  • a good reproducer, and
  • a proper benchmarking environment.

Fortunately, we have both. The Vert.x team has been working for a long time on performance improvements using the TechEmpower Web Framework Benchmarks.

Curious about Vert.x and the TechEmpower Framework Benchmarks?

Check out the recording of Real world HTTP performance benchmarking, lessons learned.

Benchmarks

Description

For the analysis, we chose two benchmarks.

The plaintext benchmark consists of replying to an HTTP request with a very small, fixed-size text body.

Sample plaintext request:

GET /plaintext HTTP/1.1
Host: server
User-Agent: Mozilla/5.0 (X11; Linux x86_64) Gecko/20130501 Firefox/30.0 AppleWebKit/600.00 Chrome/30.0.0000.0 Trident/10.0 Safari/600.00
Cookie: uid=12345678901234567890; __utma=1.1234567890.1234567890.1234567890.1234567890.12; wd=2560x1600
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Connection: keep-alive

Sample plaintext response:

HTTP/1.1 200 OK
Content-Length: 15
Content-Type: text/plain; charset=UTF-8
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

Hello, World!

It is so simple that it will accentuate the impact of monitoring.
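For reference, here is a minimal sketch of a Vert.x server answering this benchmark. It assumes nothing beyond the Vert.x core API; the actual TechEmpower implementation is more heavily tuned:

import io.vertx.core.Vertx;

public class PlaintextServer {
  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    vertx.createHttpServer()
      .requestHandler(request -> request.response()
        .putHeader("Content-Type", "text/plain")
        .end("Hello, World!")) // fixed-size text body
      .listen(8080);
  }
}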

The fortunes benchmark, in contrast, involves a relational database and HTML templating.

Sample fortunes request:

GET /fortunes HTTP/1.1
Host: server
User-Agent: Mozilla/5.0 (X11; Linux x86_64) Gecko/20130501 Firefox/30.0 AppleWebKit/600.00 Chrome/30.0.0000.0 Trident/10.0 Safari/600.00
Cookie: uid=12345678901234567890; __utma=1.1234567890.1234567890.1234567890.1234567890.12; wd=2560x1600
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Connection: keep-alive

Sample fortunes response:

HTTP/1.1 200 OK
Content-Length: 1196
Content-Type: text/html; charset=UTF-8
Server: Example
Date: Wed, 17 Apr 2013 12:00:00 GMT

<!DOCTYPE html><html><head><title>Fortunes</title></head><body><table><tr><th>id</th><th>message</th></tr><tr><td>11</td><td><script>alert("This should not be displayed in a browser alert box.");</script></td></tr><tr><td>4</td><td>A bad random number generator: 1, 1, 1, 1, 1, 4.33e+67, 1, 1, 1</td></tr><tr><td>5</td><td>A computer program does what you tell it to do, not what you want it to do.</td></tr><tr><td>2</td><td>A computer scientist is someone who fixes things that aren't broken.</td></tr><tr><td>8</td><td>A list is only as strong as its weakest link. — Donald Knuth</td></tr><tr><td>0</td><td>Additional fortune added at request time.</td></tr><tr><td>3</td><td>After enough decimal places, nobody gives a damn.</td></tr><tr><td>7</td><td>Any program that runs right is obsolete.</td></tr><tr><td>10</td><td>Computers make very fast, very accurate mistakes.</td></tr><tr><td>6</td><td>Emacs is a nice operating system, but I prefer UNIX. — Tom Christaensen</td></tr><tr><td>9</td><td>Feature: A bug with seniority.</td></tr><tr><td>1</td><td>fortune: No such file or directory</td></tr><tr><td>12</td><td>フレームワークのベンチマーク</td></tr></table></body></html>

It’s more representative of real-world applications.
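For illustration, a fortunes handler in Vert.x could look roughly like the sketch below, assuming a PostgreSQL database reachable with the Vert.x SQL client; error handling and HTML escaping are omitted for brevity, although the benchmark rules require proper escaping:

import io.vertx.core.Vertx;
import io.vertx.pgclient.PgConnectOptions;
import io.vertx.pgclient.PgPool;
import io.vertx.sqlclient.PoolOptions;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FortunesServer {

  record Fortune(int id, String message) {}

  public static void main(String[] args) {
    Vertx vertx = Vertx.vertx();
    PgPool pool = PgPool.pool(vertx,
      new PgConnectOptions().setHost("localhost").setDatabase("hello_world"),
      new PoolOptions());
    vertx.createHttpServer().requestHandler(request ->
      pool.query("SELECT id, message FROM fortune").execute().onSuccess(rows -> {
        List<Fortune> fortunes = new ArrayList<>();
        rows.forEach(row -> fortunes.add(new Fortune(row.getInteger("id"), row.getString("message"))));
        // Benchmark rules: add one fortune at request time, then sort by message.
        fortunes.add(new Fortune(0, "Additional fortune added at request time."));
        fortunes.sort(Comparator.comparing(Fortune::message));
        StringBuilder html = new StringBuilder("<!DOCTYPE html><html><head><title>Fortunes</title></head><body><table><tr><th>id</th><th>message</th></tr>");
        for (Fortune fortune : fortunes) {
          // A real implementation must HTML-escape the message here.
          html.append("<tr><td>").append(fortune.id()).append("</td><td>").append(fortune.message()).append("</td></tr>");
        }
        html.append("</table></body></html>");
        request.response().putHeader("Content-Type", "text/html; charset=UTF-8").end(html.toString());
      })
    ).listen(8080);
  }
}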

Measurements

We’ve measured throughput at different concurrency levels. The blue/red bars indicate the number of requests per second when monitoring is disabled/enabled.

Plaintext

Throughput in the plaintext benchmark

In the plaintext benchmark, monitoring has such an impact that the throughput doesn’t increase with the concurrency level. When monitoring is enabled, the throughput decreased by 59.90% on average.

Fortunes

Throughput in the fortunes benchmark

In the fortunes benchmark, which is more realistic, the throughput decreased by 8.24% on average.

Analysis

After measuring throughput, we’ve profiled the server with async-profiler in CPU mode.

In the flamegraph generated for the plaintext benchmark, methods related to monitoring with Micrometer, highlighted in fuchsia, are dominant:

Flamegraph of the plaintext benchmark

This looks consistent with the measurements described previously: if a lot of server work is dedicated to monitoring, the throughput can only decrease. But what’s the problem exactly?

Let’s focus on one method: io/vertx/core/http/impl/Http1xServerConnection.reportResponseComplete. This method is invoked by Vert.x when the user has finished sending an HTTP response to the client.

Flamegraph of the plaintext benchmark, focusing on reportResponseComplete

A great portion of the graph is occupied by calls to the following methods:

  • io.vertx.micrometer.impl.meters.Counters#get(java.lang.Iterable<io.micrometer.core.instrument.Tag>, java.lang.String...)
  • io.vertx.micrometer.impl.meters.Summaries#get(java.lang.Iterable<io.micrometer.core.instrument.Tag>, java.lang.String...)
  • io.vertx.micrometer.impl.meters.Timers#get(java.lang.Iterable<io.micrometer.core.instrument.Tag>, java.lang.String...)

In Vert.x 4, these methods are used to dynamically retrieve a Micrometer Counter, DistributionSummary or Timer. By dynamically, we mean Vert.x Micrometer Metrics looks up an entry in Micrometer’s MeterRegistry before reporting the measurement.

And it does so for each and every monitored event, not just the end of an HTTP response: bytes sent over a socket, event bus message received, connection closed, the list goes on.
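To make the cost concrete, here is a minimal sketch of what such a dynamic lookup boils down to with the Micrometer API (the metric name and tags are hypothetical):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;

public class DynamicLookupSketch {
  // Executed for every monitored event: the registry resolves the meter id,
  // applies all registered MeterFilters, then consults its internal map,
  // before the counter can finally be incremented.
  static void onRequestProcessed(MeterRegistry registry, String route, int statusCode) {
    registry.counter("http.server.requests", // hypothetical metric name
        Tags.of("route", route, "code", String.valueOf(statusCode)))
      .increment();
  }
}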

What about the cost of each lookup? Let’s consider the case of counters.

Flamegraph of the plaintext benchmark, focusing on registerMeterIfNecessary

In this case, the graph is dominated by io/micrometer/core/instrument/MeterRegistry#getMappedId. This method is invoked by the Micrometer registry to compute the effective identifier of a meter, applying all the registered MeterFilters.

In Vert.x 4, only one filter is registered by default. It removes a set of ignored tags (defined in the metrics options):

  private static MeterFilter ignoreTags(Set<String> ignored) {
    return new MeterFilter() {
      @Override
      public Meter.Id map(Meter.Id id) {
        List<Tag> tags = new ArrayList<>();
        int count = 0;
        for (Tag tag : id.getTagsAsIterable()) {
          if (!ignored.contains(tag.getKey())) {
            tags.add(tag);
          }
          count++;
        }
        return tags.size() == count ? id : id.replaceTags(tags);
      }
    };
  }

As you can see, the filter iterates over the original set of tags and looks up each tag key in the ignored set.

Improvements - step 1

We’ve made several improvements in Vert.x 5 to reduce the cost of monitoring with Vert.x Micrometer Metrics.

Avoid meter lookups (where possible)

Where possible, we should avoid looking up meters dynamically.

There are metrics whose tags are known at application startup. For example, in pool metrics, the pool type and pool name are defined when the pool is created. Consequently, the meters can be looked up once and reused later, as shown in the sketch below.
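A minimal sketch of the idea, with hypothetical metric and tag names:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

class PoolMetricsSketch {

  private final Counter completed; // resolved once, reused for every event

  PoolMetricsSketch(MeterRegistry registry, String poolType, String poolName) {
    // The pool type and name are known when the pool is created,
    // so the registry lookup happens a single time.
    completed = Counter.builder("pool.completed") // hypothetical metric name
      .tags("pool.type", poolType, "pool.name", poolName)
      .register(registry);
  }

  void taskCompleted() {
    completed.increment(); // no registry lookup on the hot path
  }
}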

Filter out tags upfront instead of using a filter

Instead of belatedly removing ignored tags with a filter, we should avoid adding them to the meter definition in the first place.

Doing so provides a couple of benefits.

Firstly, since Vert.x tags are defined in the Java enum io.vertx.micrometer.Label, we can verify if a specific tag is ignored by checking its presence in a java.util.EnumSet. This is more efficient than using the string representation of tag keys and a java.util.HashSet.

Secondly, the server won’t have to create redundant io.micrometer.core.instrument.Tags instances. io.micrometer.core.instrument.Tags objects are immutable. When methods like io.micrometer.core.instrument.Tags#and(java.lang.String, java.lang.String) or io.micrometer.core.instrument.Tags#and(io.micrometer.core.instrument.Tag...) are called, a new instance that combines the tags is returned. Every invocation implies (see the sketch after this list):

  • copying previous and new tags into a new array,
  • sorting this array,
  • removing duplicates.
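Here is a minimal sketch of the upfront approach, with a hypothetical helper; it assumes Label#toString yields the tag key:

import io.micrometer.core.instrument.Tag;
import io.micrometer.core.instrument.Tags;
import io.vertx.micrometer.Label;

import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class UpfrontFilteringSketch {
  // Ignored labels are never added, so no MeterFilter has to strip them
  // afterwards, and the Tags instance is built exactly once.
  static Tags toTags(EnumSet<Label> enabledLabels, Label[] keys, String[] values) {
    List<Tag> tags = new ArrayList<>(keys.length);
    for (int i = 0; i < keys.length; i++) {
      if (enabledLabels.contains(keys[i])) { // EnumSet check, cheaper than hashing strings
        tags.add(Tag.of(keys[i].toString(), values[i]));
      }
    }
    return Tags.of(tags);
  }
}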

Results

In the Vert.x Micrometer Metrics PR#200, we’ve applied the changes described above, and then made new measurements.

Plaintext

Throughput in the plaintext benchmark step 1

In the plaintext benchmark, when monitoring is enabled, throughput improved by 36% on average.

We shouldn’t pay too much attention to this number. Remember, the plaintext benchmark is an extremely simplified HTTP interaction, thus not representative of most applications. Nevertheless, it tells us we’re going in the right direction.

Fortunes

Throughput in the fortunes benchmark step 1

In the fortunes benchmark, when monitoring is enabled, throughput improved by 3.67% on average. In other words, with these improvements, the impact of monitoring on throughput is down from 8.24% to 4.88% on average.

That’s nice, but not yet satisfactory.

Improvements - step 2

We profiled the server once more and, in the resulting flamegraph, focused on the io/vertx/core/http/impl/Http1xServerConnection.reportResponseComplete method.

Flamegraph of the plaintext benchmark step 1, focusing on reportResponseComplete

This time, we can see a great portion of the graph is occupied by these methods:

  • io/micrometer/core/instrument/Counter$Builder.register
  • io/micrometer/core/instrument/DistributionSummary$Builder.register
  • io/micrometer/core/instrument/Timer$Builder.register

Indeed, we still have to look up some meters dynamically, because some tag values, such as the HTTP response status code, are not known in advance:

    @Override
    public void responseEnd(RequestMetric requestMetric, HttpResponse response, long bytesWritten) {
      Tags responseTags = requestMetric.tags.and(
          Labels.toTags(HTTP_ROUTE, requestMetric.getRoute(), HTTP_CODE, String.valueOf(response.statusCode())));
      counter(names.getHttpRequestsCount())
        .description("Number of processed requests")
        .tags(responseTags)
        .register(registry)
        .increment();
      requestMetric.sample.stop(timer(names.getHttpResponseTime())
        .description("Request processing time")
        .tags(responseTags)
        .register(registry));
      distributionSummary(names.getHttpResponseBytes())
        .description("Size of responses in bytes")
        .tags(responseTags)
        .register(registry)
        .record(bytesWritten);
      if (requestMetric.responseEnded()) {
        requestMetric.requests.decrement();
      }
    }

But what happens when we register a meter in the Micrometer registry?

    private <M extends Meter> M registerMeterIfNecessary(Class<M> meterClass, Meter.Id id,
            @Nullable DistributionStatisticConfig config, BiFunction<Meter.Id, DistributionStatisticConfig, M> builder,
            Function<Meter.Id, M> noopBuilder) {
        Id mappedId = getMappedId(id);
        Meter m = getOrCreateMeter(config, builder, id, mappedId, noopBuilder);

        if (!meterClass.isInstance(m)) {
            throw new IllegalArgumentException(
                    format("There is already a registered meter of a different type (%s vs. %s) with the same name: %s",
                            m.getClass().getSimpleName(), meterClass.getSimpleName(), id.getName()));
        }
        return meterClass.cast(m);
    }

Micrometer computes the mapped id (remember, filters can transform the meter id), then retrieves an existing meter from a java.util.concurrent.ConcurrentHashMap, or creates it if it doesn’t exist.

As a result, when the application reaches a steady state, all Vert.x event loops concurrently read values from a single java.util.concurrent.ConcurrentHashMap.
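A per-thread meter cache takes that contention off the hot path: once the cache is warm, lookups stay within the event loop’s own map. Here is a minimal sketch of the idea, not the exact Vert.x 5 implementation:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;

import java.util.HashMap;
import java.util.Map;

class ThreadLocalCountersSketch {

  private final MeterRegistry registry;
  private final String name;
  // Each thread gets its own plain HashMap: no contention once warm.
  private final ThreadLocal<Map<Tags, Counter>> cache = ThreadLocal.withInitial(HashMap::new);

  ThreadLocalCountersSketch(MeterRegistry registry, String name) {
    this.registry = registry;
    this.name = name;
  }

  Counter get(Tags tags) {
    // Falls back to the registry (and its shared ConcurrentHashMap) only on first use.
    return cache.get().computeIfAbsent(tags, t -> registry.counter(name, t));
  }
}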

Results

In the Vert.x Micrometer Metrics PR#200, we’ve applied an additional change: a per-thread meter cache. And then we made new measurements.

Plaintext

Throughput in the plaintext benchmark step 2

In the plaintext benchmark, when monitoring is enabled, throughput improved by 59.36% on average.

Again, the plaintext benchmark is an extreme case. But some Vert.x users run applications that reply to HTTP requests using a data set fully loaded in memory. For them, the per-thread meter cache should make a huge difference.

Fortunes

Throughput in the fortunes benchmark step 2

In the fortunes benchmark, when monitoring is enabled, the impact of monitoring on throughput is down from 10.4% to 4.05% on average.

Improvements - step 3

In a recent conversation with the Micrometer team, we learnt that Micrometer v1.12 introduced the MeterProvider API. In short, meter providers reduce allocations by:

  • gathering all the information given to a meter builder upon creation, and
  • looking up a meter after adding the extra tags known only once an event has occurred.

For example, if you have to count HTTP responses with their status codes:

// On startup
MeterProvider<Counter> httpRequestsCount = Counter.builder(names.getHttpRequestsCount())
  .description("Number of processed requests")
  .withRegistry(registry);

// When the response ends
httpRequestsCount.withTags(responseTags).increment();

Results

In the Vert.x Micrometer Metrics PR#218, we’ve applied an additional change: using the MeterProvider API instead of the per-thread meter cache. And then we made new measurements.

Plaintext

Throughput in the plaintext benchmark step 3

In the plaintext benchmark, the throughput using the MeterProvider implementation (red bars) is not as good as using a meter cache (blue bars). But:

  • the difference is relatively small, and
  • the Vert.x Micrometer Metrics module implementation is greatly simplified.

Fortunes

Throughput in the fortunes benchmark step 3

In the fortunes benchmark, the throughput using the MeterProvider implementation (red bars) is as good as using a meter cache (blue bars), or better.

Conclusion

Monitoring applications comes with an unavoidable cost. Usually, this cost is negligible, but there are use cases where the impact on throughput is significant.

With a good reproducer and a proper benchmark, we were able to gain insights into Micrometer’s design and improve the way Vert.x works with this great monitoring library.

In summary, if you integrate with Micrometer in your projects, you should consider:

  • filtering tags upfront instead of using a Micrometer MeterFilter,
  • avoiding redundant copies when combining tags,
  • looking up meters once if their tags are known in advance,
  • using a MeterProvider when some tags can be determined only after an event has occurred.
