4

Don't Fear the Builder

 3 years ago
source link: https://richardstartin.github.io/posts/dont-fear-the-builder
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Don't Fear the Builder

Dec 12, 2017 | Richard Startin | javaanalysis

Writing a nontrivial piece of code requires a programmer to make countless tiny decisions, and to produce good quality code quickly, many of these decisions must be made subconciously. But what happens if your instinct is mostly superstitious bullshit? I instinctively avoid allocating objects when I don’t have to, so would not voluntarily resort to using things like EqualsBuilder or HashCodeBuilder from Apache Commons. I feel dismayed whenever I am asked about the relationship between hash code and equals in interviews for contracts. This indicates a lack of mutual respect, but what’s revealing is that enough people passing themselves off as developers don’t know this stuff that it makes the question worth asking. EqualsBuilder and HashCodeBuilder make it so you don’t actually need to know, making it safe to put any object in a HashSet, whoever wrote the class. But should you use these classes just because they protect the naive (and you from the naive)? Is it a runtime tax? Or is any perceived cost magically eradicated by the JVM? It’s time to benchmark my instinct.

Baseline

There are definitely no allocations in the code IntelliJ (or similar) would generate for equals and hashCode automatically. I will use this generated code as the baseline: if I can match or beat this code with EqualsBuilder or HashCodeBuilder respectively, I can dismiss my prejudice. Here’s the baseline class:

public class Baseline {
    
    private final String a;
    private final int b;
    private final String c;
    private final Instant d;

    public Baseline(String a, int b, String c, Instant d) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Baseline baseline = (Baseline) o;

        if (b != baseline.b) return false;
        if (!a.equals(baseline.a)) return false;
        if (!c.equals(baseline.c)) return false;
        return d.equals(baseline.d);
    }

    @Override
    public int hashCode() {
        int result = a.hashCode();
        result = 31 * result + b;
        result = 31 * result + c.hashCode();
        result = 31 * result + d.hashCode();
        return result;
    }
}

It’s not pretty and it’s a real pain to keep generated code up to date when adding and removing fields (you can delete it and regenerate it but it creates merge noise) but there’s no obvious way to improve on this code. It’s correct and looks fast. To benchmark alternatives, I want to look at the effect on set membership queries against a HashSet of various sizes.

Builder Implementation

I will contrast this with the following code using code from Apache Commons (incidentally, also generated by IntelliJ):

public class Builder {

    private final String a;
    private final int b;
    private final String c;
    private final Instant d;

    public Builder(String a, int b, String c, Instant d) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;

        if (o == null || getClass() != o.getClass()) return false;

        Builder builder = (Builder) o;

        return new EqualsBuilder()
                .append(b, builder.b)
                .append(a, builder.a)
                .append(c, builder.c)
                .append(d, builder.d)
                .isEquals();
    }

    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 37)
                .append(a)
                .append(b)
                .append(c)
                .append(d)
                .toHashCode();
    }
}

This code is a bit neater and easier to maintain, but does it have an observable cost?

HashSet Benchmark

For a parametrised range of HashSet sizes, I measure throughput for calls to contains. This requires that both hashCode and equals are called, the latter potentially several times. There is bias in this benchmark, because the implementation producing the best hash function will minimise the calls to equals, but I am willing to reward the better implementation rather than maniacally isolate any attributable allocation cost.

The code is simple, if a little repetitive. For each implementation, I make some random instances of my class, and make one that I can be sure isn’t in the set. I measure the cost of repeated invocations to the hashCode and equals methods by finding an object, and searching but failing to find an object separately.

@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class HashSetContainsBenchmark {

    @Param({"8", "96", "1024", "8192"})
    int setSize;

    private HashSet<Baseline> baselineSet;
    private Baseline presentBaseline;
    private Baseline missingBaseline;

    private HashSet<Builder> builderSet;
    private Builder presentBuilder;
    private Builder missingBuilder;

    @Setup(Level.Trial)
    public void setup() {
        setupBaselineState();
        setupBuilderState();
    }

    private void setupBaselineState() {
        baselineSet = new HashSet<>();
        for (int i = 0; i < setSize; ++i) {
            while(!baselineSet.add(newBaselineInstance()));
        }
        presentBaseline = baselineSet.iterator().next();
        do {
            missingBaseline = newBaselineInstance();
        } while (baselineSet.contains(missingBaseline));
    }

    private void setupBuilderState() {
        builderSet = new HashSet<>();
        for (int i = 0; i < setSize; ++i) {
            while(!builderSet.add(newBuilderInstance()));
        }
        presentBuilder = builderSet.iterator().next();
        do {
            missingBuilder = newBuilderInstance();
        } while (builderSet.contains(missingBuilder));
    }

    @Benchmark
    public boolean findPresentBaselineInstance() {
        return baselineSet.contains(presentBaseline);
    }

    @Benchmark
    public boolean findMissingBaselineInstance() {
        return baselineSet.contains(missingBaseline);
    }

    @Benchmark
    public boolean findPresentBuilderInstance() {
        return builderSet.contains(presentBuilder);
    }

    @Benchmark
    public boolean findMissingBuilderInstance() {
        return builderSet.contains(missingBuilder);
    }


    private static Baseline newBaselineInstance() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return new Baseline(UUID.randomUUID().toString(),
                random.nextInt(),
                UUID.randomUUID().toString(),
                Instant.ofEpochMilli(random.nextLong(0, Instant.now().toEpochMilli())));
    }

    private static Builder newBuilderInstance() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return new Builder(UUID.randomUUID().toString(),
                random.nextInt(),
                UUID.randomUUID().toString(),
                Instant.ofEpochMilli(random.nextLong(0, Instant.now().toEpochMilli())));
    }
}

Running this benchmark with org.apache.commons:commons-lang3:3.7 yields the throughput results below. There is an occasional and slight throughput degradation with the builders, but the builder implementation is sometimes faster in this benchmark, but the difference isn’t large enough to worry about.

Benchmark Mode Threads Samples Score Score Error (99.9%) Unit Param: setSize findMissingBaselineInstance thrpt 1 10 153.878772 7.993036 ops/us 8 findMissingBaselineInstance thrpt 1 10 135.835085 6.031588 ops/us 96 findMissingBaselineInstance thrpt 1 10 137.716073 25.731562 ops/us 1024 findMissingBaselineInstance thrpt 1 10 123.896741 15.484023 ops/us 8192 findMissingBuilderInstance thrpt 1 10 142.051528 4.177352 ops/us 8 findMissingBuilderInstance thrpt 1 10 132.326909 1.017351 ops/us 96 findMissingBuilderInstance thrpt 1 10 127.220440 8.680761 ops/us 1024 findMissingBuilderInstance thrpt 1 10 140.029701 6.272960 ops/us 8192 findPresentBaselineInstance thrpt 1 10 139.714236 1.626873 ops/us 8 findPresentBaselineInstance thrpt 1 10 140.244864 1.777298 ops/us 96 findPresentBaselineInstance thrpt 1 10 135.273760 7.587937 ops/us 1024 findPresentBaselineInstance thrpt 1 10 133.158555 5.101069 ops/us 8192 findPresentBuilderInstance thrpt 1 10 111.196443 18.804103 ops/us 8 findPresentBuilderInstance thrpt 1 10 124.008441 5.182294 ops/us 96 findPresentBuilderInstance thrpt 1 10 126.729750 4.286002 ops/us 1024 findPresentBuilderInstance thrpt 1 10 124.159285 5.287920 ops/us 8192

Where did the builder go?

It’s worth taking a look at the compiler output to find out what happened. Considering the case where the object isn’t found in the set, we can see that the code gets inlined, and the hash code must be quite good. By enabling the args -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining it’s clear to see that the entire hash code generation gets inlined into the calling loop, and that equals is never executed (suggesting that there are no collisions to resolve here) .

                              @ 8   java.util.HashSet::contains (9 bytes)   inline (hot)
                                @ 5   java.util.HashMap::containsKey (18 bytes)   inline (hot)
                                 \-> TypeProfile (17571/17571 counts) = java/util/HashMap
                                  @ 2   java.util.HashMap::hash (20 bytes)   inline (hot)
                                    @ 9   com.openkappa.simd.builder.Builder::hashCode (43 bytes)   inline (hot)
                                      @ 8   org.apache.commons.lang3.builder.HashCodeBuilder::<init> (60 bytes)   inline (hot)
                                        @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
                                        @ 26   org.apache.commons.lang3.Validate::isTrue (18 bytes)   unloaded signature classes
                                        @ 46   org.apache.commons.lang3.Validate::isTrue (18 bytes)   unloaded signature classes
                                      @ 15   org.apache.commons.lang3.builder.HashCodeBuilder::append (58 bytes)   inline (hot)
                                        @ 21   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                        @ 24   java.lang.Class::isArray (0 bytes)   (intrinsic)
                                        @ 49   java.lang.String::hashCode (49 bytes)   inline (hot)
                                          @ 19   java.lang.String::isLatin1 (19 bytes)   inline (hot)
                                          @ 29   java.lang.StringLatin1::hashCode (42 bytes)   inline (hot)
                                      @ 22   org.apache.commons.lang3.builder.HashCodeBuilder::append (17 bytes)   inline (hot)
                                      @ 29   org.apache.commons.lang3.builder.HashCodeBuilder::append (58 bytes)   inline (hot)
                                        @ 21   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                        @ 24   java.lang.Class::isArray (0 bytes)   (intrinsic)
                                        @ 49   java.lang.String::hashCode (49 bytes)   inline (hot)
                                          @ 19   java.lang.String::isLatin1 (19 bytes)   inline (hot)
                                          @ 29   java.lang.StringLatin1::hashCode (42 bytes)   inline (hot)
                                      @ 36   org.apache.commons.lang3.builder.HashCodeBuilder::append (58 bytes)   inline (hot)
                                        @ 21   java.lang.Object::getClass (0 bytes)   (intrinsic)
                                        @ 24   java.lang.Class::isArray (0 bytes)   (intrinsic)
                                        @ 49   java.time.Instant::hashCode (22 bytes)   inline (hot)
                                      @ 39   org.apache.commons.lang3.builder.HashCodeBuilder::toHashCode (5 bytes)   accessor
                                  @ 6   java.util.HashMap::getNode (148 bytes)   inline (hot)
                                    @ 59   com.openkappa.simd.builder.Builder::equals (84 bytes)   never executed
                                    @ 126   com.openkappa.simd.builder.Builder::equals (84 bytes)   never executed

So the code is clearly small enough to get inlined. But what about the builder itself, isn’t it allocated? Not if it doesn’t escape. On a debug JVM, it’s possible to observe the removal of the builder’s allocation the JVM args -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations. However, on a production JVM, it can be observed indirectly by measuring the difference when escape analysis is disabled with -XX:-DoEscapeAnalysis.

Benchmark Mode Threads Samples Score Score Error (99.9%) Unit Param: setSize findMissingBaselineInstance thrpt 1 10 150.972758 5.095163 ops/us 8 findMissingBaselineInstance thrpt 1 10 140.236057 2.222751 ops/us 96 findMissingBaselineInstance thrpt 1 10 132.680494 5.225503 ops/us 1024 findMissingBaselineInstance thrpt 1 10 115.220385 4.232488 ops/us 8192 findMissingBuilderInstance thrpt 1 10 68.869355 4.479944 ops/us 8 findMissingBuilderInstance thrpt 1 10 67.786351 5.980353 ops/us 96 findMissingBuilderInstance thrpt 1 10 71.113062 3.057181 ops/us 1024 findMissingBuilderInstance thrpt 1 10 75.088839 1.592294 ops/us 8192 findPresentBaselineInstance thrpt 1 10 141.425675 2.630898 ops/us 8 findPresentBaselineInstance thrpt 1 10 142.636359 2.854795 ops/us 96 findPresentBaselineInstance thrpt 1 10 143.939199 0.475918 ops/us 1024 findPresentBaselineInstance thrpt 1 10 142.006635 3.098352 ops/us 8192 findPresentBuilderInstance thrpt 1 10 71.546971 6.152584 ops/us 8 findPresentBuilderInstance thrpt 1 10 62.451705 11.896730 ops/us 96 findPresentBuilderInstance thrpt 1 10 68.903442 3.263955 ops/us 1024 findPresentBuilderInstance thrpt 1 10 69.576038 4.047581 ops/us 8192

Without escape analysis, the baseline code hardly slows down, whereas the change in the builder benchmarks is stark. The difference of differences here isolates the cost of allocating the HashCodeBuilder on the heap, though the builder has not been allocated on the stack - it has been replaced by its fields, which are allocated on the stack. Escape analysis is enabled by default, and the key point here is that even if the code looks like it might be slower, it might be as good or better - it depends what the JIT does with it.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK