Even Faster Hash Codes

Jul 24, 2017 | java, hashing

I wrote a post recently about how disappointed I was that the optimiser couldn’t outsmart some clever Java code for computing hash codes. Well, here’s a faster hash code along the same lines.

The hash code implemented in Arrays.hashCode is a polynomial hash; it applies to any data type with a positional interpretation. It takes the general form $\sum_{i=0}^{n} x_i \cdot 31^{n-i}$ where $x_0 = 1$. In other words, it’s a dot product of the elements of the array and some powers of 31. Daniel Lemire’s implementation makes it explicit to the optimiser, in a way it won’t otherwise infer, that this operation is data parallel. If it’s really just a dot product, it can be made even more obvious at the cost of a loss of flexibility.
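
To make the equivalence concrete, here is a small sketch (the class and method names are mine, for illustration only) showing that the rolling loop used by Arrays.hashCode and the same polynomial written as an explicit dot product with powers of 31 produce identical values:

import java.util.Arrays;

public class PolynomialHashDemo {

    // Horner-style evaluation, the shape used by Arrays.hashCode(int[]).
    static int rollingHash(int[] data) {
        int result = 1;
        for (int value : data) {
            result = 31 * result + value;
        }
        return result;
    }

    // The same polynomial as a dot product of the elements with descending
    // powers of 31; the final 31^n term comes from the implicit x0 = 1.
    static int dotProductHash(int[] data) {
        int result = 0;
        int power = 1;
        for (int i = data.length - 1; i >= 0; --i) {
            result += data[i] * power;
            power *= 31;
        }
        return result + power; // power is now 31^n
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 7, 11};
        // All three print the same value.
        System.out.println(Arrays.hashCode(data));
        System.out.println(rollingHash(data));
        System.out.println(dotProductHash(data));
    }
}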

Imagine you are processing fixed- or limited-length strings (a VARCHAR(255) or a URL, say) or coordinates of a space of fixed dimension. Then you could pre-compute the coefficients in an array and write the hash code explicitly as a dot product. Java 9 uses AVX instructions for dot products, so it should be very fast.

public class FixedLengthHashCode {

    private final int[] coefficients;

    public FixedLengthHashCode(int maxLength) {
        // coefficients[i] holds 31^(maxLength - i): coefficients[maxLength] is 1
        // and coefficients[0] is 31^maxLength, the term contributed by the
        // leading 1 in Arrays.hashCode.
        this.coefficients = new int[maxLength + 1];
        coefficients[maxLength] = 1;
        for (int i = maxLength - 1; i >= 0; --i) {
            coefficients[i] = 31 * coefficients[i + 1];
        }
    }

    public int hashCode(int[] value) {
        // Dot product of the elements with the precomputed powers of 31.
        int result = coefficients[0];
        for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) {
            result += coefficients[i + 1] * value[i];
        }
        return result;
    }
}
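
As a quick sanity check (the demo class is mine, not from the original post), the result should coincide with Arrays.hashCode whenever the input is exactly maxLength elements long:

import java.util.Arrays;

public class FixedLengthHashCodeDemo {
    public static void main(String[] args) {
        int[] data = {2, 3, 5, 7, 11, 13, 17, 19};
        FixedLengthHashCode hasher = new FixedLengthHashCode(data.length);
        // Both lines print the same value because the array is exactly
        // maxLength elements long.
        System.out.println(Arrays.hashCode(data));
        System.out.println(hasher.hashCode(data));
    }
}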

The hash code above is really explicit and unambiguously parallelisable, and the results are remarkable.

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| HashCode.BuiltIn | thrpt | 1 | 10 | 10.323026 | 0.223614 | ops/us | 100 |
| HashCode.BuiltIn | thrpt | 1 | 10 | 0.959246 | 0.038900 | ops/us | 1000 |
| HashCode.BuiltIn | thrpt | 1 | 10 | 0.096005 | 0.001836 | ops/us | 10000 |
| HashCode.FixedLength | thrpt | 1 | 10 | 20.186800 | 0.297590 | ops/us | 100 |
| HashCode.FixedLength | thrpt | 1 | 10 | 2.314187 | 0.082867 | ops/us | 1000 |
| HashCode.FixedLength | thrpt | 1 | 10 | 0.227090 | 0.005377 | ops/us | 10000 |
| HashCode.Unrolled | thrpt | 1 | 10 | 13.250821 | 0.752609 | ops/us | 100 |
| HashCode.Unrolled | thrpt | 1 | 10 | 1.503368 | 0.058200 | ops/us | 1000 |
| HashCode.Unrolled | thrpt | 1 | 10 | 0.152179 | 0.003541 | ops/us | 10000 |
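
The actual harness lives in the benchmark repository mentioned at the end of the post; purely as an illustration of how numbers like these are produced, a minimal JMH benchmark comparing the built-in hash with the fixed-length one might look roughly like this (the class, method names and setup are my own guesses, loosely mirroring the labels in the table, not the author’s code):

import java.util.Arrays;
import java.util.SplittableRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class HashCode {

    @Param({"100", "1000", "10000"})
    int size;

    private int[] data;
    private FixedLengthHashCode hasher;

    @Setup(Level.Trial)
    public void setup() {
        // Random input of the parameterised size, hashed by both methods.
        data = new SplittableRandom(42).ints(size).toArray();
        hasher = new FixedLengthHashCode(size);
    }

    @Benchmark
    public int builtIn() {
        return Arrays.hashCode(data);
    }

    @Benchmark
    public int fixedLength() {
        return hasher.hashCode(data);
    }
}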

Modifying the algorithm to support variable-length arrays up to a limit degrades performance only slightly, but there are seemingly equivalent implementations which do much worse.

public class FixedLengthHashCode {

    private final int[] coefficients;

    public FixedLengthHashCode(int maxLength) {
        // coefficients[i] holds 31^i; the highest power actually used is
        // selected by the input's length in hashCode below.
        this.coefficients = new int[maxLength + 1];
        coefficients[0] = 1;
        for (int i = 1; i <= maxLength; ++i) {
            coefficients[i] = 31 * coefficients[i - 1];
        }
    }

    public int hashCode(int[] value) {
        final int max = value.length;
        // 31^max plays the role of the leading 1 in Arrays.hashCode.
        int result = coefficients[max];
        for (int i = 0; i < value.length && i < coefficients.length - 1; ++i) {
            result += coefficients[max - i - 1] * value[i];
        }
        return result;
    }
}
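
With this version a single instance sized for the longest expected input handles any shorter array and still agrees with Arrays.hashCode; a quick check (again, the demo class is mine):

import java.util.Arrays;

public class VariableLengthDemo {
    public static void main(String[] args) {
        // One instance sized for the longest input we expect, e.g. VARCHAR(255).
        FixedLengthHashCode hasher = new FixedLengthHashCode(255);
        int[] shorter = {1, 2, 3};
        int[] longer = {1, 2, 3, 4, 5, 6, 7};
        // Both comparisons print true.
        System.out.println(Arrays.hashCode(shorter) == hasher.hashCode(shorter));
        System.out.println(Arrays.hashCode(longer) == hasher.hashCode(longer));
    }
}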

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FixedLength | thrpt | 1 | 10 | 19.172574 | 0.742637 | ops/us | 100 |
| FixedLength | thrpt | 1 | 10 | 2.233006 | 0.115285 | ops/us | 1000 |
| FixedLength | thrpt | 1 | 10 | 0.227451 | 0.012231 | ops/us | 10000 |

The benchmark code is on GitHub.

