63

Iteration Over Java Collections With High Performance

 5 years ago
source link: https://www.tuicool.com/articles/hit/RBnMvaF
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Introduction

Java developers usually deal with collections such as ArrayList  and  HashSet.  Java 8 came with lambda and the  streaming API that helps us to easily work with collections. In most cases, we work with a few thousands of items and performance isn't a concern. But, in some extreme situations, when we have to travel over a few millions of items several times, performance will become a pain.

I use JMH for checking the running time of each code snippet.

forEach vs. C Style vs. Stream API

Iteration is a basic feature. All programming languages have simple syntax to allow programmers to run through collections. Stream API can iterate over Collections in a very straightforward manner.

public void streamSingleThread(BenchMarkState state){
        state.testData.stream().forEach(item -> {
            // Do something
        });
    }

The forEach   loop is just as simple:

public void forEach(BenchMarkState state){
        for(Integer temp : state.testData){
            // Do something
        }
    }

C style is more verbose, but still very compact:

public void forCStyle(BenchMarkState state){
        int size = state.testData.size();
        for(int j = 0; j < size; j ++){
            state.testData.get(j);
            // Do something
        }
    }

Then, the performance:

Benchmark                               Mode  Cnt        Score       Error  Units
TestLoopPerformance.forCStyle           avgt  200        3.182 ±     0.009  ns/op
TestLoopPerformance.forEach             avgt  200  7693143.747 ± 57712.787  ns/op
TestLoopPerformance.streamMultiThread   avgt  200  6974017.140 ± 66200.317  ns/op
TestLoopPerformance.streamSingleThread  avgt  200  7790435.456 ± 75683.894  ns/op

With C style, JVM simply increases an integer, then reads the value directly from memory. This makes it very fast. But forEach is very different, according to this answer on StackOverFlow and document from Oracle , JVM has to convert forEach to an iterator and call hasNext() with every item. This is why forEach is much slower than the C style.

Which Is the High-Performance Way to Travelling Over Set?

We define test data:

@State(Scope.Benchmark)
    public static class BenchMarkState {
        @Setup(Level.Trial)
        public void doSetup() {
            for(int i = 0; i < 500000; i++){
                testData.add(Integer.valueOf(i));
            }
        }
        @TearDown(Level.Trial)
        public void doTearDown() {
            testData = new HashSet<>(500000);
        }
        public Set<Integer> testData = new HashSet<>();
    }

The Java Set also supports Stream API and forEach loop. According to the previous test, if we convert  Set to ArrayList , then travel over ArrayList , maybe the performance improve?

public void forCStyle(BenchMarkState state){
        int size = state.testData.size();
        Integer[] temp = (Integer[]) state.testData.toArray(new Integer[size]);
        List<Integer> tempList = new ArrayList<>(state.testData.size());
        for(int j = 0; j < size; j ++){
            tempList.add(temp[j]);
        }
    }

How about a combination of the iterator with the C style for loop?

public void forCStyleIterator(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> tempList = new ArrayList<>(state.testData.size());
        Iterator<Integer> iteration = state.testData.iterator();
        for(int j = 0; j < size; j ++){
            tempList.add(iteration.next());
        }
    }

Or what about simple travel?

public void forEach(BenchMarkState state){
        List<Integer> tempList = new ArrayList<>(state.testData.size());
        for(Integer temp : state.testData){
            tempList.add(temp);
        }
    }

This is a nice idea, but it doesn't work because initializing the new ArrayList also consumes resources.

Benchmark                             Mode  Cnt        Score        Error  Units
TestLoopPerformance.forCStyleIterator avgt  200  4344457.729 ±  67534.559  ns/op
TestLoopPerformance.forEach           avgt  200  4553071.127 ±  63691.874  ns/op
TestLoopPerformance.forCStyle         avgt  200  6585935.929 ± 151084.531  ns/op

HashMap ( HashSet uses HashMap<E,Object> ) isn't designed for iterating all items. The fastest way to iterate over HashMap is a combination of Iterator and the C style for loop, because JVM doesn't have to call hasNext() .

Conclusion

Foreach and Stream API are convenient to work with Collections.  You can write code faster. But, when your system is stable and performance is a major concern, you should think about rewriting your loop.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK