Iteration Over Java Collections With High Performance
source link: https://www.tuicool.com/articles/hit/RBnMvaF
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Introduction
Java developers usually deal with collections such as ArrayList
and HashSet.
Java 8 came with lambda and the streaming API
that helps us to easily work with collections. In most cases, we work with a few thousands of items and performance isn't a concern. But, in some extreme situations, when we have to travel over a few millions of items several times, performance will become a pain.
I use JMH for checking the running time of each code snippet.
forEach vs. C Style vs. Stream API
Iteration is a basic feature. All programming languages have simple syntax to allow programmers to run through collections. Stream
API can iterate over Collections
in a very straightforward manner.
public void streamSingleThread(BenchMarkState state){ state.testData.stream().forEach(item -> { // Do something }); }
The forEach
loop is just as simple:
public void forEach(BenchMarkState state){ for(Integer temp : state.testData){ // Do something } }
C style is more verbose, but still very compact:
public void forCStyle(BenchMarkState state){ int size = state.testData.size(); for(int j = 0; j < size; j ++){ state.testData.get(j); // Do something } }
Then, the performance:
Benchmark Mode Cnt Score Error Units TestLoopPerformance.forCStyle avgt 200 3.182 ± 0.009 ns/op TestLoopPerformance.forEach avgt 200 7693143.747 ± 57712.787 ns/op TestLoopPerformance.streamMultiThread avgt 200 6974017.140 ± 66200.317 ns/op TestLoopPerformance.streamSingleThread avgt 200 7790435.456 ± 75683.894 ns/op
With C style, JVM simply increases an integer, then reads the value directly from memory. This makes it very fast. But forEach
is very different, according to this answer on StackOverFlow
and document from Oracle
, JVM has to convert forEach
to an iterator and call hasNext()
with every item. This is why forEach
is much slower than the C style.
Which Is the High-Performance Way to Travelling Over Set?
We define test data:
@State(Scope.Benchmark) public static class BenchMarkState { @Setup(Level.Trial) public void doSetup() { for(int i = 0; i < 500000; i++){ testData.add(Integer.valueOf(i)); } } @TearDown(Level.Trial) public void doTearDown() { testData = new HashSet<>(500000); } public Set<Integer> testData = new HashSet<>(); }
The Java Set also supports Stream
API and forEach
loop. According to the previous test, if we convert Set
to ArrayList
, then travel over ArrayList
, maybe the performance improve?
public void forCStyle(BenchMarkState state){ int size = state.testData.size(); Integer[] temp = (Integer[]) state.testData.toArray(new Integer[size]); List<Integer> tempList = new ArrayList<>(state.testData.size()); for(int j = 0; j < size; j ++){ tempList.add(temp[j]); } }
How about a combination of the iterator with the C style for
loop?
public void forCStyleIterator(BenchMarkState state){ int size = state.testData.size(); List<Integer> tempList = new ArrayList<>(state.testData.size()); Iterator<Integer> iteration = state.testData.iterator(); for(int j = 0; j < size; j ++){ tempList.add(iteration.next()); } }
Or what about simple travel?
public void forEach(BenchMarkState state){ List<Integer> tempList = new ArrayList<>(state.testData.size()); for(Integer temp : state.testData){ tempList.add(temp); } }
This is a nice idea, but it doesn't work because initializing the new ArrayList
also consumes resources.
Benchmark Mode Cnt Score Error Units TestLoopPerformance.forCStyleIterator avgt 200 4344457.729 ± 67534.559 ns/op TestLoopPerformance.forEach avgt 200 4553071.127 ± 63691.874 ns/op TestLoopPerformance.forCStyle avgt 200 6585935.929 ± 151084.531 ns/op
HashMap
( HashSet
uses HashMap<E,Object>
) isn't designed for iterating all items. The fastest way to iterate over HashMap
is a combination of Iterator
and the C style for
loop, because JVM doesn't have to call hasNext()
.
Conclusion
Foreach
and Stream
API are convenient to work with Collections.
You can write code faster. But, when your system is stable and performance is a major concern, you should think about rewriting your loop.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK