
JVM Anatomy Park #18: Scalar Replacement

source link: https://shipilev.net/jvm-anatomy-park/18-scalar-replacement/

About, Disclaimers, Contacts

"JVM Anatomy Quarks" is an ongoing mini-post series, where every post describes some elementary piece of knowledge about the JVM. The name underlines the fact that a single post cannot be taken in isolation, and most pieces described here are going to readily interact with each other.

The post should take about 5-10 minutes to read. As such, it goes deep for only a single topic, a single test, a single benchmark, a single observation. The evidence and discussion here might be anecdotal, not actually reviewed for errors, consistency, writing 'tyle, syntaxtic and semantically errors, duplicates, or also consistency. Use and/or trust this at your own risk.

Aleksey Shipilëv, JVM/Performance Geek
Shout out at Twitter: @shipilev; Questions, comments, suggestions: [email protected]

Question

I have heard Hotspot can do stack allocation. It is called Escape Analysis, and it is magical. Right?

Theory

This causes a fair bit of confusion. In "stack allocation", the word "allocation" suggests that the entire object is allocated on the stack instead of the heap. But what really happens is that the compiler performs the so-called Escape Analysis (EA), which identifies newly created objects that do not escape into the heap, and then it can do a few interesting optimizations. Note that EA itself is not the optimization: it is the analysis phase that gives important pieces of data to the optimizer.[1]

One of the things the optimizer can do for non-escaping objects is to remap the accesses to the object fields to accesses to synthetic local operands,[2] that is, perform Scalar Replacement. Since those operands are then handled by the register allocator, some of them may claim stack slots (get "spilled") in the current method activation, and it might look like the object field block is allocated on the stack. But this is a false symmetry: operands may not materialize at all or may reside in registers, the object header is not created at all, etc. The operands mapped from object field accesses might not even be contiguous on the stack! This is different from stack allocation.
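As a rough illustration of that remapping, here is a hand-written Java approximation of what Scalar Replacement does to a two-field object. It is only a mental model: the real transformation happens on the compiler's intermediate representation, not on Java source, and `MyPoint`, `withObject` and `scalarReplaced` are hypothetical names.

```java
// Conceptual illustration only: the JIT rewrites its IR, not Java source.
// MyPoint, withObject and scalarReplaced are hypothetical names.
final class ScalarReplacementSketch {

    static final class MyPoint {
        final int x;
        final int y;

        MyPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // What the programmer writes: a short-lived object that never escapes.
    static int withObject(int a, int b) {
        MyPoint p = new MyPoint(a, b);
        return p.x * 31 + p.y;
    }

    // Roughly what the optimizer works with after Scalar Replacement: each field
    // becomes an independent synthetic local. There is no header and no field block,
    // and nothing forces p_x and p_y to sit next to each other, or in memory at all.
    static int scalarReplaced(int a, int b) {
        int p_x = a; // was: this.x = x in the constructor
        int p_y = b; // was: this.y = y in the constructor
        return p_x * 31 + p_y;
    }

    public static void main(String[] args) {
        System.out.println(withObject(2, 3) + " " + scalarReplaced(2, 3)); // prints "65 65"
    }
}
```

Both methods compute the same value; the point is that the second form has no object left for the register allocator to "place", only two unrelated `int` operands.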

If stack allocation were really done, it would allocate the entire object storage on the stack, including the header and the fields, and reference it in the generated code. The caveat in this scheme is that once the object escapes, we would need to copy the entire object block from the stack to the heap, because we cannot be sure the current thread stays in the method and keeps the part of the stack holding the object alive. This means we have to intercept stores to the heap in case we ever store a stack-allocated object there; that is, do a GC write barrier.

Hotspot does not do stack allocations per se, but it does approximate that with Scalar Replacement.

Can we observe this in practice?

Practice

Consider this JMH benchmark. We create an object with a single field initialized from our input, read the field right away, and discard the object.
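A minimal plain-Java sketch of that benchmark shape follows. The real test is a JMH benchmark; this stand-in (all class and method names are hypothetical) avoids the JMH dependency and approximates what `-prof gc` reports by reading the per-thread allocation counter from `com.sun.management.ThreadMXBean`, a JDK-specific extension that may be missing on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Plain-Java stand-in for the JMH benchmark; names are hypothetical.
final class SingleBenchmarkSketch {

    static final class MyObject {
        final int x;

        MyObject(int x) {
            this.x = x;
        }
    }

    // Publishes the loop result so the JIT cannot discard the loop entirely.
    static volatile long sink;

    // Mirrors the benchmark body: allocate, read the field, drop the object.
    static int single(int x) {
        MyObject o = new MyObject(x);
        return o.x;
    }

    // Bytes allocated by the current thread while calling single() n times,
    // or -1 if the JVM does not provide per-thread allocation counters.
    static long allocatedBytes(int n) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) bean;
        if (!hs.isThreadAllocatedMemorySupported() || !hs.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long tid = Thread.currentThread().getId();
        long before = hs.getThreadAllocatedBytes(tid);
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += single(i);
        }
        long after = hs.getThreadAllocatedBytes(tid);
        sink = acc;
        return after - before;
    }

    public static void main(String[] args) {
        // Warm up so the JIT gets a chance to compile (and scalar-replace) single().
        for (int round = 0; round < 20; round++) {
            allocatedBytes(1_000_000);
        }
        System.out.println("bytes allocated for 1M calls: " + allocatedBytes(1_000_000));
    }
}
```

This is a crude check, not a substitute for JMH. After warmup on HotSpot with C2, one would expect the last round to report few or no bytes, while running with `-XX:-DoEscapeAnalysis` should bring back roughly one small object (about 16 bytes on a typical 64-bit configuration) per call.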

If you run the test with -prof gc, you would notice it does not allocate anything.

-prof perfasm shows there is only a single access to field x left.

Notice the magic of it: the compiler was able to detect that the MyObject instance is not escaping and remapped its fields to local operands. Then (drum-roll) it identified that the load from that operand directly follows the store to it, and eliminated that store-load pair altogether, as it would do with local variables! Then it pruned the allocation, because it was not needed anymore, and any remnant of the object evaporated.

Of course, that requires a sophisticated EA implementation to identify non-escaping candidates. When EA breaks, Scalar Replacement also breaks. The most trivial breakage in current Hotspot EA is when control flow merges before the access. For example, if we have two different objects (with the same content) and a branch that selects either of them, EA breaks, even though both objects are evidently (to us humans) non-escaping.
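That shape can be sketched in plain Java as follows. The names are hypothetical, and the second method is an assumed manual workaround, not something the post prescribes: in `split`, the reference that reaches `o.x` is a merge of two allocations, while in `splitReadInBranches` only an `int` crosses the merge point, so each allocation is again a simple non-escaping candidate.

```java
import java.util.concurrent.ThreadLocalRandom;

// Plain-Java sketch of the control-flow-merge case; names are hypothetical.
final class MergeSketch {

    static final class MyObject {
        final int x;

        MyObject(int x) {
            this.x = x;
        }
    }

    // Neither object escapes, but by the time o.x is read, o is a merge of two
    // allocations. In the Hotspot version the post describes, this defeats EA
    // and both allocations survive.
    static int split(boolean flag, int x) {
        MyObject o;
        if (flag) {
            o = new MyObject(x);
        } else {
            o = new MyObject(x);
        }
        return o.x;
    }

    // Assumed workaround: read the field inside each branch, so only an int
    // flows through the merge and each allocation stays local to its branch.
    static int splitReadInBranches(boolean flag, int x) {
        int result;
        if (flag) {
            MyObject o = new MyObject(x);
            result = o.x;
        } else {
            MyObject o = new MyObject(x);
            result = o.x;
        }
        return result;
    }

    public static void main(String[] args) {
        boolean flag = ThreadLocalRandom.current().nextBoolean();
        System.out.println(split(flag, 42) + " " + splitReadInBranches(flag, 42)); // prints "42 42"
    }
}
```

Whether the workaround actually removes the allocations on a given JDK is worth confirming with `-prof gc` before relying on it.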

Here, the code does allocate.

If that were a "true" stack allocation, it would handle this case trivially: it would extend the stack at runtime for either allocation, do the accesses, then scratch off the stack contents before leaving the method, retracting the stack allocations. The complication with the write barriers that guard object escapes still stands.

Observations

Escape Analysis is an interesting compiler technique that enables a number of optimizations. Scalar Replacement is one of them, and it is not about putting the object storage on the stack. Instead, it is about exploding the object into its fields, rewriting the code into local accesses, and optimizing them further, sometimes spilling the resulting operands to the stack when register pressure is high. In many cases on critical hot paths, it can be done successfully and profitably.

But EA is not ideal: if we cannot statically determine that the object is not escaping, we have to assume it does. Complicated control flow may make it bail out early. Calling a non-inlined instance method, which is opaque to the current analysis, makes it bail. Doing some things that rely on object identity makes it bail too, although trivial things like reference comparisons with non-escaping objects get folded efficiently.
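The cases above can be sketched as a handful of hypothetical methods, each commented with the outcome the text leads one to expect. These comments are expectations, not guarantees: actual results depend on inlining decisions and the JDK version, and the identity-hash case in particular is an assumed example of an identity-dependent operation, so verify with `-prof gc` rather than trusting the comments.

```java
// Hypothetical examples; comments describe expected EA outcomes, not guarantees.
final class EscapeCases {

    static final class Box {
        int v;

        Box(int v) {
            this.v = v;
        }

        // An instance method: if a call to it is not inlined, the receiver escapes into it.
        void bump() {
            v++;
        }
    }

    // Static fields are reachable from the heap by any thread.
    static Box globalSink;

    // Global escape: once published, the object must be a real heap object.
    static int escapesToStatic(int v) {
        Box b = new Box(v);
        globalSink = b;
        return b.v;
    }

    // bump() is tiny and likely inlined, which keeps b a candidate. If it were not
    // inlined, the callee would be opaque and EA would have to assume b escapes.
    static int callsInstanceMethod(int v) {
        Box b = new Box(v);
        b.bump();
        return b.v;
    }

    // Identity-dependent (assumed example): an identity hash belongs to a real
    // object, which a scalar-replaced object does not have.
    static int identityHash(int v) {
        Box b = new Box(v);
        return System.identityHashCode(b) ^ b.v;
    }

    // Two fresh, non-escaping objects are never the same reference, so the
    // comparison can fold to false and both allocations can still go away.
    static boolean compareFresh(int v) {
        Box a = new Box(v);
        Box b = new Box(v);
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(escapesToStatic(1) + " " + callsInstanceMethod(1) + " " + compareFresh(1)); // prints "1 2 false"
    }
}
```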

This is not an ideal optimization, but when it works, it works magnificently well. Further improvements in compiler technology might widen the number of cases where EA works well.[3]


1. I am mildly irritated when people claim EA does something: it does not, further optimizations do!
2. Like the ones the intermediate representation has for local variables and other temporary operands the compiler wants to have.
3. For example, Graal is known to have Partial Escape Analysis, that is supposed to be more resilient in complex data flows
Last updated 2019-03-03 11:50:20 +0300
