
Static Compilation of Java Applications at Alibaba at Scale

 3 years ago
source link: https://medium.com/graalvm/static-compilation-of-java-applications-at-alibaba-at-scale-2944163c92e


This is a guest blog post written by Sanhong Li, Ziyi Lin, Chuansheng Lu and Kingsum Chow from the Alibaba JVM team.

Background

Cloud computing aims to provide computing resources as a service and the core principle of cloud computing is to use only those resources that are necessary to run an application and scale when needed. To take advantage of the benefits of cloud computing, developers should architect and write applications according to this principle.

A microservice architecture breaks a monolithic application into many micro-applications (microservices). This is an attractive approach for applications targeting cloud computing platforms. We can start with the number of microservice instances needed to handle the initial load and scale out with more instances when demand is higher, improving resilience by leveraging the ability of clouds to scale horizontally.

The Java platform has become one of the most widely used platforms. Despite its popularity, Java has received many criticisms: it is slow to boot, it takes too much memory, and its syntax is verbose. Notably, Java's long boot time has inhibited horizontal scalability. From a business standpoint, customers may have to wait a long time for an application to boot before they receive the results of a request. Speeding up the boot time of Java applications on a horizontally scalable platform is our motivation. To achieve that, we adopted GraalVM native image in serverless computing.

GraalVM native image at Alibaba

Over the years, Java has proliferated in Alibaba. Many applications are written in Java. Approximately 10,000 Java developers have written more than a billion lines of Java code! Alibaba has customized most of its Java software based on the vibrant open-source ecosystem. In Alibaba Cloud, these Java programs are developed for online trading, payments, and logistics operations. Many of them are developed as microservices running in a Kubernetes-native environment to service online requests.

At Alibaba, we use the native image technology of GraalVM to statically compile a microservice application into an ELF executable file, which gives Java applications native code startup times. This is needed to address the horizontal scaling challenge described above.

In our scenario, the serverless application is built on the SOFABoot framework, and its fat jar is larger than 120 MB. It includes many typical components from the Java enterprise space, such as Spring, Spring Boot, Tomcat, MySQL-Connector, and many others. We refer to applications using this framework as SOFABoot applications. SOFABoot applications originally ran on top of Alibaba Dragonwell (OpenJDK based). They are designed for a distributed architecture, handle online transactions, and communicate with many other applications through RPC.

In the global online shopping festival (also called Double 11, or Nov 11) last year, we deployed a number of SOFABoot applications compiled as native images. They managed to serve real online requests in our production environment on a day with the highest transaction volume.

Besides the SOFABoot application, we have also explored the possibility of introducing statically compiled applications into Alibaba Cloud. We successfully deployed a native image version of the Micronaut demo application on Alibaba Cloud’s function computing platform.

In the following sections, we describe the challenges we overcame in using GraalVM native image for static compilation, and the performance gains it achieved in our production environment.

How We Did That

GraalVM native image provides a great set of tools for developers to close the gap between traditional and statically compiled Java and provides a way to migrate from the former to the latter. In this section, we will focus on the challenges we faced and the approaches we developed at Alibaba for compiling Java applications into native images. We also contributed many of the solutions back to the GraalVM community.

While most traditional Java features are supported by native image to build and run applications, there are still some limitations that prevent the automatic migration from traditional Java to statically compiled Java programs. Native image requires programmers to provide additional information or modify the original implementation of an application to get the program compiled and run as expected. The challenges we faced while adapting the SOFABoot application were:

Slow build time: Static compilation consumes a large amount of memory and time. Initially, building the SOFABoot application took around 100 GB of memory and 4000 seconds. We observed that most of the time was spent on type flow analysis in the static analysis phase, so we replaced it with a less precise but much more lightweight class hierarchy analysis (CHA) for scenarios that require a fast build. With CHA, the memory needed for the build dropped from 100 GB to 20 GB, and the build time dropped from 4000 seconds to less than 1000 seconds. This 4X speedup in build time helped speed up the deployment of our applications.
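To illustrate why CHA is cheaper but less precise, here is a minimal sketch of the general technique, not the analysis we implemented in native image. CHA treats every class in the subtree of a call's declared receiver type as a possible receiver, without tracking which types actually flow to the call site. The `ClassHierarchy` class, its string-based model, and the `Shape`/`Circle`/`Square` names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, string-based model of a class hierarchy (illustration only).
class ClassHierarchy {
    private final Map<String, String> superOf = new HashMap<>();
    private final Map<String, List<String>> subclasses = new HashMap<>();
    private final Map<String, Set<String>> declaredMethods = new HashMap<>();

    void addClass(String name, String superName, String... methods) {
        declaredMethods.put(name, new HashSet<>(Arrays.asList(methods)));
        if (superName != null) {
            superOf.put(name, superName);
            subclasses.computeIfAbsent(superName, k -> new ArrayList<>()).add(name);
        }
    }

    // Finds the implementation a receiver of the given type would run,
    // walking up the superclass chain; returns null if none is declared.
    private String resolve(String type, String method) {
        for (String t = type; t != null; t = superOf.get(t)) {
            if (declaredMethods.getOrDefault(t, Set.of()).contains(method)) {
                return t + "." + method;
            }
        }
        return null;
    }

    // CHA: every class in the subtree of the declared type is assumed to be
    // a possible receiver, whether or not it is ever instantiated.
    Set<String> possibleTargets(String declaredType, String method) {
        Set<String> targets = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(declaredType);
        while (!work.isEmpty()) {
            String type = work.pop();
            String target = resolve(type, method);
            if (target != null) {
                targets.add(target);
            }
            for (String sub : subclasses.getOrDefault(type, List.of())) {
                work.push(sub);
            }
        }
        return targets;
    }
}

class ChaDemo {
    public static void main(String[] args) {
        ClassHierarchy h = new ClassHierarchy();
        h.addClass("Shape", null, "area");
        h.addClass("Circle", "Shape", "area"); // overrides area
        h.addClass("Square", "Shape");         // inherits Shape.area
        System.out.println(h.possibleTargets("Shape", "area"));
    }
}
```

Even if the program never creates a `Circle`, CHA still reports `Circle.area` as reachable. A type flow analysis would rule it out, at the cost of the extra time and memory described above.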

Class initialization: Classes are initialized at runtime in traditional Java programs. Native image initializes classes at build time whenever possible to improve runtime performance. Eager class initialization at build time is not always safe, so programmers still need to adjust the class initialization timing manually. Class initialization may happen in a chain, so postponing the initialization of one class to runtime without also postponing its predecessors in the chain may lead to class initialization errors at build time. For example, consider three classes with a class initialization chain of A->B->C, in which the static initializer of C calls System.currentTimeMillis().
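A minimal sketch of such a chain is shown here. The class bodies are assumed. The `InitLog` helper and the field names are illustrative additions that only record the order in which the static initializers complete.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: records the order in which initializers finish.
class InitLog {
    static final List<String> ORDER = new ArrayList<>();
}

class A {
    static final B b = new B(); // initializing A triggers initialization of B
    static { InitLog.ORDER.add("A"); }
}

class B {
    static final C c = new C(); // initializing B triggers initialization of C
    static { InitLog.ORDER.add("B"); }
}

class C {
    // The value depends on when the initializer runs, so C must not be
    // initialized at build time.
    static final long START_TIME = System.currentTimeMillis();
    static { InitLog.ORDER.add("C"); }
}

class InitChainDemo {
    static List<String> touchA() {
        Object first = A.b; // first active use of A starts the whole chain
        return InitLog.ORDER;
    }

    public static void main(String[] args) {
        System.out.println(touchA()); // prints [C, B, A]
    }
}
```

The first use of A completes the initialization of C, then B, then A. Whatever timing is chosen for A is therefore forced onto C as well.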

For application correctness, class C MUST be initialized at runtime because of its call to System.currentTimeMillis(). As a result, class A MUST also be initialized at runtime, since A is the root of this class initialization chain: initializing A triggers the initialization of B and eventually C. In practice, however, when developers observed that class C had been mistakenly initialized at build time, it was difficult for them to discover that class A was the root cause, i.e., that A had been mistakenly configured to be initialized at build time. Native image provides an instrumentation-based initialization tracing feature to resolve this kind of issue, but it fails when the class cannot be instrumented, e.g., when the class is loaded by the bootstrap class loader. In our solution, we modified the HotSpot code to track the class initialization chain at the VM level, which helped our developers locate the root cause of this kind of mistake. Our solution thus enables broader use of ahead-of-time compilation by Java developers.

Dynamic class loading: Dynamic class loading means defining and loading classes at runtime whose bytecode is not known at build time. It is widely used in real-world applications, libraries, and frameworks. Typical examples include the serialization/deserialization mechanism in Java, which relies on dynamically generated constructor accessors; Spring, which uses cglib for proxies; and Derby, which uses dynamically generated classes for SQL statements. We support dynamic class loading in 4 steps: 1) Replace the dynamic names of generated classes with fixed ones, so that the same class always has the same name across different runs. 2) Implement method interceptors in the native image agent to dump the dynamically generated classes, following the fixed name pattern, to the file system. 3) Compile the dumped classes into the native image at build time. 4) At native image runtime, find the prepared “dynamically generated” classes instead of defining them. We have committed this feature to the community.
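Steps 1 and 4 can be sketched as follows. This is a rough illustration under stated assumptions, not the code we contributed. The `StableNames` and `PreparedClasses` classes are hypothetical, and deriving the fixed name from a hash of a stable description is an assumed scheme chosen only to make the name repeatable across runs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class StableNames {
    // Step 1 (assumed scheme): derive the class name from a stable
    // description of what is generated, e.g. "proxy:com.example.OrderService",
    // instead of from a counter or timestamp that changes between runs.
    static String stableName(String description) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(description.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return "Generated_" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}

class PreparedClasses {
    private final Map<String, Class<?>> byName = new HashMap<>();

    // Stand-in for step 3: classes dumped by the agent are compiled into
    // the image and made known under their fixed names.
    void register(String name, Class<?> clazz) {
        byName.put(name, clazz);
    }

    // Step 4: look the class up instead of defining it from bytecode.
    Class<?> findInsteadOfDefine(String name) {
        Class<?> clazz = byName.get(name);
        if (clazz == null) {
            throw new IllegalStateException("Class was not prepared at build time: " + name);
        }
        return clazz;
    }
}

class DynamicClassDemo {
    public static void main(String[] args) {
        String name = StableNames.stableName("proxy:com.example.OrderService");
        PreparedClasses prepared = new PreparedClasses();
        prepared.register(name, Runnable.class); // placeholder class for the demo
        System.out.println(name + " -> " + prepared.findInsteadOfDefine(name));
    }
}
```

The lookup in step 4 only works if the name computed at runtime matches the one recorded while the agent was running, which is why step 1 has to come first.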

Slow GC performance

In the world of static compilation, garbage collection is still an indispensable component. The default garbage collector in native image is a pure ‘Copy’ GC, which divides the heap space into two parts: young and old spaces. Java threads keep allocating objects in the young space, and when the young space is full, a ‘Young GC’ is performed by evacuating all the live objects from the young space to the old space. When the old space is full, a ‘Full GC’ is performed by compacting all the live objects in the Java heap and releasing the free space.

This approach is relatively straightforward and useful for many small workloads, but when we tried to support larger workloads such as Spring-based services, the full GC time and frequency became a headache. We observed that a single GC pause could exceed 1.5 seconds for some Java services, which is unacceptable for online applications. So we made the following improvements to the garbage collector component of native image:

- Age information is added to objects in the young generation. The age is tracked per memory chunk holding a group of objects, and it is increased by 1 each time these objects survive a young GC; live objects are promoted to the old generation only after reaching a certain age threshold.

- A background thread is used to un-map memory asynchronously. Native image uses memory chunks to hold Java objects. When it wants to release free chunks to the OS to lower the footprint, it simply un-maps the chunks. We observed that for a typical Java application, un-mapping memory can take a long time, so we made this operation asynchronous and execute it outside the stop-the-world pause.

- Image roots are scanned based on a card table in the young GC. For some workloads, the final executable image may be large after static compilation. It usually holds a vast set of GC roots, all of which have to be scanned thoroughly in every GC. In the existing design of the native image garbage collector, this can take a long time. We added a card table for the image roots, and for young GC operations we only scan the references that have been dirtied since the last GC pause.

Some of these changes have been committed into the GraalVM project.
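Two of the ideas above, age-based promotion and the card table for image roots, can be sketched in a toy model. This is an illustration under stated assumptions, not the native image collector. The class names, the threshold of 2, and the card size of 4 slots are made up, and every chunk is assumed to hold only live objects.

```java
import java.util.ArrayList;
import java.util.List;

// A chunk holding a group of objects; the age is kept per chunk.
class Chunk {
    final String name;
    int age; // number of young GCs this chunk's objects have survived

    Chunk(String name) {
        this.name = name;
    }
}

class AgedYoungGeneration {
    static final int PROMOTION_AGE = 2; // assumed threshold
    final List<Chunk> young = new ArrayList<>();
    final List<Chunk> old = new ArrayList<>();

    // Survivors stay young until they reach the threshold, so short-lived
    // objects are not promoted and do not fill up the old space.
    void youngGc() {
        List<Chunk> survivors = new ArrayList<>();
        for (Chunk chunk : young) {
            chunk.age++;
            if (chunk.age >= PROMOTION_AGE) {
                old.add(chunk);
            } else {
                survivors.add(chunk);
            }
        }
        young.clear();
        young.addAll(survivors);
    }
}

class ImageRootCardTable {
    static final int SLOTS_PER_CARD = 4; // assumed granularity
    final Object[] roots;
    final boolean[] dirty;

    ImageRootCardTable(int slots) {
        roots = new Object[slots];
        dirty = new boolean[(slots + SLOTS_PER_CARD - 1) / SLOTS_PER_CARD];
    }

    // Write barrier: every store into an image root marks its card dirty.
    void write(int slot, Object value) {
        roots[slot] = value;
        dirty[slot / SLOTS_PER_CARD] = true;
    }

    // Young GC visits only the slots on dirty cards, then clears the marks.
    // Returns the number of slots visited.
    int scanDirtyAndClear() {
        int scanned = 0;
        for (int card = 0; card < dirty.length; card++) {
            if (!dirty[card]) {
                continue;
            }
            int end = Math.min(roots.length, (card + 1) * SLOTS_PER_CARD);
            for (int slot = card * SLOTS_PER_CARD; slot < end; slot++) {
                scanned++; // a real collector would trace roots[slot] here
            }
            dirty[card] = false;
        }
        return scanned;
    }
}

class GcSketchDemo {
    public static void main(String[] args) {
        ImageRootCardTable table = new ImageRootCardTable(1024);
        table.write(5, "updated root");
        System.out.println("slots scanned: " + table.scanDirtyAndClear());
    }
}
```

In the demo, a single write to a table of 1024 root slots leads to 4 slots being scanned instead of all 1024. The same effect keeps young GC pauses short when the image is large.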

Performance Gains

Startup time speedup

After we made the changes to address the challenges of using native image, we collected performance data for statically compiled SOFABoot applications in our production environment. As shown in the figure, the startup time decreased from 60 seconds to 3 seconds, i.e., a 20X speedup in the startup time of the Java application. In addition, the GC pause time was kept under 100 milliseconds.


SOFABoot Application Startup Time Comparison

We also ran the statically compiled version of the Micronaut-based application on the function computing platform of Alibaba Cloud, and the result was just as striking. native_image_hello is the statically compiled application, and springboot_hello is the same application deployed as a jar and run on top of a traditional Java runtime. As shown in the figure below, native_image_hello is 100x faster at startup with 1/6 of the memory cost, which can help customers save 80% (“billed duration” is the time the customer is charged for on the cloud platform). The response time of the two deployed applications is nearly the same.


Traditional Java Function vs Statically Compiled Function

GC performance

With the GC enhancements described above, we reduced the p90 pause time of a typical Java microservice from more than 1.5 seconds to around 100 milliseconds.


GC time improvements
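For readers unfamiliar with the metric, the p90 pause time is the value that 90% of the observed pauses do not exceed. The sketch below computes it with the nearest-rank method. The `PausePercentile` class is hypothetical, and the sample values are made up for illustration. They are not our measurements.

```java
import java.util.Arrays;

class PausePercentile {
    // Nearest-rank percentile: the smallest sample such that at least
    // `percent` percent of all samples are less than or equal to it.
    static long percentile(long[] pausesMillis, int percent) {
        if (pausesMillis.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need samples and 1 <= percent <= 100");
        }
        long[] sorted = pausesMillis.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100; // ceil(percent * n / 100)
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up pause samples in milliseconds, for illustration only.
        long[] pauses = {35, 1500, 80, 60, 95, 40, 70, 55, 100, 45};
        System.out.println("p90 = " + percentile(pauses, 90) + " ms");
    }
}
```

A single very long pause, such as the 1500 ms sample here, does not move the p90 on its own. That is why the maximum pause is worth tracking alongside the percentile.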

Conclusion

If you’re exploring ways to develop a serverless application for the cloud, it’s worth evaluating GraalVM native image, especially when you’re looking for the best startup performance and a lower memory footprint.

We are very pleased with the results in our production environment. We look forward to driving innovation through collaboration with the GraalVM community.
