

JEP draft: Classfile API
source link: https://openjdk.org/jeps/8280389

Owner | Brian Goetz |
Type | Feature |
Scope | JDK |
Status | Draft |
Component | core-libs |
Effort | M |
Duration | M |
Created | 2022/01/20 14:51 |
Updated | 2022/06/17 16:34 |
Issue | 8280389 |
Summary
Provide an API for parsing, generating, and transforming Java class files. This will initially serve as an internal replacement for ASM in the JDK, to be later opened as a public API.
Goals
Classfile generation, parsing, and instrumentation is ubiquitous in the Java ecosystem; many tools and libraries need to be able to process class files, and frameworks often perform on-the-fly bytecode instrumentation, transformation, and generation. The JDK should provide an accurate, complete, up-to-date, performant API for reading, writing and transforming Java class files.
We aim to initially replace ASM as a runtime dependency of the JDK, without unacceptable loss of performance. As a stretch goal, it would be desirable to also replace the internal "classreader" library used by the compiler and JDK tools.
Eventually, a wide range of applications and frameworks should be able to effectively use this library as an alternative to ASM, cglib, or other bytecode libraries.
Non-Goals
There are dozens of libraries that process bytecode, each with different design goals, strengths and weaknesses. It is not a goal to render any of these "obsolete", to be the "one bytecode library to rule them all", or to be the "world's fastest bytecode library."
Motivation
There are a number of reasons why it makes sense for Java to provide a classfile library.
JDK consolidation. The JDK itself is a significant dealer in class files. Largely for historical reasons, the JDK contains at least three classfile libraries -- a fork of BCEL in a fork of Xalan in java.xml, a fork of ASM used by LambdaMetafactory, jlink, and others, and the internal classreader library shared by the compiler and other JDK tools. There is also a delay inherent in the JDK's use of ASM; the ASM version for JDK N cannot finalize until JDK N finalizes, which means that javac cannot generate class file features that are new in JDK N until JDK N+1 -- because JDK tools such as jlink will not be able to process them. JDK developers need a bytecode library that is kept up-to-date with the JVMS.
Version skew between frameworks and running JDK. Applications and frameworks that process class files generally bundle a classfile library (e.g., ASM, cglib, etc) with their application. But because new class file features can appear in any JDK release, and the rate of JDK releases accelerated substantially after Java 9, applications and frameworks are more frequently encountering class files that are newer than the library they are bundled with, resulting in runtime errors (or worse, frameworks trying to parse class file formats "from the future", and engaging in leaps of faith that nothing too serious has changed.) Applications and framework developers want a classfile library that they can count on to be up-to-date with the running JDK.
JVM evolution. The JVM, and the class file format, are evolving much faster now than in the early years of Java. While some evolutions are simple (such as adding new attributes like NestMembers), others are more complex; Project Valhalla will bring new bytecodes, new field descriptors, and new verification rules. At some point, it may be prohibitively expensive or complex to evolve existing libraries to support these new features.
Language improvements. It is perhaps an "obvious" idea to "just" take ASM into the JDK and take responsibility for its ongoing maintenance. But, this is not the right choice for a native bytecode library in the JDK, for many reasons. It's an old codebase with a lot of legacy baggage; the design priorities that informed its architecture may not be the same as the JDK would want today; it is difficult to evolve; and the language has improved substantially since ASM was written (lambdas, records and sealed classes, pattern matching), meaning that what might have been the best possible API idioms in 2002 may not be ideal two decades later.
Description
We've adopted the following design goals and principles for the API:
Class file entities are represented by immutable objects. All class file entities, such as methods, fields, attributes, instructions, annotations, etc, are represented by immutable objects.
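This principle can be illustrated with a small sketch. The record names here (FieldEntity, AnnotationEntity) are hypothetical, invented for this example, not part of the proposed API; the point is that Java records make immutable, value-based modeling of classfile entities nearly free:

```java
// Hypothetical sketch, not the proposed API: classfile entities as immutable records.
import java.util.List;

public class ImmutableElements {
    record AnnotationEntity(String className) { }

    record FieldEntity(String name, String descriptor, List<AnnotationEntity> annotations) {
        FieldEntity {
            // Defensive copy in the compact constructor keeps the record deeply immutable
            annotations = List.copyOf(annotations);
        }
    }

    public static void main(String[] args) {
        var field = new FieldEntity("count", "I",
                List.of(new AnnotationEntity("Ljava/lang/Deprecated;")));
        System.out.println(field.name() + " " + field.descriptor());
    }
}
```

Because such objects cannot change after construction, they can be freely shared between the reading and writing sides of the API, and cached without defensive copying.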
Tree-structured representation. A class file has a tree structure; a class has some metadata properties (name, supertype, etc), and a variable number of fields, methods, and attributes. Fields and methods themselves have metadata properties and further contain attributes, including the Code attribute, and the Code attribute further contains instructions, exception handlers, etc. The API should reflect this tree-structured organization.
User-driven navigation. The path we take through the class file tree is driven by user choices. If the user cares only about annotations on fields, we should only have to parse as far down as the annotation attributes inside the field_info; we shouldn't have to look into any of the class attributes, the bodies of methods, or the other attributes of the field. Users should be able to deal with compound entities (such as methods) either as a single unit or break them into streams of their constituent parts, as desired.
Laziness. User-driven navigation enables significant efficiencies, such as not parsing any more of the class file than we need to satisfy the user's needs. If the user is not going to dive into the contents of a method, then we need not parse any more of the method_info structure than is needed to figure out where the next class file element starts. We can lazily inflate (and cache) the full representation when the user asks for it.
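The inflate-and-cache idea can be sketched in a few lines. Everything here is illustrative (the LazyMethod class and its stand-in "parse" step are invented for this example); the pattern is simply a memoized accessor over the raw bytes:

```java
// Illustrative sketch of lazy inflation: a compound element keeps its raw bytes
// and inflates the full representation only on first access, caching the result.
import java.util.List;

public class LazyMethod {
    static int parses = 0;                     // counts how often we really parse

    private final byte[] raw;
    private List<String> instructions;         // cache; null until first use

    LazyMethod(byte[] raw) { this.raw = raw; }

    List<String> instructions() {
        if (instructions == null) {            // inflate lazily, then cache
            parses++;
            instructions = List.of("iload_1", "ireturn");  // stand-in for real parsing
        }
        return instructions;
    }

    public static void main(String[] args) {
        LazyMethod m = new LazyMethod(new byte[0]);
        m.instructions();
        m.instructions();
        System.out.println(parses);
    }
}
```

Immutability makes this safe: since the inflated representation never changes, caching it cannot expose stale state.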
Streaming plus materialized. Like ASM, we want to support both a streaming and materialized view of the class file. The streaming view is suitable for the majority of use cases; the materialized view is more general but (in the case of ASM) more expensive. Unlike ASM, though, we can provide the random access that a materialized view provides, but in a far less expensive way through laziness (enabled by immutability), and we can align the streaming and materialized views so that they use a common vocabulary and that both can be used in coordination, as is convenient for the use case.
Emergent transformation. If the class file reading and writing APIs are sufficiently aligned, then transformation can be an emergent property that does not require its own special mode. (ASM achieves this by using a common Visitor structure between readers and writers.) If classes, methods, fields, and code bodies are readable and writable as streams of elements, then transformation can be viewed as a flat-map operation on this stream, described by lambdas.
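The flat-map view can be demonstrated with plain java.util.stream types. The tiny Element hierarchy below is invented for illustration, not part of the proposed API; it shows how each element can map to zero elements (drop), one (pass through or replace), or several (expand):

```java
// Sketch only: transformation as a flat-map over a stream of elements,
// using an invented Element hierarchy rather than the proposed API.
import java.util.List;
import java.util.stream.Stream;

public class FlatMapTransform {
    sealed interface Element permits Insn, Comment { }
    record Insn(String op) implements Element { }
    record Comment(String text) implements Element { }

    // Drop comments, pass instructions through unchanged
    static List<Element> stripComments(List<Element> body) {
        return body.stream()
                   .flatMap(e -> e instanceof Comment ? Stream.<Element>empty() : Stream.of(e))
                   .toList();
    }

    public static void main(String[] args) {
        List<Element> body = List.of(new Insn("iload_1"), new Comment("debug"), new Insn("ireturn"));
        System.out.println(stripComments(body).size()); // 2: the comment is dropped
    }
}
```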
Detail hiding. Many parts of the class file (constant pool, bootstrap method table, stack maps, etc) are derived from other parts of the class file. It makes no sense to ask the user to construct these directly; this makes extra work for the user and increases the chance of error. For attributes and class file entities that are tightly coupled to other class file entities, we can let the library generate these based on the methods, fields, and instructions added to the class file.
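As one concrete instance of detail hiding, a constant pool can be maintained entirely by the library, interning entries on demand so clients never construct pool indices by hand. The ConstantPool class below is an invented sketch of that idea, not the proposed API:

```java
// Illustrative sketch of detail hiding: a library-managed constant pool that
// interns entries on first use, so clients never deal with indices directly.
import java.util.LinkedHashMap;
import java.util.Map;

public class ConstantPool {
    private final Map<String, Integer> utf8 = new LinkedHashMap<>();
    private int nextIndex = 1;                 // per JVMS, constant pool indices start at 1

    // Return the index of an interned UTF-8 entry, adding it only on first use.
    int utf8Entry(String s) {
        return utf8.computeIfAbsent(s, k -> nextIndex++);
    }

    public static void main(String[] args) {
        ConstantPool cp = new ConstantPool();
        System.out.println(cp.utf8Entry("fooBar")); // 1
        System.out.println(cp.utf8Entry("(ZI)V")); // 2
        System.out.println(cp.utf8Entry("fooBar")); // 1 again: deduplicated
    }
}
```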
Elements, builders, and transforms. We construct the API from three key abstractions. An element is an immutable description of some part of the classfile, which may be an instruction, attribute, field, method, or an entire classfile. Some elements, such as methods, are compound elements: in addition to being elements, they also contain elements of their own. These can be dealt with as a whole, or further decomposed. Each kind of compound element has a corresponding builder, which has both specific building methods (e.g., ClassBuilder::withMethod) and acts as a Consumer of the appropriate element type. Finally, a transform represents a function that takes an element and a builder and mediates how (if at all) that element is transformed into other elements.
Lean into the language. In 2002, the Visitor approach used by ASM seemed clever, and was surely more pleasant to use than what came before. However, Java has improved tremendously since then; we now have lambdas, records, and pattern matching, and the JDK now has a standardized API for describing class file constants (java.lang.constant). We can use these to design an API that is more flexible and pleasant to use, less verbose, and less error-prone.
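The java.lang.constant API mentioned above is already in the JDK (since Java 12) and provides the nominal descriptors the sketched API builds on. ClassDesc and MethodTypeDesc, shown below, are real classes used as-is:

```java
// Real API: java.lang.constant nominal descriptors (JDK 12+).
import java.lang.constant.ClassDesc;
import java.lang.constant.MethodTypeDesc;

public class ConstantDescs {
    public static void main(String[] args) {
        // A class described symbolically, without loading it
        ClassDesc string = ClassDesc.of("java.lang.String");
        System.out.println(string.descriptorString());        // Ljava/lang/String;

        // The descriptor of the fooBar(boolean, int) method used in the examples below
        MethodTypeDesc fooBar = MethodTypeDesc.ofDescriptor("(ZI)V");
        System.out.println(fooBar.parameterCount());          // 2
        System.out.println(fooBar.returnType().descriptorString()); // V
    }
}
```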
Examples
The examples in this section incorporate early sketches of a possible API; they are intended as motivating examples rather than as exposition of the final API.
Reading, with pattern matching
ASM's streaming view of the class file is visitor-based. Visitors are bulky and inflexible; the Visitor pattern is often characterized as a library workaround for the lack of pattern matching in the language. Now that Java has pattern matching, we can express things more directly and concisely. For example, if we want to traverse a Code attribute and collect dependencies for a class dependency graph, we can simply iterate through the instructions and match on the ones we find interesting:
CodeModel code = ...
HashSet<ClassDesc> deps = new HashSet<>();
for (CodeElement e : code) {
    switch (e) {
        case FieldAccess f -> deps.add(f.owner());
        case Invoke i -> deps.add(i.owner());
        // similar for instanceof, cast, etc
        default -> { }
    }
}
Writing, with builders
Consider the following snippet of Java code:
void fooBar(boolean z, int x) {
if (z)
foo(x);
else
bar(x);
}
In ASM, we'd generate this method as follows:
ClassWriter classWriter = ...
MethodVisitor mv = classWriter.visitMethod(0, "fooBar", "(ZI)V", null, null);
mv.visitCode();
mv.visitVarInsn(ILOAD, 1);
Label label1 = new Label();
mv.visitJumpInsn(IFEQ, label1);
mv.visitVarInsn(ALOAD, 0);
mv.visitVarInsn(ILOAD, 2);
mv.visitMethodInsn(INVOKEVIRTUAL, "Foo", "foo", "(I)V", false);
Label label2 = new Label();
mv.visitJumpInsn(GOTO, label2);
mv.visitLabel(label1);
mv.visitVarInsn(ALOAD, 0);
mv.visitVarInsn(ILOAD, 2);
mv.visitMethodInsn(INVOKEVIRTUAL, "Foo", "bar", "(I)V", false);
mv.visitLabel(label2);
mv.visitInsn(RETURN);
mv.visitMaxs(2, 3);
mv.visitEnd();
The MethodVisitor in ASM doubles as both a visitor and a builder. Clients can create a ClassWriter directly, and then can ask the ClassWriter for a MethodVisitor. However, there is value in inverting this API idiom: instead of the client creating a builder with a constructor or factory, it provides a lambda which accepts a builder.
ClassBuilder classBuilder = ...
classBuilder.withMethod("fooBar", MethodTypeDesc.ofDescriptor("(ZI)V"), flags,
            mb -> mb.withCode(codeBuilder -> {
    Label label1 = new Label();
    Label label2 = new Label();
    codeBuilder.iload(1)
               .branch(IFEQ, label1)
               .aload(0)
               .iload(2)
               .invokevirtual("Foo", "foo", MethodTypeDesc.ofDescriptor("(I)V"))
               .branch(GOTO, label2)
               .labelTarget(label1)
               .aload(0)
               .iload(2)
               .invokevirtual("Foo", "bar", MethodTypeDesc.ofDescriptor("(I)V"))
               .labelTarget(label2)
               .returnFromMethod(VOID);
}));
This is more specific and transparent (the builder has lots of convenience methods like aload(n)), but not yet any more concise or higher-level. But there is already a powerful hidden benefit: by capturing the sequence of operations in a lambda, we get the possibility of replay, which enables the library to do work that previously the client had to do. For example, branch offsets can be short or long. If clients generate instructions imperatively, they have to know how far the branch offset is at the time each branch is generated, which is complex and error-prone. But if the client provides a lambda that takes a builder, the library can optimistically try to generate the method with short offsets, and if this fails, discard the generated state and re-invoke the lambda with different code generation parameters.
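The replay idea can be sketched concretely. The Builder and build types below are invented for illustration (the real library would emit bytecode, not strings); what matters is that the client's lambda is side-effect-free with respect to the builder, so the library can run it once optimistically and once more after changing its code generation parameters:

```java
// Illustrative sketch of replay, not the proposed API: run the client's builder
// lambda with short offsets; on overflow, discard the output and replay wide.
import java.util.function.Consumer;

public class Replay {
    static class Overflow extends RuntimeException { }

    static class Builder {
        final boolean wide;
        final StringBuilder out = new StringBuilder();
        Builder(boolean wide) { this.wide = wide; }

        Builder branch(int offset) {
            if (!wide && offset > Short.MAX_VALUE) throw new Overflow();
            out.append(wide ? "goto_w " : "goto ").append(offset).append('\n');
            return this;
        }
    }

    static String build(Consumer<Builder> code) {
        Builder b = new Builder(false);
        try {
            code.accept(b);                 // optimistic attempt: short offsets
        } catch (Overflow e) {
            b = new Builder(true);          // discard and replay with wide offsets
            code.accept(b);
        }
        return b.out.toString();
    }

    public static void main(String[] args) {
        // The second branch overflows a short offset, so the whole method is replayed wide
        System.out.print(build(b -> b.branch(10).branch(100_000)));
    }
}
```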
Decoupling builders from visitation also lets us provide higher-level conveniences to manage block scoping and local variable index calculation, and to eliminate manual label management and branching:
ClassBuilder classBuilder = ...
classBuilder.withMethod("fooBar", MethodTypeDesc.ofDescriptor("(ZI)V"), flags,
            methodBuilder -> methodBuilder.withCode(codeBuilder -> {
    codeBuilder.iload(codeBuilder.parameterSlot(0))
               .ifThenElse(
                   b1 -> b1.aload(codeBuilder.receiverSlot())
                           .iload(codeBuilder.parameterSlot(1))
                           .invokevirtual("Foo", "foo", MethodTypeDesc.ofDescriptor("(I)V")),
                   b2 -> b2.aload(codeBuilder.receiverSlot())
                           .iload(codeBuilder.parameterSlot(1))
                           .invokevirtual("Foo", "bar", MethodTypeDesc.ofDescriptor("(I)V")))
               .returnFromMethod(VOID);
}));
Because the block scoping is managed by the library, we didn't have to generate labels or branch instructions -- the library can insert them for us. Similarly, the library can optionally manage block-scoped allocation of local variables, freeing clients of the bookkeeping for local variable slots as well.
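The slot bookkeeping that helpers like parameterSlot and receiverSlot would hide follows directly from the JVM's local variable rules: an instance method's receiver occupies slot 0, and long and double parameters occupy two slots each. The parameterSlot helper below is a hypothetical stand-in, computed here from a real MethodTypeDesc:

```java
// Sketch of the slot arithmetic a CodeBuilder could hide; parameterSlot is an
// invented helper, but the slot rules (receiver at 0, long/double take two
// slots) are the JVM's actual rules.
import java.lang.constant.MethodTypeDesc;

public class ParameterSlots {
    static int parameterSlot(MethodTypeDesc desc, boolean isStatic, int param) {
        int slot = isStatic ? 0 : 1;          // slot 0 is `this` for instance methods
        for (int i = 0; i < param; i++) {
            String d = desc.parameterType(i).descriptorString();
            slot += (d.equals("J") || d.equals("D")) ? 2 : 1;  // long/double: two slots
        }
        return slot;
    }

    public static void main(String[] args) {
        MethodTypeDesc fooBar = MethodTypeDesc.ofDescriptor("(ZI)V");
        System.out.println(parameterSlot(fooBar, false, 0)); // 1 (z)
        System.out.println(parameterSlot(fooBar, false, 1)); // 2 (x)
        MethodTypeDesc wide = MethodTypeDesc.ofDescriptor("(JD)V");
        System.out.println(parameterSlot(wide, false, 1));   // 3 (after the two-slot long)
    }
}
```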
Transformation
The reading and writing APIs must line up so that transformation is seamless. The reading example above traversed a sequence of CodeElements, letting clients match against the individual elements; if we make the builder accept these CodeElements, then typical transformation idioms line up fairly naturally. If we want to process a class file and keep everything unchanged except for removing methods whose names start with "debug", we would get a ClassModel, create a ClassBuilder, iterate the elements of the original ClassModel, and pass through all of them to the builder, except the methods we want to drop:
ClassModel classModel = ClassModel.of(bytes);
byte[] newBytes = Classfile.build(classModel.flags(), classModel.name(),
    classBuilder -> {
        for (ClassElement ce : classModel) {
            if (!(ce instanceof MethodModel mm
                  && mm.nameString().startsWith("debug")))
                classBuilder.with(ce);
        }
    });
Transforming method bodies is slightly more complicated, as we have to explode classes into their parts (fields, methods, attributes), select the method elements, explode the method elements into their parts (including the code attribute), and then explode the code attribute into its elements (instructions). The following transformation swaps invocations of methods on class Foo for invocations on class Bar:
ClassModel classModel = ClassModel.of(bytes);
byte[] newBytes = ClassBuilder.of(classModel.flags(), classModel.name(),
    classBuilder -> {
        for (ClassElement ce : classModel) {
            if (ce instanceof MethodModel mm) {
                classBuilder.withMethod(mm.name(), mm.descriptor(),
                                        mm.flags(), methodBuilder -> {
                    for (MethodElement me : mm) {
                        if (me instanceof CodeModel codeModel) {
                            methodBuilder.withCode(codeBuilder -> {
                                for (CodeElement e : codeModel) {
                                    switch (e) {
                                        case Invoke i && i.owner().equals("Foo") ->
                                            codeBuilder.invoke(i.opcode(),
                                                ClassDesc.of("Bar"), i.name(), i.type());
                                        default -> codeBuilder.with(e);
                                    }
                                }
                            });
                        }
                        else
                            methodBuilder.with(me);
                    }
                });
            }
            else
                classBuilder.with(ce);
        }
    });
We can see that navigating the classfile tree by exploding entities into elements and examining each element involves some boilerplate which is repeated at multiple levels. This tree-traversal idiom is common to all traversals, and is something the library should help with.
The common pattern of taking a classfile entity, obtaining a corresponding builder, examining each element, and possibly replacing it with other elements is facilitated by transformation methods, which let us specify how elements are processed via transforms. A transform accepts a builder and an element, and specifies whether that element is passed through to the builder, dropped, or replaced with other elements. Transforms are functional interfaces, so transformation logic can be captured in a lambda. Transformation methods copy the relevant metadata from a compound element (names, flags, etc) to the fresh builder, and then process its elements using a transform, handling the repetitive exploding and iteration code.
The "swap Foo methods for Bar methods" example can be rewritten using transformation as:
ClassModel classModel = ClassModel.of(bytes);
byte[] newBytes = classModel.adapt((classBuilder, ce) -> {
    if (ce instanceof MethodModel mm) {
        classBuilder.adaptMethod(mm, (methodBuilder, me) -> {
            if (me instanceof CodeModel cm) {
                methodBuilder.adaptCode(cm, (codeBuilder, e) -> {
                    switch (e) {
                        case Invoke i && i.owner().equals("Foo") ->
                            codeBuilder.invoke(i.opcode(), ClassDesc.of("Bar"),
                                               i.name(), i.type());
                        default -> codeBuilder.with(e);
                    }
                });
            }
            else
                methodBuilder.with(me);
        });
    }
    else
        classBuilder.with(ce);
});
Now the library is managing the iteration boilerplate, but the deep nesting of lambdas just to get access to the instructions is still somewhat intimidating. We can simplify this by factoring out the instruction-specific activity directly as a CodeTransform:
CodeTransform codeTransform = (codeBuilder, e) -> {
    switch (e) {
        case Invoke i && i.owner().equals("Foo") ->
            codeBuilder.invoke(i.opcode(), ClassDesc.of("Bar"), i.name(), i.type());
        default -> codeBuilder.with(e);
    }
};
and then we can lift this transform on code elements into a transform on method elements (when it sees a Code attribute, it adapts it with the code transform, and passes all other method elements through unchanged), and we can do the same again to lift the resulting transform on method elements into a transform on class elements:
MethodTransform methodTransform = MethodTransform.adaptingCode(codeTransform);
ClassTransform classTransform = ClassTransform.adaptingMethods(methodTransform);
at which point our example becomes just:
byte[] newBytes = ClassModel.of(bytes).adapt(classTransform);
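The lifting used here is, at its core, ordinary functional composition. The types below (Method, CodeTransform, MethodTransform, adaptingCode) are simplified stand-ins for the API's, with instructions modeled as strings, to show how a transform on inner elements mechanically induces a transform on outer elements:

```java
// Illustrative stand-ins, not the proposed API: lifting a transform on
// instructions into a transform on whole methods by plain composition.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class Lifting {
    record Method(String name, List<String> code) { }

    // A transform replaces one element with zero or more elements.
    interface CodeTransform extends Function<String, List<String>> { }
    interface MethodTransform extends Function<Method, List<Method>> { }

    // Lift a transform on instructions into a transform on methods:
    // apply it to each instruction and rebuild the method around the result.
    static MethodTransform adaptingCode(CodeTransform ct) {
        return m -> {
            List<String> newCode = new ArrayList<>();
            for (String insn : m.code()) newCode.addAll(ct.apply(insn));
            return List.of(new Method(m.name(), newCode));
        };
    }

    public static void main(String[] args) {
        CodeTransform swapFooForBar = insn -> List.of(insn.replace("Foo.", "Bar."));
        MethodTransform mt = adaptingCode(swapFooForBar);
        Method m = new Method("run", List.of("invokevirtual Foo.foo", "return"));
        System.out.println(mt.apply(m).get(0).code());
    }
}
```

The same step applied once more (methods into classes) yields the ClassTransform of the example above, which is why the final transformation collapses to a single line.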
Alternatives
We could continue to use ASM for bytecode generation and transformation in the JDK. Alternatively, we could move the "classreader" library to java.base and use that in preference to ASM.
Testing
As this library will have a large surface area and must generate classes in conformance with the JVMS, significant quality and conformance testing will be required. Further, to the degree that we replace existing uses of ASM with the new library, we will want to be able to effectively compare the results using both generation techniques to detect regressions, and do extensive performance testing to detect and avoid performance regressions.