
JEP draft: Value Objects (Preview)

source link: https://openjdk.org/jeps/8277163
Owner: Dan Smith
Type: Feature
Scope: SE
Status: Submitted
Component: specification
Discussion: valhalla dash dev at openjdk dot java dot net
Effort: XL
Duration: XL
Relates to: JEP 401: Primitive Classes (Preview)
Reviewed by: Brian Goetz
Created: 2021/11/16 00:14
Updated: 2022/10/04 23:20
Issue: 8277163

Summary

Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity. This is a preview language and VM feature.

Goals

This JEP provides for the declaration of identity-free value classes and specifies the behavior of their instances, called value objects, with respect to equality, synchronization, and other operations that traditionally depend upon identity.

At runtime, the HotSpot JVM will prefer inlining value objects where feasible, in particular for JIT-compiled method calls and local operations. An inlined value object is encoded directly with its field values, avoiding any overhead from object headers, indirections, or heap allocation.

Non-Goals

Value class types are reference types. The Valhalla project is also developing user-defined primitive types, but these will require additional changes to the Java object model and type system. See "Dependencies" for details.

Existing value-based classes in the standard libraries will not be affected by this JEP. Once the features of this JEP become final, those classes will be available for migration to value classes as a separate task.

Motivation

Java's objects and classes offer powerful abstractions for representing data, including fields, methods, constructors, access control, and nominal subtyping. Every object also comes with identity, enabling features such as field mutation and locking.

Many classes don't take advantage of all of these features. In particular, a significant subset of classes don't have any use for identity—their field values can be permanently set on instantiation, their instances don't need to act as synchronization locks, and their preferred notion of equality makes no distinction between separately-allocated instances with matching field values.

At runtime, support for identity can be expensive. It generally requires that an object's data be located at a particular memory location, packaged with metadata to support the full range of object functionality. Fields are accessed with memory loads, which are relatively slow operations. As objects are shared between program components, data structures and garbage collectors end up with tangled, non-local webs of objects created at different times. Sometimes, JVM implementations can optimize around these constraints, but the resulting performance improvements can be unpredictable.

An alternative is to encode program data with primitive types. Primitive values don't have identity, and so can be copied freely and encoded as compact bit sequences. But programs that represent their data with primitive types give up all the other abstractions provided by objects and classes. (For example, if a geographic location is encoded as two floats, there's no way to restrict the valid range of values, keep matching pairs of floats together, prevent re-interpreting the values with the wrong units, or compatibly switch to a double-based encoding.)

Value classes provide programmers with a mechanism to opt out of object identity, and in return get many of the performance benefits of primitive types, without giving up the other features of Java classes.

Opting out of identity is an important step towards user-defined primitives, which would fully combine the performance profile of today's primitives with the abstractions of class declarations. JEP 401 will support such types.

However, many classes will be better served by declaring themselves value classes, carrying on with familiar (and compatible) reference type semantics, and still unlocking many of the same JVM optimizations. This includes many JDK classes, like LocalDate, that are currently designated as "value-based" to discourage users from relying on their instances' identities.

Description

The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

Overview

A value object is a class instance that does not have identity. That is, a value object does not have any particular memory address or any other property to distinguish it from other instances of the same class whose fields have the same values. Value objects cannot mutate their fields or be used for synchronization. The == operator on value objects compares their fields. A value class declaration introduces a class whose instances are value objects.

An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. An identity class declaration—the default for a concrete class—introduces a class whose instances are identity objects.
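
For contrast, here is a minimal sketch of how identity objects behave on a current JDK without preview features. The record Point and the class IdentityDemo are illustrative names, not part of this JEP. If Point were declared as a value class under this proposal, the first comparison would be expected to yield true as well, since == would compare field values.

```java
// Illustrative only: Point is a hypothetical record compiled WITHOUT the
// preview 'value' modifier, so its instances are ordinary identity objects.
record Point(int x, int y) {}

class IdentityDemo {
    // Compares by identity: true only when both references denote the same object.
    static boolean sameIdentity(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(sameIdentity(p, q)); // false: two separate allocations
        System.out.println(p.equals(q));        // true: records compare field values
        System.out.println(sameIdentity(p, p)); // true: same object
    }
}
```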

Value class declarations

A concrete class can be declared a value class with the value contextual keyword.

value class Substring implements CharSequence {
    private String str;
    private int start;
    private int end;
    
    public Substring(String str, int start, int end) {
        checkBounds(start, end, str.length());
        this.str = str;
        this.start = start;
        this.end = end;
    }
    
    public int length() {
        return end - start;
    }
    
    public char charAt(int i) {
        checkBounds(i, i + 1, length()); // requires 0 <= i < length()
        return str.charAt(start + i);
    }
    
    public Substring subSequence(int s, int e) {
        checkBounds(s, e, length());
        return new Substring(str, start + s, start + e);
    }
    
    public String toString() {
        return str.substring(start, end);
    }
    
    private static void checkBounds(int start, int end, int length) {
        if (start < 0 || end < start || length < end)
            throw new IndexOutOfBoundsException();
    }
}

A concrete value class declaration is subject to the following restrictions:

  • The class is implicitly final, so cannot be extended.

  • All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.

  • The class does not extend an identity class or an identity interface (see below).

  • No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.

  • No instance methods are declared synchronized.

  • (Possibly) The class does not declare a finalize() method.

  • (Possibly) The constructor does not make use of this except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

In most other ways, a value class declaration is just like an identity class declaration. It implicitly extends Object if it has no explicit superclass type. It can be an inner class. It can declare superinterfaces, type parameters, member classes and interfaces, overloaded constructors, static members, and the full range of access restrictions on its members.
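
As a rough aid for spotting classes that already satisfy some of these restrictions, the following sketch uses standard reflection on a current JDK. ValueCandidateCheck, Money, and Tally are hypothetical names introduced here. The check is deliberately partial: it covers only the final-class, final-instance-field, and no-synchronized-method rules, and ignores the superclass, constructor, and finalize() restrictions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper, not a JDK API: a partial check of the restrictions above.
class ValueCandidateCheck {
    static boolean looksLikeValueCandidate(Class<?> c) {
        // The class must be final.
        if (!Modifier.isFinal(c.getModifiers())) return false;
        // Every instance field must be final.
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) return false;
        }
        // No instance method may be synchronized.
        for (Method method : c.getDeclaredMethods()) {
            int m = method.getModifiers();
            if (!Modifier.isStatic(m) && Modifier.isSynchronized(m)) return false;
        }
        return true;
    }
}

// Passes the partial check: final class, final field, no synchronization.
final class Money {
    private final long cents;
    Money(long cents) { this.cents = cents; }
    long cents() { return cents; }
}

// Fails the partial check: mutable field and a synchronized method.
final class Tally {
    private int count;
    synchronized void increment() { count++; }
}
```

A class that passes this check may still depend on identity in ways reflection cannot see, so the result is a starting point for review rather than a verdict.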

A concrete class can be declared an identity class with the identity contextual keyword. In the absence of the value and identity modifiers, a concrete class (other than Object) is implicitly an identity class.

identity class Id1 {
    int counter = 0;
    void increment() { counter++; }
}

class Id2 { // implicitly 'identity'
    synchronized void m() {}
}

The value and identity modifiers are supported by record classes. Records are often good candidates to be value classes, because their fields are already required to be final.

value record Name(String first, String last) {
    public String full() {
        return "%s %s".formatted(first, last);
    }
}

identity record Node(String label, Node next) {
    public String list() {
        return label + ((next == null) ? "" : ", " + next.list());
    }
}

As with regular classes, identity is the default modifier for record classes.

Working with value objects

Value objects are created and operated on just like normal objects:

Substring s1 = new Substring("abc", 0, 2);
Substring s2 = null;
if (s1.length() == 2)
    s2 = s1.subSequence(1, 2);
CharSequence cs = s2;
System.out.println(cs.toString()); // prints "b"

The == operator compares value objects of the same class in terms of their field values, not object identity. Fields with basic primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==.

assert new Substring("abc", 1, 2) == s2;
assert new Substring("abcd", 1, 2) != s2;
assert s1.subSequence(0, 2) == s1;
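
The comparison rule can be approximated in ordinary Java. The sketch below is an emulation under stated assumptions, not the JVM's implementation: Reading is a hypothetical stand-in for a value class with one float field and one reference field, and sameValue plays the role that == would have for two instances of that class.

```java
// Hypothetical stand-in for a value class; compiled here as a normal final class.
final class Reading {
    final float celsius;
    final String sensor;

    Reading(float celsius, String sensor) {
        this.celsius = celsius;
        this.sensor = sensor;
    }

    // Emulates the preview '==' for two Reading instances.
    static boolean sameValue(Reading a, Reading b) {
        if (a == b) return true;                   // same reference, or both null
        if (a == null || b == null) return false;
        // Primitive field: compared by bit pattern, so NaN matches NaN
        // and 0.0f does not match -0.0f.
        return Float.floatToRawIntBits(a.celsius) == Float.floatToRawIntBits(b.celsius)
            // Reference field holding an identity object: compared with ==, not equals.
            && a.sensor == b.sensor;
    }
}
```

Under this emulation, two readings built from Float.NaN and the same sensor reference compare as the same value, while readings that differ only in the sign of zero, or that hold equal but separately allocated strings, do not.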

The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.

Substring s3 = s1.subSequence(0, 2);
assert s1.equals(s3);
assert s1.hashCode() == s3.hashCode();
assert System.identityHashCode(s1) == System.identityHashCode(s3);

The compiler disallows synchronization on any value class type. Attempting to synchronize on a value object at run time results in an exception.

Object obj = s1;
try { synchronized (obj) { } }
catch (IllegalMonitorStateException e) { /* expected exception */ }

Interfaces and Abstract Classes

By default, an interface may be implemented by both value classes and identity classes. In a special case where the interface is only meant for one kind of class or the other, the value or identity modifier can be used to declare a value interface or an identity interface.

value interface JsonValue {
    String toJsonString();
}

identity interface Counter {
    int currentValue();
    void increment();
}

It is an error for a value class or interface to extend an identity class or interface, or vice versa. This applies to both direct and indirect superclasses and superinterfaces—e.g., an interface with no modifiers may extend an identity interface, but still its implementing classes must not be value classes.

Similarly, it is an error for any class or interface to implement, either directly or indirectly, both a value superclass or superinterface and an identity superclass or superinterface.

(To be a functional interface, compatible with lambda expressions, an interface must not be or extend a value interface nor an identity interface. This allows for flexibility in the implementation of lambda expressions.)

An abstract class can similarly be extended by both value classes and identity classes by default, or can use the identity or value modifier to restrict its subclasses. In addition, an abstract class that makes use of any of the following features is implicitly an identity class:

  • It declares an instance field
  • It is an inner class with an enclosing instance
  • It declares a synchronized method
  • It declares a non-empty constructor (with a signature or body that differs from the default constructor)
  • It declares an instance initializer

With the exception of field declarations, any of these conditions should cause a compiler warning, encouraging the author to add an explicit identity modifier.

(The initialization restrictions are necessary because, as noted above, value objects are created without an opportunity to execute any superclass initialization code.)

The class Object is special. Despite being a concrete class, it is not an identity class and supports both identity and value subclasses. However, calls to new Object() continue to create direct identity object instances of the class (suitable, e.g., as synchronization locks).

Migration of existing classes

If an existing concrete class does not expose its constructors to separately-compiled code, and meets the other requirements of value class declarations, it may be declared as a value class without breaking binary compatibility.

There are some behavioral changes that users of the class may notice:

  • The == operator may treat two instances as the same, where previously they were considered different

  • Attempts to synchronize on an instance will fail, either at compile time or run time

  • The results of toString, equals, and hashCode, if they haven't been overridden, may be different

  • Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points)

  • Performance will generally improve, but may have different characteristics that are surprising
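
One way to prepare for the synchronization change is to stop locking on instances of classes that may become value classes and to lock on a dedicated object instead. The sketch below assumes a hypothetical DailyTotals class keyed by a LocalDate; it runs on a current JDK without preview features.

```java
import java.time.LocalDate;

// Hypothetical example: guards mutable state with a private lock object
// rather than synchronizing on the value-based LocalDate key.
final class DailyTotals {
    private final Object lock = new Object(); // direct Object instances keep identity
    private final LocalDate day;
    private long total;

    DailyTotals(LocalDate day) {
        this.day = day;
    }

    void add(long amount) {
        synchronized (lock) {   // not: synchronized (day)
            total += amount;
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    LocalDate day() {
        return day;
    }
}
```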

Some classes in the standard library are designated value-based, and can be expected to become value classes in a future release.

Developers are encouraged to identify and migrate value class candidates in their own code, where appropriate.

class file representation & interpretation

The identity and value modifiers are encoded in a class file using the ACC_IDENTITY (0x0020) and ACC_VALUE (0x0040) flags. In older-versioned class files, ACC_IDENTITY is considered to be set in classes and unset in interfaces.

(Historically, 0x0020 represented ACC_SUPER, and all classes, but not interfaces, were encouraged to set it. The flag is no longer meaningful, but coincidentally will tend to match this implicit behavior.)

Format checking ensures that identity and value are not both set, and that every class (not interface) has at least one of identity, value, or abstract set.
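
The two rules in the preceding paragraph can be sketched as a check over an access_flags value. ClassFlagCheck and its method names are hypothetical. The ACC_IDENTITY and ACC_VALUE constants use the values given in this JEP, while ACC_INTERFACE and ACC_ABSTRACT use their existing class file values. Other format checks described in this section are not modeled.

```java
// Sketch of the flag rules; not the JVM's actual format checker.
class ClassFlagCheck {
    static final int ACC_IDENTITY  = 0x0020; // value given in this JEP
    static final int ACC_VALUE     = 0x0040; // value given in this JEP
    static final int ACC_INTERFACE = 0x0200; // existing class file flag
    static final int ACC_ABSTRACT  = 0x0400; // existing class file flag

    static boolean has(int flags, int bit) {
        return (flags & bit) != 0;
    }

    // Returns null if the flags pass both rules, otherwise a short reason.
    static String check(int flags) {
        boolean identity = has(flags, ACC_IDENTITY);
        boolean value = has(flags, ACC_VALUE);
        if (identity && value) {
            return "identity and value are both set";
        }
        boolean isInterface = has(flags, ACC_INTERFACE);
        if (!isInterface && !identity && !value && !has(flags, ACC_ABSTRACT)) {
            return "class has none of identity, value, or abstract set";
        }
        return null;
    }
}
```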

Format checking fails if a value class is not final, has a non-final instance field, has a synchronized instance method, or declares an <init> method. Similarly, format checking fails if a non-identity abstract class has any instance field or a synchronized instance method.

(An abstract class that is neither identity nor value may declare an <init> method. The code will be executed as usual for identity object instances, but not for value object instances.)

At class load time, superclasses and superinterfaces are checked for conflicting identity or value modifiers; if a conflict is detected, the class fails to load.
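
A minimal model of this hierarchy check, assuming a simplified representation of declarations, might look as follows. Kind and TypeDecl are hypothetical types introduced for illustration; a real JVM works on loaded class metadata rather than on such a tree.

```java
import java.util.List;

enum Kind { NONE, IDENTITY, VALUE }

// Hypothetical model of a class or interface declaration; not a JDK type.
record TypeDecl(String name, Kind kind, List<TypeDecl> supers) {

    // True if this type, or any direct or indirect supertype, carries the modifier.
    boolean inherits(Kind k) {
        if (kind == k) return true;
        for (TypeDecl s : supers) {
            if (s.inherits(k)) return true;
        }
        return false;
    }

    // A conflict exists when both modifiers are reachable through the hierarchy.
    boolean hasConflict() {
        return inherits(Kind.IDENTITY) && inherits(Kind.VALUE);
    }
}
```

For instance, if a value class implements an unmodified interface that in turn extends an identity interface, hasConflict() returns true for the class, matching the indirect case described under "Interfaces and Abstract Classes".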

A value class's type is represented using the usual L descriptor (LSubstring;). To facilitate inlining optimizations, a Preload attribute can be provided by any class, communicating to the JVM that a set of referenced CONSTANT_Class entries should be eagerly loaded to locate potentially-useful layout information.

Preload_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_classes;
    u2 classes[number_of_classes];
}
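
To make the layout concrete, here is a small sketch that decodes such an attribute from raw bytes. PreloadInfo is a hypothetical type, not a JDK API. The length check assumes the usual class file convention that attribute_length excludes the six bytes of attribute_name_index and attribute_length themselves.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the attribute layout shown above.
record PreloadInfo(int nameIndex, long length, int[] classIndexes) {

    static PreloadInfo parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);      // big-endian by default, like class files
        int nameIndex = buf.getShort() & 0xFFFF;      // u2 attribute_name_index
        long length = buf.getInt() & 0xFFFFFFFFL;     // u4 attribute_length
        int count = buf.getShort() & 0xFFFF;          // u2 number_of_classes
        if (length != 2L + 2L * count) {
            throw new IllegalArgumentException(
                "attribute_length does not match number_of_classes");
        }
        int[] classes = new int[count];
        for (int i = 0; i < count; i++) {
            classes[i] = buf.getShort() & 0xFFFF;     // u2 constant pool index
        }
        return new PreloadInfo(nameIndex, length, classes);
    }
}
```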

Two new opcodes facilitate instance creation:

  • aconst_init, with a CONSTANT_Class operand, produces an initial instance of the named value class, with all fields set to their default values. This operation always has private access: a linkage error occurs if anyone other than the value class or its nestmates attempts an aconst_init operation.

  • withfield, with a CONSTANT_Fieldref operand, produces a new value object by using an existing object as a template but replacing the value of one of its fields. This operation also has private access.

It is a linkage error to use the opcode new with a value class.

A new kind of special method, a value class instance creation method, can be declared in a concrete value class to produce class instances. These methods are named <vnew> and are static. Their return type must match the type of the declaring class. They are invoked with invokestatic.

The if_acmpeq and if_acmpne operations implement the == test for value objects, as described above. The monitorenter instruction throws an exception if applied to a value object.

Java language compilation

Each class file generated by javac includes a Preload attribute naming any concrete value class that appears in one of the class file's declared field or method descriptors.

Constructors of value classes compile to value class instance creation methods, not instance initialization methods. In the constructor body, the compiler treats this as a mutable local variable, initialized by aconst_init, modified by withfield, and ultimately returned as the method result.
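
The shape of this translation can be emulated in ordinary Java. In the sketch below, Span is a hypothetical stand-in for a value class with two int fields. The methods initial, withStart, withEnd, and vnew are illustrative names that play the roles of aconst_init, withfield, and the <vnew> method; javac does not generate them as Java source.

```java
// Emulation only: compiled as a normal final class on a current JDK.
final class Span {
    final int start;
    final int end;

    private Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Plays the role of aconst_init: every field holds its default value.
    private static Span initial() {
        return new Span(0, 0);
    }

    // Play the role of withfield: copy the template, replacing one field.
    private Span withStart(int v) { return new Span(v, end); }
    private Span withEnd(int v)   { return new Span(start, v); }

    // Plays the role of the static creation method compiled from a constructor
    // whose body is 'this.start = start; this.end = end;'.
    static Span vnew(int start, int end) {
        Span self = initial();          // 'this' begins as the initial instance
        self = self.withStart(start);   // this.start = start
        self = self.withEnd(end);       // this.end = end
        return self;                    // returned as the method result
    }
}
```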

API & tool support

A new preview API method, Object.isValueObject, indicates whether an object is a value object or an identity object. It always returns false for arrays and direct instances of the class Object.

java.lang.reflect.Modifier adds support for the identity and value flags; these are also exposed via new isIdentity and isValue methods in java.lang.Class. The method Class.getDeclaredConstructors, and related methods, search for value class instance creation methods rather than instance initialization methods when invoked on a value class.

java.lang.ref recognizes value objects and treats them specially (details TBD).

java.lang.invoke provides a mechanism to execute the aconst_init and withfield instructions reflectively. The LambdaMetafactory class rejects identity and value superinterfaces.

javax.lang.model supports the identity and value modifiers.

The javadoc tool surfaces the identity and value modifiers.

Performance model

Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.

Implementations are free to use different encodings in different contexts, such as stack vs. heap, as long as the values of the objects' fields are preserved. However, these encodings must account for the possibility of a null value, and must ensure that fields and arrays storing value objects are read and written atomically.

In practice, this means that local variables, method parameters, and expression results can often use inline encodings, while fields and array components might not be inlined.

Previously, JVMs have used similar optimization techniques to inline identity objects when the JVM is able to prove that an object's identity is never used. Developers can expect more predictable and widespread optimizations for value objects.

HotSpot implementation

This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.

Value objects in HotSpot are encoded as follows:

  • In fields and arrays, value objects are encoded as regular heap objects.

  • In the interpreter and C1, value objects on the stack are also encoded as regular heap objects.

  • In C2, value objects on the stack are typically scalarized when stored or passed with concrete value class types. Scalarization effectively encodes each field as a separate variable, with an additional variable encoding null; no heap allocation is needed. Methods with value-class-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are allocated on the heap when they need to be viewed as values of a supertype of the value class, or when stored in fields or arrays.
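
The two entry points can be pictured, very loosely, as two Java methods with different parameter lists. This is a conceptual sketch only: HotSpot performs the transformation internally on compiled code, and Slice, ScalarizationSketch, and both method names are hypothetical.

```java
// Hypothetical stand-in for a value class with three fields.
final class Slice {
    final String str;
    final int start;
    final int end;

    Slice(String str, int start, int end) {
        this.str = str;
        this.start = start;
        this.end = end;
    }
}

class ScalarizationSketch {
    // Pointer-based view: the argument is a reference to a heap object.
    static int length(Slice s) {
        return s.end - s.start;
    }

    // Scalarized view: one variable per field, plus one variable encoding null.
    static int lengthScalarized(boolean isNull, String str, int start, int end) {
        if (isNull) throw new NullPointerException();
        return end - start;
    }
}
```

Both methods compute the same result for the same field values; the scalarized form needs no heap object to do so.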

C2 relies on the Preload attribute to identify value class types at preparation time. If a value class is not named by Preload (for example, if the class was an identity class at compile time), method calls may end up using a heap object encoding instead. In the case of an overriding mismatch—a method and its super methods disagree about scalarization of a particular type—the overriding method may dynamically force callers to de-opt and use the pointer-based entry point.

To facilitate the special behavior of instructions like if_acmpeq, value objects in the heap are identified with a new flag in their object header.

Alternatives

JVMs have long performed escape analysis to identify objects that do not rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization.

Hand-coded optimizations via basic primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

The C language and its relatives support inline storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

Risks and Assumptions

The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.

Some changes could potentially affect the performance of identity objects. The if_acmpeq instruction, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.

There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.

Dependencies

In anticipation of this feature we already added warnings about potential incompatible changes to value class candidates in javac and HotSpot, via JEP 390.

JEP 401 will expand on value objects by allowing for the declaration of primitive types. These types support value class features like fields and methods, and have many of the same semantics. But they do not support null and don't guarantee atomic reads and writes; in exchange, they can be more universally and compactly inlined by JVMs.

JEP 402 will provide class declarations, as allowed by JEP 401, for the basic primitive types (int, boolean, etc.). These declarations will subsume the existing wrapper classes.

JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.

