

JEP draft: Value Objects (Preview)
source link: https://openjdk.org/jeps/8277163

Owner | Dan Smith |
Type | Feature |
Scope | SE |
Status | Submitted |
Component | specification |
Discussion | valhalla dash dev at openjdk dot java dot net |
Effort | XL |
Duration | XL |
Relates to | JEP 401: Primitive Classes (Preview) |
Reviewed by | Brian Goetz |
Created | 2021/11/16 00:14 |
Updated | 2022/10/04 23:20 |
Issue | 8277163 |
Summary
Enhance the Java object model with value objects, class instances that have
only final instance fields and lack object identity. This is a preview language
and VM feature.
Goals
This JEP provides for the declaration of identity-free value classes and specifies the behavior of their instances, called value objects, with respect to equality, synchronization, and other operations that traditionally depend upon identity.
At runtime, the HotSpot JVM will prefer inlining value objects where feasible, in particular for JIT-compiled method calls and local operations. An inlined value object is encoded directly with its field values, avoiding any overhead from object headers, indirections, or heap allocation.
Non-Goals
Value class types are reference types. The Valhalla project is also developing user-defined primitive types, but these will require additional changes to the Java object model and type system. See "Dependencies" for details.
Existing value-based classes in the standard libraries will not be affected by this JEP. Once the features of this JEP become final, those classes will be available for migration to value classes as a separate task.
Motivation
Java's objects and classes offer powerful abstractions for representing data, including fields, methods, constructors, access control, and nominal subtyping. Every object also comes with identity, enabling features such as field mutation and locking.
Many classes don't take advantage of all of these features. In particular, a significant subset of classes don't have any use for identity—their field values can be permanently set on instantiation, their instances don't need to act as synchronization locks, and their preferred notion of equality makes no distinction between separately-allocated instances with matching field values.
At runtime, support for identity can be expensive. It generally requires that an object's data be located at a particular memory location, packaged with metadata to support the full range of object functionality. Fields are accessed with memory loads, which are relatively slow operations. As objects are shared between program components, data structures and garbage collectors end up with tangled, non-local webs of objects created at different times. Sometimes, JVM implementations can optimize around these constraints, but the resulting performance improvements can be unpredictable.
An alternative is to encode program data with primitive types. Primitive values
don't have identity, and so can be copied freely and encoded as compact bit
sequences. But programs that represent their data with primitive types give up
all the other abstractions provided by objects and classes. (For example, if a
geographic location is encoded as two floats, there's no way to restrict the
valid range of values, keep matching pairs of floats together, prevent
re-interpreting the values with the wrong units, or compatibly switch to a
double-based encoding.)
Value classes provide programmers with a mechanism to opt out of object identity, and in return get many of the performance benefits of primitive types, without giving up the other features of Java classes.
Opting out of identity is an important step towards user-defined primitives, which would fully combine the performance profile of today's primitives with the abstractions of class declarations. JEP 401 will support such types.
However, many classes will be better served by declaring themselves value
classes, carrying on with familiar (and compatible) reference type semantics,
and still unlocking many of the same JVM optimizations. This includes many JDK
classes, like LocalDate, that are currently designated as "value-based" to
discourage users from relying on their instances' identities.
Description
The features described below are preview features, enabled with the
--enable-preview compile-time and runtime flags.
Overview
A value object is a class instance that does not have identity. That is, a
value object does not have any particular memory address or any other property
to distinguish it from other instances of the same class whose fields have the
same values.
Value objects cannot mutate their fields or be used for synchronization.
The == operator on value objects compares their fields.
A value class declaration introduces a class whose instances are value objects.
An identity object is a class instance or array that does have identity—the
traditional behavior of objects in Java.
An identity object can mutate its non-final fields and is associated with a
synchronization monitor.
The == operator on identity objects compares their identities.
An identity class declaration—the default for a concrete class—introduces a
class whose instances are identity objects.
Value class declarations
A concrete class can be declared a value class with the value contextual
keyword.
value class Substring implements CharSequence {
    private String str;
    private int start;
    private int end;

    public Substring(String str, int start, int end) {
        checkBounds(start, end, str.length());
        this.str = str;
        this.start = start;
        this.end = end;
    }

    public int length() {
        return end - start;
    }

    public char charAt(int i) {
        checkBounds(0, i, length());
        return str.charAt(start + i);
    }

    public Substring subSequence(int s, int e) {
        checkBounds(s, e, length());
        return new Substring(str, start + s, start + e);
    }

    public String toString() {
        return str.substring(start, end);
    }

    private static void checkBounds(int start, int end, int length) {
        if (start < 0 || end < start || length < end)
            throw new IndexOutOfBoundsException();
    }
}
A concrete value class declaration is subject to the following restrictions:
- The class is implicitly final, so cannot be extended.
- All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.
- The class does not extend an identity class or an identity interface (see below).
- No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.
- No instance methods are declared synchronized.
- (Possibly) The class does not declare a finalize() method.
- (Possibly) The constructor does not make use of this except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.
In most other ways, a value class declaration is just like an identity
class declaration. It implicitly extends Object if it has no explicit
superclass type. It can be an inner class. It can declare superinterfaces, type
parameters, member classes and interfaces, overloaded constructors, static
members, and the full range of access restrictions on its members.
A concrete class can be declared an identity class with the identity
contextual keyword. In the absence of the value and identity modifiers, a
concrete class (other than Object) is implicitly an identity class.
identity class Id1 {
    int counter = 0;
    void increment() { counter++; }
}
class Id2 { // implicitly 'identity'
    synchronized void m() {}
}
The value and identity modifiers are supported by record classes. Records
are often good candidates to be value classes, because their fields are already
required to be final.
value record Name(String first, String last) {
    public String full() {
        return "%s %s".formatted(first, last);
    }
}
identity record Node(String label, Node next) {
    public String list() {
        // Parenthesize the conditional: '+' binds tighter than '?:'
        return label + ((next == null) ? "" : ", " + next.list());
    }
}
Just like regular classes, identity is the default modifier for record
classes.
Working with value objects
Value objects are created and operated on just like normal objects:
Substring s1 = new Substring("abc", 0, 2);
Substring s2 = null;
if (s1.length() == 2)
    s2 = s1.subSequence(1, 2);
CharSequence cs = s2;
System.out.println(cs.toString()); // prints "b"
The == operator compares value objects of the same class in terms of their
field values, not object identity. Fields with basic primitive types are compared
by their bit patterns. Other field values (both identity and value objects) are
recursively compared with ==.
assert new Substring("abc", 1, 2) == s2;
assert new Substring("abcd", 1, 2) != s2;
assert s1.subSequence(0, 2) == s1;
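The comparison rule can be approximated in today's Java, which has no value classes. The following is only an emulation under stated assumptions, not the JVM's implementation: SubstringLike and valueEquals are hypothetical names, and an ordinary final class stands in for the value class. The int fields are compared by value, and the String field, being an identity object, is compared by reference.

```java
// Emulation only: a plain final class standing in for the value class Substring.
final class SubstringLike {
    final String str;
    final int start;
    final int end;

    SubstringLike(String str, int start, int end) {
        this.str = str;
        this.start = start;
        this.end = end;
    }

    // Hypothetical helper approximating '==' on two value objects of the same class.
    static boolean valueEquals(SubstringLike a, SubstringLike b) {
        if (a == null || b == null) {
            return a == b;          // null is only the same as null
        }
        return a.str == b.str       // identity-object field: reference comparison
            && a.start == b.start   // primitive fields: compared by value
            && a.end == b.end;
    }
}

class ValueEqualsDemo {
    public static void main(String[] args) {
        String text = "abc";
        SubstringLike s2 = new SubstringLike(text, 1, 2);
        System.out.println(SubstringLike.valueEquals(new SubstringLike(text, 1, 2), s2));   // true
        System.out.println(SubstringLike.valueEquals(new SubstringLike("abcd", 1, 2), s2)); // false
        System.out.println(new SubstringLike(text, 1, 2) == s2); // false: today's '==' compares identity
    }
}
```

The last line of the demo shows the behavioral difference: for an identity class, two separately created instances are never == to each other, whatever their fields hold.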
The equals, hashCode, and toString methods, if inherited from Object,
along with System.identityHashCode, behave consistently with this definition
of equality.
Substring s3 = s1.subSequence(0, 2);
assert s1.equals(s3);
assert s1.hashCode() == s3.hashCode();
assert System.identityHashCode(s1) == System.identityHashCode(s3);
The compiler disallows synchronization on any value class type. Attempting to synchronize on a value object at run time results in an exception.
Object obj = s1;
try { synchronized (obj) { } }
catch (IllegalMonitorStateException e) { /* expected exception */ }
Interfaces and Abstract Classes
By default, an interface may be implemented by both value classes and identity
classes. In a special case where the interface is only meant for one kind of
class or the other, the value or identity modifier can be used to declare
a value interface or an identity interface.
value interface JsonValue {
    String toJsonString();
}
identity interface Counter {
    int currentValue();
    void increment();
}
It is an error for a value class or interface to extend an identity
class or interface, or vice versa. This applies to both direct and indirect
superclasses and superinterfaces; e.g., an interface with no modifiers
may extend an identity interface, but still its implementing classes must
not be value classes.
Similarly, it is an error for any class or interface to implement,
either directly or indirectly, both a value superclass or superinterface and
an identity superclass or superinterface.
(To be a functional interface, compatible with lambda expressions, an
interface must not be or extend a value interface nor an identity interface.
This allows for flexibility in the implementation of lambda expressions.)
An abstract class can similarly be extended by both value classes and identity
classes by default, or can use the identity or value modifier to restrict
its subclasses. In addition, an abstract class that makes use of any of the
following features is implicitly an identity class:
- It declares an instance field
- It is an inner class with an enclosing instance
- It declares a synchronized method
- It declares a non-empty constructor (with a signature or body that differs from the default constructor)
- It declares an instance initializer
With the exception of field declarations, any of these conditions should cause a
compiler warning, encouraging the author to add an explicit identity modifier.
(The initialization restrictions are necessary because, as noted above, value objects are created without an opportunity to execute any superclass initialization code.)
The class Object is special. Despite being a concrete class, it is not an
identity class and supports both identity and value subclasses. However,
calls to new Object() continue to create direct identity object instances of
the class (suitable, e.g., as synchronization locks).
Migration of existing classes
If an existing concrete class does not expose its constructors to separately-compiled code, and meets the other requirements of value class declarations, it may be declared as a value class without breaking binary compatibility.
There are some behavioral changes that users of the class may notice:
- The == operator may treat two instances as the same, where previously they were considered different
- Attempts to synchronize on an instance will fail, either at compile time or run time
- The results of toString, equals, and hashCode, if they haven't been overridden, may be different
- Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points)
- Performance will generally improve, but may have different characteristics that are surprising
Some classes in the standard library are designated value-based, and can be expected to become value classes in a future release.
Developers are encouraged to identify and migrate value class candidates in their own code, where appropriate.
class file representation & interpretation
The identity and value modifiers are encoded in a class file using the
ACC_IDENTITY (0x0020) and ACC_VALUE (0x0040) flags. In older-versioned
class files, ACC_IDENTITY is considered to be set in classes and unset in
interfaces.
(Historically, 0x0020 represented ACC_SUPER, and all classes, but not
interfaces, were encouraged to set it. The flag is no longer meaningful, but
coincidentally will tend to match this implicit behavior.)
Format checking ensures that identity and value are not both set, and that
every class (not interface) has at least one of identity, value, or
abstract set.
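The two flag rules just stated, together with the treatment of older class files, can be sketched in a few lines. This is an illustration rather than HotSpot's format checker: the class ValueFlagCheck and both helper methods are hypothetical, the values of ACC_IDENTITY and ACC_VALUE come from the text above, and ACC_INTERFACE and ACC_ABSTRACT are the existing JVMS flag values.

```java
// Sketch of the access-flag rules described above; not the actual JVM checker.
final class ValueFlagCheck {
    static final int ACC_IDENTITY  = 0x0020; // from the text (historically ACC_SUPER)
    static final int ACC_VALUE     = 0x0040; // from the text
    static final int ACC_INTERFACE = 0x0200; // existing JVMS flag
    static final int ACC_ABSTRACT  = 0x0400; // existing JVMS flag

    // Hypothetical helper: true if the flags satisfy both format-checking rules.
    static boolean flagsAreWellFormed(int accessFlags) {
        boolean identity = (accessFlags & ACC_IDENTITY) != 0;
        boolean value = (accessFlags & ACC_VALUE) != 0;
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        boolean isAbstract = (accessFlags & ACC_ABSTRACT) != 0;
        if (identity && value) {
            return false; // identity and value must not both be set
        }
        if (!isInterface && !(identity || value || isAbstract)) {
            return false; // a class needs at least one of identity, value, abstract
        }
        return true;
    }

    // Hypothetical helper: in older class files, ACC_IDENTITY is considered
    // set for classes and unset for interfaces, whatever bit 0x0020 holds.
    static int normalizeOldFlags(int accessFlags) {
        boolean isInterface = (accessFlags & ACC_INTERFACE) != 0;
        return isInterface ? (accessFlags & ~ACC_IDENTITY)
                           : (accessFlags | ACC_IDENTITY);
    }
}
```

For example, flagsAreWellFormed(ACC_IDENTITY | ACC_VALUE) is false, while a plain abstract class with neither modifier (ACC_ABSTRACT alone) passes.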
Format checking fails if a value class is not final, has a non-final
instance field, has a synchronized instance method, or declares an <init>
method. Similarly, format checking fails if a non-identity abstract class
has any instance field or a synchronized instance method.
(An abstract class that is neither identity nor value may declare an
<init> method. The code will be executed as usual for identity object
instances, but not for value object instances.)
At class load time, superclasses and superinterfaces are checked for conflicting
identity or value modifiers; if a conflict is detected, the class fails to
load.
A value class's type is represented using the usual L descriptor
(LSubstring;). To facilitate inlining optimizations, a Preload attribute can
be provided by any class, communicating to the JVM that a set of referenced
CONSTANT_Class entries should be eagerly loaded to locate potentially-useful
layout information.
Preload_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_classes;
    u2 classes[number_of_classes];
}
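To make the layout concrete, here is a minimal sketch of decoding one such attribute from raw bytes. PreloadAttributeReader and readClassIndices are hypothetical names, and the length check assumes that attribute_length, as for other class file attributes, counts only the bytes that follow the length field. Class files are big-endian, which is also ByteBuffer's default byte order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of decoding the Preload_attribute structure shown above.
final class PreloadAttributeReader {
    // Returns the constant-pool indices listed in classes[].
    static int[] readClassIndices(ByteBuffer in) {
        int nameIndex = in.getShort() & 0xFFFF;   // attribute_name_index (not resolved here)
        long length = in.getInt() & 0xFFFFFFFFL;  // attribute_length, read as unsigned u4
        int count = in.getShort() & 0xFFFF;       // number_of_classes
        // Assumption: the length covers number_of_classes plus the u2 entries.
        if (length != 2L + 2L * count) {
            throw new IllegalArgumentException("inconsistent attribute_length: " + length);
        }
        int[] classes = new int[count];
        for (int i = 0; i < count; i++) {
            classes[i] = in.getShort() & 0xFFFF;  // index of a CONSTANT_Class entry
        }
        return classes;
    }

    public static void main(String[] args) {
        byte[] bytes = {
            0x00, 0x07,             // attribute_name_index = 7
            0x00, 0x00, 0x00, 0x06, // attribute_length = 6
            0x00, 0x02,             // number_of_classes = 2
            0x00, 0x0A,             // classes[0] = 10
            0x00, 0x0C              // classes[1] = 12
        };
        System.out.println(Arrays.toString(readClassIndices(ByteBuffer.wrap(bytes)))); // [10, 12]
    }
}
```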
Two new opcodes facilitate instance creation:
- aconst_init, with a CONSTANT_Class operand, produces an initial instance of the named value class, with all fields set to their default values. This operation always has private access: a linkage error occurs if anyone other than the value class or its nestmates attempts an aconst_init operation.
- withfield, with a CONSTANT_Fieldref operand, produces a new value object by using an existing object as a template but replacing the value of one of its fields. This operation also has private access.
It is a linkage error to use the opcode new with a value class.
A new kind of special method, a value class instance creation method, can be
declared in a concrete value class to produce class instances. These methods are
named <vnew> and are static. Their return type must match the type of the
declaring class. They are invoked with invokestatic.
The if_acmpeq and if_acmpne operations implement the == test for value
objects, as described above. The monitorenter instruction throws an exception
if applied to a value object.
Java language compilation
Each class file generated by javac includes a Preload attribute naming any
concrete value class that appears in one of the class file's declared field or
method descriptors.
Constructors of value classes compile to value class instance creation methods,
not instance initialization methods. In the constructor body, the compiler
treats this as a mutable local variable, initialized by aconst_init,
modified by withfield, and ultimately returned as the method result.
API & tool support
A new preview API method, Object.isValueObject, indicates whether an object
is a value object or an identity object. It always returns false for arrays
and direct instances of the class Object.
java.lang.reflect.Modifier adds support for the identity and value flags;
these are also exposed via new isIdentity and isValue methods in
java.lang.Class.
The method Class.getDeclaredConstructors, and related methods, search for
value class instance creation methods rather than instance initialization
methods when invoked on a value class.
java.lang.ref recognizes value objects and treats them specially (details
TBD).
java.lang.invoke provides a mechanism to execute the aconst_init and
withfield instructions reflectively. The LambdaMetafactory class rejects
identity and value superinterfaces.
javax.lang.model supports the identity and value modifiers.
The javadoc tool surfaces the identity and value modifiers.
Performance model
Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.
Implementations are free to use different encodings in different contexts, such
as stack vs. heap, as long as the values of the objects' fields are preserved.
However, these encodings must account for the possibility of a null value, and
must ensure that fields and arrays storing value objects are read and written
atomically.
In practice, this means that local variables, method parameters, and expression results can often use inline encodings, while fields and array components might not be inlined.
Previously, JVMs have used similar optimization techniques to inline identity objects when the JVM is able to prove that an object's identity is never used. Developers can expect more predictable and widespread optimizations for value objects.
HotSpot implementation
This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.
Value objects in HotSpot are encoded as follows:
- In fields and arrays, value objects are encoded as regular heap objects.
- In the interpreter and C1, value objects on the stack are also encoded as regular heap objects.
- In C2, value objects on the stack are typically scalarized when stored or passed with concrete value class types. Scalarization effectively encodes each field as a separate variable, with an additional variable encoding null; no heap allocation is needed. Methods with value-class-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are allocated on the heap when they need to be viewed as values of a supertype of the value class, or when stored in fields or arrays.
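As a rough source-level picture of the two entry points, consider the sketch below. It is purely illustrative: C2 performs this transformation internally on compiled code, and the names ScalarizationSketch, Range, length, and lengthScalarized are hypothetical. An ordinary final class stands in for a value class.

```java
// Source-level analogy for pointer-based vs. scalarized calling conventions.
final class ScalarizationSketch {
    // Ordinary final class standing in for a value class with two fields.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Pointer-based entry point: the argument is a reference to a heap object.
    static int length(Range r) {
        if (r == null) {
            throw new NullPointerException();
        }
        return r.end - r.start;
    }

    // Scalarized entry point: one variable per field, plus one encoding null.
    // No Range object needs to exist for this call.
    static int lengthScalarized(boolean isNull, int start, int end) {
        if (isNull) {
            throw new NullPointerException();
        }
        return end - start;
    }
}
```

Both methods compute the same result for the same logical argument; the second simply never needs the fields to be packaged behind a pointer.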
C2 relies on the Preload attribute to identify value class types at
preparation time. If a value class is not named by Preload (for example, if
the class was an identity class at compile time), method calls may end up using
a heap object encoding instead. In the case of an overriding mismatch (a method
and its super methods disagree about scalarization of a particular type), the
overriding method may dynamically force callers to de-opt and use the
pointer-based entry point.
To facilitate the special behavior of instructions like if_acmpeq, value
objects in the heap are identified with a new flag in their object header.
Alternatives
JVMs have long performed escape analysis to identify objects that do not rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization.
Hand-coded optimizations via basic primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.
The C language and its relatives support inline storage for structs and
similar class-like abstractions. For example, the C# language has
value types.
Unlike value objects, instances of these abstractions have identity, meaning
they support operations such as field mutation. As a result, the semantics of
copying on assignment, invocation, etc., must be carefully specified, leading to
a more complex user model and less flexibility for runtime implementations. We
prefer an approach that leaves these low-level details to the discretion of JVM
implementations.
Risks and Assumptions
The feature makes significant changes to the Java object model. Developers may
be surprised by, or encounter bugs due to, changes in the behavior of operations
such as == and synchronized. It will be important to validate that such
disruptions are rare and tractable.
Some changes could potentially affect the performance of identity objects. The
if_acmpeq instruction, for example, typically only costs one instruction
cycle, but will now need an additional check to detect value objects. The
identity class case should be optimized as the fast path, and we will need to
minimize any performance regressions.
There is a security risk that == and hashCode can indirectly expose private
field values. Further, two large trees of value objects can take unbounded time
to compute ==, potentially a DoS attack risk. Developers need to understand
these risks.
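The cost concern can be made tangible with an emulation in today's Java. The sketch below is an assumption, not a measurement of any JVM: DeepCompareCost, Tree, valueEquals, full, and countFor are hypothetical names, an ordinary final class stands in for a value class, and a counter records how many nodes a recursive field-wise comparison has to visit when two separately built, equal trees are compared.

```java
// Emulation: counts the node comparisons needed by a recursive field-wise '=='.
final class DeepCompareCost {
    // Ordinary final class standing in for a value class with two value-typed fields.
    static final class Tree {
        final Tree left;
        final Tree right;
        final int label;

        Tree(Tree left, Tree right, int label) {
            this.left = left;
            this.right = right;
            this.label = label;
        }
    }

    static long comparisons;

    // Hypothetical stand-in for '==' on value objects: compare fields recursively.
    static boolean valueEquals(Tree a, Tree b) {
        if (a == null || b == null) {
            return a == b;
        }
        comparisons++;
        return a.label == b.label
            && valueEquals(a.left, b.left)
            && valueEquals(a.right, b.right);
    }

    // Builds a full binary tree with 2^depth - 1 nodes.
    static Tree full(int depth) {
        return depth == 0 ? null : new Tree(full(depth - 1), full(depth - 1), depth);
    }

    // Compares two separately built, equal trees and reports the work done.
    static long countFor(int depth) {
        comparisons = 0;
        if (!valueEquals(full(depth), full(depth))) {
            throw new AssertionError("trees were built identically");
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countFor(4));  // 15
        System.out.println(countFor(16)); // 65535
    }
}
```

The work grows with the total number of nodes, so a single == on equal structures costs time proportional to their size instead of the constant time of a pointer comparison.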
Dependencies
In anticipation of this feature we already added warnings about potential
incompatible changes to value class candidates in javac and HotSpot, via
JEP 390.
JEP 401 will expand on value objects by allowing for the declaration
of primitive types. These types support value class features like fields and
methods, and have many of the same semantics. But they do not support null and
don't guarantee atomic reads and writes; in exchange, they can be more
universally and compactly inlined by JVMs.
JEP 402 will provide class declarations, as allowed by JEP 401, for
the basic primitive types (int, boolean, etc.). These declarations will
subsume the existing wrapper classes.
JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.