
JEP draft: Primitive types in patterns, instanceof, and switch

source link: https://openjdk.org/jeps/8288476
Owner: Angelos Bimpoudis
Type: Feature
Scope: SE
Status: Draft
Component: specification / language
Discussion: amber dash dev at openjdk dot org
Effort: M
Duration: M
Reviewed by: Alex Buckley
Created: 2022/06/15 10:05
Updated: 2023/01/26 13:36
Issue: 8288476

Summary

Enhance pattern matching by allowing primitive types to appear anywhere in patterns. Extend instanceof to support primitive types, and extend switch to allow primitive constants as case labels. This is a preview language feature.

Goals

  • Provide easy-to-use constructs that eliminate the risk of losing information due to unsafe casts.

  • Enable uniform data exploration by allowing type patterns to match values of any type (primitive or reference).

  • Allow pattern matching to consistently produce values at sharper types, rather than asking developers to write custom conversion logic at pattern sites; prior restrictions on primitive types in type patterns are dropped.

  • Following the enhancements to switch in Java 5 (enum switch) and Java 7 (string switch), allow switch to process values of any primitive type (primitive switch).

Non-Goals

  • It is not a goal to create any new type conversions or any new conversion contexts.

Motivation

Java developers deal with primitive types all the time, and frequently need to convert from one primitive type to another. Java will freely convert an int to a long, or box an int to an Integer, as needed without requiring a cast. In some cases, such as compile-time constants, it will even convert in the reverse direction when doing so is safe: a small constant of type int can be assigned to a byte variable, as in byte b = 42. In general, however, Java will not narrow automatically, because it does not know whether a value of one primitive type can be represented by another primitive type. Developers must instead convert manually by inserting a cast, such as byte b = (byte) i; for an int variable i. Java supports a rich matrix of these so-called casting conversions between primitive types (and their boxes).

Unfortunately, many casting conversions are unsafe: they can lose information about magnitude and sign. For example, if the int variable i holds 1000 then the value of (byte) i is -24. Developers must safeguard their casts by checking, for example, that a 32-bit int can be represented by an 8-bit byte:

int i = ...;
if (i >= -128 && i <= 127) {
    byte b = (byte) i;
    ... b ...
}

Even where casts are not required because Java converts automatically, loss of information about precision and range can occur. For example, passing an int variable i to a method that takes a float can be problematic: both the int value 2^24 (16777216) and the int value 2^24 + 1 (16777217) will yield the same float value (1.6777216E7). Unfortunately, safeguarding against such a subtle loss of precision can be complicated.
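The silent precision loss described above can be reproduced in plain Java with a minimal sketch like the following. The class FloatPrecisionDemo and its sameFloat helper are hypothetical names used only for illustration; both ints are widened to float without any cast.

```java
// Demonstrates silent precision loss in the automatic int -> float widening.
class FloatPrecisionDemo {
    // True if two ints collapse to the same float after widening.
    static boolean sameFloat(int x, int y) {
        float fx = x;   // widening primitive conversion: no cast required
        float fy = y;
        return fx == fy;
    }

    public static void main(String[] args) {
        int a = 16_777_216;      // 2^24
        int b = 16_777_217;      // 2^24 + 1
        float fa = a;
        float fb = b;            // rounded to the nearest float, which is 2^24
        System.out.println(fa);               // 1.6777216E7
        System.out.println(fb);               // 1.6777216E7
        System.out.println(sameFloat(a, b));  // true
    }
}
```

By contrast, 2^24 - 1 and 2^24 remain distinct floats, because a float's 24-bit significand holds every int up to 2^24 exactly.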

Static analysis tools can remind developers where a check is needed before a cast (to avoid loss of magnitude or sign), but generally will not warn about automatic conversions (even if they lose precision or range). Even with tool support, it is best if developers can reason about the safety of conversions directly from the code. The instanceof type comparison operator makes this possible for conversions involving reference types: if instanceof succeeds, then casting to the reference type will definitely succeed, and the resulting object will definitely be non-null:

Object o = ...
if (o instanceof String) {
    String s = (String) o;
    ... s.isEmpty() ...  // will execute without error
}

Despite having been restricted to reference types, instanceof is in principle about asking whether an upcoming cast of a value to a type would succeed without loss of information or error. When instanceof returns true, the program has gained information: a value can be safely cast and the program knows a sharper type for that value than previously known. It would be ideal to remove the restrictions from instanceof and extend those safeguarding and sharpening semantics to conversions involving primitive types as well. instanceof for a primitive type would succeed if a conversion exists and can be performed without loss of magnitude, sign, precision, or range, thus defending against lossy casts between primitive types. For example:

  • to safeguard a casting conversion from int to byte, instanceof would return true for i instanceof byte only if there would be no loss of range or sign converting the value of i to byte by casting or by calling byteValue(). If i is 1000 then instanceof would return false.

  • to safeguard a casting conversion from int to float, instanceof would return true for i instanceof float only if there would be no loss of precision converting the value of i to float by casting or by calling floatValue(). If i is 2^24+1 then instanceof would return false.

  • to safeguard a casting conversion from float to int, instanceof would return true for f instanceof int only if there would be no loss of range, sign, or precision converting the value of f to int by casting or by calling intValue(). If f is 1.0e10 or -0.0 or 0.5 then instanceof would return false.
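Until instanceof accepts primitive types, the run-time tests described in the three bullets above have to be written by hand. The sketch below emulates them in today's Java. The helper names are hypothetical, and comparing through double (which represents every int and every float exactly) is a choice made for this sketch, not necessarily how a compiler would implement the proposed instanceof.

```java
// Emulates, in today's Java, the run-time test that the proposed
// `x instanceof T` would perform for three primitive pairs.
class ExactnessChecks {
    // int -> byte is exact when narrowing does not change the value.
    static boolean intToByteIsExact(int i) {
        return (byte) i == i;                 // the byte is promoted back to int
    }

    // int -> float is exact when the rounded float equals the original.
    // Both sides are compared as double, which holds every int and float exactly.
    static boolean intToFloatIsExact(int i) {
        return (double) (float) i == (double) i;
    }

    // float -> int is exact when the value is integral, in int range, and not -0.0f.
    // Double.compare treats -0.0 and 0.0 as different, and NaN or out-of-range
    // values fail to round-trip through the saturating (int) cast.
    static boolean floatToIntIsExact(float f) {
        return Double.compare((double) (int) f, (double) f) == 0;
    }

    public static void main(String[] args) {
        System.out.println(intToByteIsExact(1000));          // false: (byte) 1000 == -24
        System.out.println(intToFloatIsExact(16_777_217));   // false: rounds to 2^24
        System.out.println(floatToIntIsExact(1.0e10f));      // false: out of int range
        System.out.println(floatToIntIsExact(-0.0f));        // false: sign of zero lost
        System.out.println(floatToIntIsExact(0.5f));         // false: fraction lost
    }
}
```

The tempting shortcut (int) (float) i == i would be wrong: the float to int cast saturates, so Integer.MAX_VALUE would appear to convert to float exactly.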

In effect, instanceof would be meaningful for all types -- reference and primitive -- and thus it could uniformly defend against loss of information or error for any of Java's supported conversions.

Turning to pattern matching, a type pattern in instanceof allows the safety check and the subsequent cast to be fused into a single operation with superior readability. Initially (Java 16), a type pattern in instanceof was restricted to only mention a reference type: o instanceof Person p. Once instanceof supports primitive types, it would be natural to allow a type pattern at a top-level position in instanceof to mention a primitive type as well, following the same meaning of the instanceof type comparison operator. Then, instead of checking i instanceof byte and casting i to byte, a type pattern could be used:

int i = ...;
if (i instanceof byte b) { ... b ... }

Allowing primitive types in type patterns would also improve pattern matching in switch. Here is a simple example, where the first case would apply if the value of i matches the type pattern byte b, that is, can be safely converted to byte:

int i = ...;
switch (i) {
    case byte  b -> ... b ...;
    case float f -> ... f ...;
    default      -> -1;
}

More sophisticated examples involve the use of type patterns in record patterns, which work with record classes to streamline data processing in Java. Part of the productivity boost from record patterns arises from the conversions that are performed automatically when a component of a record class has a reference type. For example, given the record class Pet below, the instantiation new Pet(new Dog()) automatically applies a widening conversion for the argument (Dog to Animal), while the record pattern Pet(Dog d) automatically attempts a narrowing conversion when extracting the component (Animal to Dog). If the narrowing conversion would be unsafe -- that is, the component's type at run-time is Animal but not Dog -- then the record pattern would not match, and the switch would move on to Pet(Animal a):

abstract class Animal {}
class Dog extends Animal {}
record Pet(Animal animal) {} 
...
Pet p = new Pet(new Dog()); // automatic widening conversion from Dog to Animal
switch (p) {
    case Pet(Dog d)    -> ... d ...
    case Pet(Animal a) -> ... a ...
    default            -> ...
}

Prior to this JEP, a type pattern in a record pattern component could declare a primitive type, but such type patterns were highly restrictive: a long l type pattern could only be used against a match target of static type long. Because of this restriction, the query from the previous example cannot be expressed with primitive type patterns. For example, given the record class ID below, the record pattern ID(int i) results in a compile-time error, breaking the symmetry with the previous example: widening an int to a long when constructing new ID(42) cannot be reversed at run time, so a user cannot retrieve the value at the more specific type int. Dropping this restriction, and letting primitive types in type patterns follow the meaning of the instanceof operator, will let developers benefit from automatic conversions when a component of a record class has a primitive type:

record ID(long num) {}
...
ID x = new ID(42); // automatic widening conversion from int to long
switch (x) {
  case ID(int i)  -> ... i ...
  case ID(long l) -> ... l ...
  default         -> ...
}
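To see what the proposed ID(int i) case buys, here is a hypothetical emulation of that switch in today's Java: it tries the sharper type int first and falls back to long. A plain final class stands in for record ID so that the sketch also compiles on releases without records; classify is an illustrative name, not part of the proposal.

```java
// Emulates a switch that tries ID(int i) before the long case,
// without records or preview features.
class IdMatchEmulation {
    // Stand-in for `record ID(long num)`.
    static final class ID {
        final long num;
        ID(long num) { this.num = num; }
    }

    // Tries the sharper type int first, then falls back to long.
    static String classify(ID x) {
        long num = x.num;
        if ((int) num == num) {      // exact narrowing: value survives long -> int
            int i = (int) num;
            return "int " + i;
        }
        long l = num;
        return "long " + l;
    }

    public static void main(String[] args) {
        System.out.println(classify(new ID(42)));              // int 42
        System.out.println(classify(new ID(10_000_000_000L))); // long 10000000000
    }
}
```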

Once a type pattern used in switch can include a primitive type, it would make sense for switch itself to accept an expression of any primitive type, not just the traditional int, char, short, and byte. Consequently, it would make sense for case labels to give constant expressions of any primitive type, including float, double, long, and boolean. For example:

float f = ...;
switch (f) {
    case 1.0f    -> ...
    case 1.5f    -> ...
    case float g -> ... g ...
}
long x = ...;
switch (x) {
    case 10_000_000_000L -> ...
    case 20_000_000_000L -> ...
    default -> ...
}

Boolean switch would be a useful alternative to the conditional operator (?:) when making inline decisions. Unlike the conditional operator, a boolean switch expression can contain both expressions and statements in its true and false arms. For example, in the method call below, the second argument uses a boolean switch to encapsulate some business logic:

startProcessing(OrderStatus.NEW, switch (user.isLoggedIn()) {
    case true  -> user.id();
    case false -> { log("Unrecognized user"); yield -1; }
});

It would be ideal if the primitive-supporting switch could automatically perform reasonable conversions between the type of its expression and the types of its case labels. For example, if the expression is of type float, then the case labels could be of type float, double, int, or long. However, the loss of precision and range that can occur with other automatic conversions is best avoided. In the following example, switch accepts a float but its case labels are integral values that (as described earlier) convert to the same float value; in other words, the cases are indistinguishable at run time, and the code would be rejected.

float f = ...;
switch (f) {
    case 16_777_216 -> ...
    case 16_777_217 -> ...
    default -> ...
}

In summary, primitive types in instanceof, and in type patterns for instanceof and switch, would increase program reliability and enable more uniform data exploration with pattern matching. This JEP removes the following restrictions:

  • instanceof was restricted to reference types only,
  • primitive type patterns were only allowed in a nested context and not at top-level,
  • primitive type patterns could only be used on a match target of exactly the same type, and
  • switch and constant case labels were restricted to support only a subset of primitive types.

Description

Primitive Types in instanceof

As of Java 16, the instanceof operator is either a type comparison operator or a pattern match operator, depending on its syntactic form.

When instanceof is a type comparison operator, support for primitive types is realized by removing the restrictions that (1) the type of the left-hand operand must be a reference type, and (2) the right-hand operand must name a reference type. The form of a type comparison operator becomes:

InstanceofExpression:
    RelationalExpression instanceof Type
    ...

Prior to this JEP, the result of a type comparison operator was false if the value was the null reference, true if the value could be cast to the right-hand operand without raising a ClassCastException, and false otherwise. This JEP generalizes e instanceof T to ask whether the value of an expression e of static type S can be converted to the primitive or reference type T in a casting context (JLS 5.5) without error or loss of information. This makes instanceof the precondition test for safe casting in general.

Under this generalization, the instanceof type comparison operator is defined for every pair of types that can be converted in a casting context. Prior to this JEP, a compile-time error occurred for pairs of reference types that cannot be converted by casting. Under this JEP, type checking of instanceof continues to follow the rules of casting conversion, and a compile-time error occurs for any pair of types, reference or primitive, that casting does not support. The examples given earlier rely on conversions allowed in a casting context, so they can be rewritten to use instanceof directly:

int i = 1000;
if (i instanceof byte) {     // false
  byte b = (byte) i;
  ... b ...
}

byte b = 42;
if (b instanceof int) {      // true
  int i = (int) b;
  ... i ...
}

int i = 16_777_216;          // 2^24
if (i instanceof float) {    // true
  float f = (float) i;
  ... f ...
}

int i = 16_777_217;          // 2^24+1
if (i instanceof float) {    // false
  float f = (float) i;
  ... f ...
}

This JEP does not add any conversions to the casting context, nor does it create any new conversion contexts. Whether instanceof is applicable to a given expression and type is determined entirely by whether the casting context already allows a conversion between them. The conversions permitted in a casting context are as follows:

  • identity conversions (JLS 5.1.1)
  • widening primitive conversions (JLS 5.1.2)
  • narrowing primitive conversions (JLS 5.1.3)
  • widening and narrowing primitive conversions (JLS 5.1.4)
  • boxing conversions (JLS 5.1.7)
  • unboxing conversions (JLS 5.1.8)

and specified combinations of these:

  • an identity conversion (JLS 5.1.1)
  • a widening reference conversion (JLS 5.1.5)
  • a widening reference conversion followed by an unboxing conversion
  • a widening reference conversion followed by an unboxing conversion, then followed by a widening primitive conversion
  • a narrowing reference conversion (JLS 5.1.6)
  • a narrowing reference conversion followed by an unboxing conversion
  • an unboxing conversion (JLS 5.1.8)
  • an unboxing conversion followed by a widening primitive conversion
  • a boxing conversion (JLS 5.1.7) followed by a widening reference conversion

The following tables present all the pairs where instanceof is defined: ✓ marks a pair for which instanceof is defined, and - marks a pair that results in a compile-time error. This JEP does not propose any changes to those tables.

  • When the left-hand operand is an expression of a primitive type and the right-hand operand is a primitive type:

To →       byte    short   char    int     long    float   double  boolean
From ↓
byte       ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
short      ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
char       ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
int        ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
long       ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
float      ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
double     ✓       ✓       ✓       ✓       ✓       ✓       ✓       -
boolean    -       -       -       -       -       -       -       ✓

  • When the left-hand operand is an expression of a reference type and the right-hand operand is a primitive type:

To →       byte    short   char    int     long    float   double  boolean
From ↓
Byte       ✓       ✓       -       ✓       ✓       ✓       ✓       -
Short      -       ✓       -       ✓       ✓       ✓       ✓       -
Character  -       -       ✓       ✓       ✓       ✓       ✓       -
Integer    -       -       -       ✓       ✓       ✓       ✓       -
Long       -       -       -       -       ✓       ✓       ✓       -
Float      -       -       -       -       -       ✓       ✓       -
Double     -       -       -       -       -       -       ✓       -
Boolean    -       -       -       -       -       -       -       ✓
Object     ✓       ✓       ✓       ✓       ✓       ✓       ✓       ✓

  • When the right-hand operand, a type T, is a reference type, instanceof is similarly defined as in Table 5.5-B (JLS 5.5):

To →       Byte      Short     Character Integer   Long      Float     Double    Boolean   Object
From ↓
byte       ✓         -         -         -         -         -         -         -         ✓
short      -         ✓         -         -         -         -         -         -         ✓
...        ...       ...       ...       ...       ...       ...       ...       ...       ...
Byte       ✓         -         -         -         -         -         -         -         ✓
Short      -         ✓         -         -         -         -         -         -         ✓
...        ...       ...       ...       ...       ...       ...       ...       ...       ...
Object     ✓         ✓         ✓         ✓         ✓         ✓         ✓         ✓         ✓

Consider the following examples. All of the following are allowed because the left-hand operand of instanceof, an expression e, can be converted to the specified type in a casting context:

int i = ...
i instanceof byte
i instanceof float

boolean b = ...
b instanceof Boolean

Short s = ...
s instanceof int
s instanceof long

long l = ...
l instanceof float
l instanceof double

Long ll = ...
ll instanceof float
ll instanceof double

However, all of the following examples raise a compile-time error, since they do not correspond to a pre-existing casting conversion:

boolean b = ...
b instanceof char    // error

Byte bb = ...
bb instanceof char   // error

Integer ii = ...
ii instanceof byte   // error
ii instanceof short  // error

Long ll = ...
ll instanceof int    // error
ll instanceof Float  // error
ll instanceof Double // error

If e has a reference type and its value is null, instanceof continues to evaluate to false.

Exactness of Conversions

A conversion is exact if no loss of information occurs. Whether a conversion is exact depends on the pair of types involved and potentially on the input value:

  • For some pairs, the conversion from the first type to the second type is guaranteed not to lose information for any value, and requires no action at run time. The conversion is said to be unconditionally exact. Examples include int to int and int to long.

  • For other pairs, a run-time test is needed to check whether the value can be converted from the first type to the second type without loss of information. Examples include long to int and int to float; the run-time test detects loss of information by relying on the notion of "representation equivalence" in java.lang.Double.

Adopting the notation of JLS 5.5, the following table marks the unconditionally exact primitive conversions with the symbol ε. For completeness, the other symbols are: - (no conversion allowed), ≈ (identity conversion), ω (widening primitive conversion), η (narrowing primitive conversion), and ωη (widening and narrowing primitive conversion):

To →       byte    short   char    int     long    float   double  boolean
From ↓
byte       ≈       ε       ωη      ε       ε       ε       ε       -
short      η       ≈       η       ε       ε       ε       ε       -
char       η       η       ≈       ε       ε       ε       ε       -
int        η       η       η       ≈       ε       ω       ε       -
long       η       η       η       η       ≈       ω       ω       -
float      η       η       η       η       η       ≈       ε       -
double     η       η       η       η       η       η       ≈       -
boolean    -       -       -       -       -       -       -       ≈
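Several entries of this table can be spot-checked by brute force. A float has a 24-bit significand, so every 16-bit short or char value fits exactly, while 32-bit ints do not. The sketch below (plain Java, hypothetical names) confirms the ε entries for short → float and char → float over every value, and finds the first non-negative int that int → float cannot represent, which is why that entry is only ω. It checks the table; it does not show how a compiler derives it.

```java
// Spot-checks entries of the exactness table by brute force in today's Java.
class UnconditionalExactnessCheck {
    // short -> float is marked ε: every short survives the conversion.
    static boolean shortToFloatAlwaysExact() {
        for (int v = Short.MIN_VALUE; v <= Short.MAX_VALUE; v++) {
            float f = (short) v;             // widening primitive conversion
            if ((double) f != v) return false;
        }
        return true;
    }

    // char -> float is marked ε as well.
    static boolean charToFloatAlwaysExact() {
        for (int v = Character.MIN_VALUE; v <= Character.MAX_VALUE; v++) {
            float f = (char) v;
            if ((double) f != v) return false;
        }
        return true;
    }

    // int -> float is only ω: find the smallest non-negative int it cannot represent.
    static int firstIntNotExactAsFloat() {
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            if ((double) (float) i != (double) i) return i;
        }
        return -1;                           // not reached
    }

    public static void main(String[] args) {
        System.out.println(shortToFloatAlwaysExact());  // true
        System.out.println(charToFloatAlwaysExact());   // true
        System.out.println(firstIntNotExactAsFloat());  // 16777217
    }
}
```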

Consider the following examples. Results marked (ε) come from unconditionally exact conversions and are true regardless of the value; the remaining results are obtained by a run-time check:

byte b = 42;
b instanceof int;         // true (ε)

int i = 1000;
i instanceof byte;        // false

int i = 42;
i instanceof byte;        // true

int i = 16_777_217;       // 2^24+1
i  instanceof float;      // false
i  instanceof double;     // true (ε)
i  instanceof Integer;    // true (ε)
i  instanceof Number;     // true (ε)

float f = 1000.0f;       
f instanceof byte;        // false    
f instanceof int;         // true
f instanceof double;      // true (ε)

double d = 1000.0d;
d instanceof byte;        // false
d instanceof int;         // true
d instanceof float;       // true

Integer ii = 1000;
ii instanceof int;        // true
ii instanceof float;      // true
ii instanceof double;     // true

Integer ii = 16_777_217;
ii instanceof float;      // false
ii instanceof double;     // true
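The non-ε results above depend on a value-dependent test. The sketch below emulates a few of them in today's Java. The method names are hypothetical, and Double.compare (which, like Double.equals, distinguishes 0.0 from -0.0 and treats NaN as equal to itself) is used as an assumed stand-in for the representation-equivalence check. The null case mirrors the rule that instanceof on a null reference evaluates to false.

```java
// Emulates the run-time checks behind some of the non-ε results above.
class RuntimeExactness {
    static boolean doubleToByteIsExact(double d) {
        return Double.compare((double) (byte) d, d) == 0;
    }

    static boolean doubleToIntIsExact(double d) {
        return Double.compare((double) (int) d, d) == 0;
    }

    static boolean doubleToFloatIsExact(double d) {
        return Double.compare((double) (float) d, d) == 0;
    }

    // Integer -> float: unboxing, then the int -> float test.
    // A null reference never matches, as with instanceof today.
    static boolean integerToFloatIsExact(Integer ii) {
        if (ii == null) return false;
        int i = ii;                                  // unboxing conversion
        return Double.compare((double) (float) i, (double) i) == 0;
    }

    public static void main(String[] args) {
        double d = 1000.0d;
        System.out.println(doubleToByteIsExact(d));                // false
        System.out.println(doubleToIntIsExact(d));                 // true
        System.out.println(doubleToFloatIsExact(d));               // true
        System.out.println(integerToFloatIsExact(1000));           // true
        System.out.println(integerToFloatIsExact(16_777_217));     // false
        System.out.println(integerToFloatIsExact(null));           // false
    }
}
```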

Primitive Type Patterns

Type patterns currently allow primitive types only when they appear in a nested pattern list of a record pattern, not at the top level. We lift that restriction, so that primitive types are allowed at the top level as well.

The semantics of primitive type patterns (and reference type patterns on targets of primitive type) are derived from casting conversions.

A type pattern T t is applicable to a target of type U if a U could be cast to T without an unchecked warning.

A type pattern T t is unconditional on a target of type U if all values of U can be exactly cast to T. This includes widening from one reference type to another, widening from one integral type to another, widening from one floating point type to another, widening from byte, short, or char to a floating point type, widening int to double, and boxing.

A set of patterns containing a type pattern T t is exhaustive on a target of type U if T t is unconditional on U or if there is an unboxing conversion from U to T.

A type pattern T t dominates a type pattern U u, or a record pattern U(...), if T t would be unconditional on a target of type U.

A type pattern T t that does not resolve to an any pattern matches a target u if u instanceof T.

With pattern labels involving record patterns, some patterns are allowed to be exhaustive even when they are not unconditional. For example, the following switch is considered exhaustive on Box<Box<String>>, even though it will not match new Box(null):

Box<Box<String>> bbs = ...
switch (bbs) {
    case Box(Box(String s)): ...
}

The pathological value new Box(null) is considered "remainder", and is handled by a synthetic default clause that throws MatchException. Unboxing follows the same philosophy, being allowed even when there are pathological values that cannot be converted (a null boxed value), because it would be burdensome to require a null check every time we want to unbox. Similarly, novel subtypes (those not known at compile time) of sealed types are considered "remainder" at runtime. This accommodation is made because requiring users to specify all possible combinations of pathological values would be tedious and impractical.

Analogously, a type pattern int x is considered exhaustive on Integer, so the following switch is considered exhaustive on Box<Integer> for the same reason:

Box<Integer> bi = ...
switch (bi) {
    case Box(int i): ...
}
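The accommodation for unboxing mirrors how plain unboxing already behaves, which the sketch below shows in today's Java (hypothetical class and method names, no preview features). An assignment from Integer to int compiles for any Integer, yet the null value cannot be converted; in the switch above, per the text, such remainder values are instead handled by a synthetic default clause that throws MatchException.

```java
// Shows the pathological value that unboxing already tolerates:
// `int x = boxed;` compiles for any Integer, but fails at run time on null.
class UnboxingRemainderDemo {
    static String unbox(Integer boxed) {
        try {
            int x = boxed;                       // unboxing conversion, accepted by javac
            return "int " + x;
        } catch (NullPointerException e) {
            return "remainder";                  // null has no int representation
        }
    }

    public static void main(String[] args) {
        System.out.println(unbox(42));           // int 42
        System.out.println(unbox(null));         // remainder
    }
}
```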

Constant Expressions in case labels

Turning to constant expressions in the case labels of a switch, constant expressions of the primitive types long, float, double, and boolean can be used in case labels as long as the type of the selector expression (which can be a primitive type or a boxed reference type) is the type of the constant expression or its box.

For example, the constant expression 0f can only be used when the selector expression's type is float or Float:

float f = ...
float result = switch (f) {
    case 0f -> 5f + 0f;
    case Float fi when fi == 1f -> 6f + fi;
    case Float fi -> 7f + fi;
};

Per IEEE 754, two finite floating-point values are the same if their sign, exponent, and significand components are the same. For that reason, representation equivalence defines how switch labels are selected in the presence of non-integral or boolean values. The same definition is used to report a duplicate-label error if a developer writes the following switch:

float f = ...
switch (f) {
    case 1.0f -> ...
    case 0.999999999f -> ...
    default -> ...
}

While 1.0f is exactly representable as a float, 0.999999999 is not: the literal 0.999999999f rounds up to 1.0f, so the two case labels are duplicates and the switch is rejected with a compile-time error.
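The duplicate-label rule can be approximated in today's Java by comparing bit patterns. The sketch below assumes that representation equivalence for float corresponds to comparing Float.floatToIntBits values (the comparison Float.equals is specified with); sameLabel is a hypothetical helper. It shows that the literal 0.999999999f is the same float as 1.0f, and that representation equivalence differs from == for signed zeros and NaN.

```java
// Checks whether two float constants would be the "same" label,
// assuming representation equivalence matches Float.floatToIntBits
// (which maps every NaN to one canonical bit pattern).
class FloatLabelEquivalence {
    static boolean sameLabel(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        System.out.println(sameLabel(1.0f, 0.999999999f));       // true: literal rounds to 1.0f
        System.out.println(sameLabel(0.0f, -0.0f));              // false, although 0.0f == -0.0f
        System.out.println(sameLabel(Float.NaN, 0.0f / 0.0f));   // true, although NaN != NaN
    }
}
```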

Since boolean (and its box) has only two distinct values, a switch that lists both the true and the false cases is considered exhaustive:

boolean b = ...
switch (b) {
  case true -> ...
  case false -> ...
  // Alternatively: case true, false -> ...
}

It is a compile-time error for that switch to include a default clause.

Risks and Assumptions

Outside pattern matching and instanceof, lossy assignment is endemic in Java source code. For example, if a method returns int, then its result can be assigned to a float variable without casting:

int getSalary() { ... }
float salary = getSalary();

The risk is that Java developers do not realize the possible loss of precision that can occur at this assignment, because it is silent.

We assume that developers of static analysis tools will recognize the new role of instanceof, and will avoid flagging code that uses converted data without a prior manual range check when that code is already safeguarded by the extended instanceof.

