17

[RFC] Match expression by iluuu1994 · Pull Request #5371 · php/php-src · GitHub

 4 years ago
source link: https://github.com/php/php-src/pull/5371
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Join GitHub today

GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.

Sign up

[RFC] Match expression #5371

Conversation

Member

carusogabriel left a comment

If I want to write return match($x), with no space between the constructor and the list of arguments, does the current grammar proposal support it?

Contributor

Author

iluuu1994 commented on Apr 12

@carusogabriel Yep. Normal whitespace rules apply.

Contributor

TysonAndre commented on Apr 13

EDIT: Also, I guess some optimizations for ZEND_SWITCH_STRING and ZEND_SWITCH_LONG would be incorrect for match, e.g. potential of assuming that loose equality means other branches are dead code.

Because there are opcache bugs, that'll need to be addressed, almost definitely by introducing a new opcode such as ZEND_MATCH, possibly along with ZEND_MATCH_STRING, ZEND_MATCH_LONG, as you mentioned in the TODO.

On a php-src build with this PR (and a few other proposed changes)

<?php

// Observed:
// With opcache.enable_cli=0, this correctly throws an UnhandledMatchError
// With opcache.enable_cli=1, this incorrectly throws a RuntimeException
function test() {
    $x = '2';
    match($x) {
        1, 2, 3, 4, 5 => { throw new RuntimeException(); },
    };
}
test();

Opcache sees that '2' and 2 are weakly equivalent, and assumes that optimizing away the other cases of the ZEND_SWITCH_LONG is safe (command used: php -d opcache.file_cache= -d opcache.opt_debug_level=0x20000 match_test.php)

0000 V0 = NEW 0 string("RuntimeException")
0001 DO_FCALL
0002 THROW V0
LIVE RANGES:
     0: 0001 - 0002 (new)

Contributor

TysonAndre commented on Apr 13

Another optimization I'd wanted to see in opcache for a while, but is out of the scope of this PR would be to automatically convert \in_array($value, [1, 3, 5, 7, 9], true) to match ($value) { 1, 3, 5, 7, 9 => true, default => false } (when all array values are known constants and the resulting opcodes would be a constant-time lookup)

  • Avoid a regular php function call
  • Convert iteration to an array lookup

(Opcache already does a similar optimization for non-strict in_array)

Contributor

Author

iluuu1994 commented on Apr 13

Thank you @TysonAndre for your very thorough review!

Contributor

Author

iluuu1994 left a comment

Opcache sees that '2' and 2 are weakly equivalent, and assumes that optimizing away the other cases of the ZEND_SWITCH_LONG is safe

is the cause of the incorrect optimization. Replacing it with this (the default case) solves the problem:

for (s = 0; s < block->successors_count; s++) {
    scdf_mark_edge_feasible(scdf, block_num, block->successors[s]);
}
return;

I'll have to investigate further tomorrow.

Contributor

TysonAndre commented on Apr 14

Also, there'd be a merge conflict if both this and the "throw expression" RFC pass.

Passing NULL to zend_compile_throw and checking for NULL in zend_compile_throw should fix that (there's no need for a temporary for the throw in the match expression, I'm guessing the temporary is bookkeeping for binary operations, etc.)

-void zend_compile_throw(zend_ast *ast) /* {{{ */
+void zend_compile_throw(znode *result, zend_ast *ast) /* {{{ */
 {
        zend_ast *expr_ast = ast->child[0];
 
@@ -4565,6 +4565,11 @@ void zend_compile_throw(zend_ast *ast) /* {{{ */
        zend_compile_expr(&expr_node, expr_ast);
 
        zend_emit_op(NULL, ZEND_THROW, &expr_node, NULL);
+
+       if (result != NULL) {
+               result->op_type = IS_CONST;
+               ZVAL_BOOL(&result->u.constant, 1);
+       }
 }
 /* }}} */
 
@@ -5378,7 +5383,7 @@ void zend_compile_match(znode *result, zend_ast *ast) /* {{{ */
                zend_ast *error_args_ast = zend_ast_create_list(0, ZEND_AST_ARG_LIST);
                zend_ast *new_error_ast = zend_ast_create(ZEND_AST_NEW, error_name_ast, error_args_ast);
                zend_ast *throw_ast = zend_ast_create(ZEND_AST_THROW, new_error_ast);
-               zend_compile_throw(throw_ast);
+               zend_compile_throw(NULL, throw_ast);

Contributor

Author

iluuu1994 commented on Apr 18

The optimizer is fixed now. Initially I wanted to use extended_value of the opcode but unfortunately that wasn't possible as it's already used for the default jump opnum. Thus I had to create two new opcodes (ZEND_MATCH_LONG and ZEND_MATCH_STRING).

@@ -925,6 +928,7 @@ static int zend_dfa_optimize_jmps(zend_op_array *op_array, zend_ssa *ssa)

}

break;

case ZEND_SWITCH_STRING:

case ZEND_MATCH_STRING:

if (opline->op1_type == IS_CONST) {

zval *zv = CT_CONSTANT_EX(op_array, opline->op1.constant);

if (Z_TYPE_P(zv) != IS_STRING) {

TysonAndre on Apr 18

edited

Contributor

I don't see any obvious bugs.

For this branch in particular, I think that if something is a ZEND_SWITCH_STRING, it would have to go to the first case (case by case IS_EQUAL comparisons), but if something was a ZEND_MATCH_STRING, it could safely go to the last case (the default case) instead?

(Both are correct, but the alternative is more efficient and I think it should eliminate more dead code)

removed_ops++;
MAKE_NOP(opline);
opline->extended_value = 0;
take_successor_ex(ssa, block_num, block, block->successors[0]);  // take the last successor instead?

iluuu1994 on Apr 20

Author

Contributor

I suppose this is irrelevant now that the IS_IDENTICAL/JMPNZ chain was removed, no?

iluuu1994 on Apr 20

Author

Contributor

Unfortunately the following doesn't get optimized:

function test1(string $value) {
    echo match($value) {
        1, 2, 3, 4, 5 => 'a',
        default => 'b',
    };
}

test1(1);
0000 CV0($value) = RECV 1
0001 MATCH_LONG CV0($value) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("a")
0003 JMP 0005
0004 T1 = QM_ASSIGN string("b")
0005 ECHO T1
0006 RETURN null

But that has more to do with the fact that $value is a CV and not a CONST.

@@ -896,6 +898,7 @@ static int zend_dfa_optimize_jmps(zend_op_array *op_array, zend_ssa *ssa)

break;

}

case ZEND_SWITCH_LONG:

case ZEND_MATCH_LONG:

if (opline->op1_type == IS_CONST) {

zval *zv = CT_CONSTANT_EX(op_array, opline->op1.constant);

if (Z_TYPE_P(zv) != IS_LONG) {

TysonAndre on Apr 18

Contributor

Same here - for ZEND_MATCH_LONG, I'd assume taking the last successor instead of the first successor would be more efficient.

I'm not familiar with what take_successor_ex is actually doing though. I'd hope that'd work, but it may end up being more complicated than that.

							take_successor_ex(ssa, block_num, block, block->successors[0]);  // take last successor instead

Contributor

Author

iluuu1994 commented on Apr 18

Thanks @TysonAndre, your comments are tremendously helpful! Keep them coming slightly_smiling_face

Contributor

TysonAndre commented on Apr 18

I'm not sure why 017.phpt and 019.phpt started failing in travis, possibly adding to a hash table or looking up in a hash table the wrong way on some platforms, or ma

========DIFF========
002+ 
003+ Fatal error: Uncaught Exception in /home/travis/build/php/php-src/Zend/tests/match/017.php:4
004+ Stack trace:
005+ #0 /home/travis/build/php/php-src/Zend/tests/match/017.php(9): wrong()
006+ #1 /home/travis/build/php/php-src/Zend/tests/match/017.php(25): test_int('0')
007+ #2 {main}
008+   thrown in /home/travis/build/php/php-src/Zend/tests/match/017.php on line 4
========DIFF========
046+ string(1) "1"
046- string(7) "default"
========DONE========
FAIL Match expression long jump table [Zend/tests/match/019.phpt] 

Contributor

TysonAndre commented on Apr 18

And the JIT tests are failing, probably because the MATCH_STRING and MATCH_LONG opcodes would need to be updated for MATCH_STRING and MATCH_LONG) (e.g. it would work when there was an IS_IDENTICAL check to fall through to, but not when they get eliminated as dead code) https://dev.azure.com/phpazuredevops/PHP/_build/results?buildId=6794&view=ms.vss-test-web.build-test-results-tab&runId=155108&resultId=103028&paneView=debug

Unfortunately, there are few people familiar with the JIT, and I'm not one of them.


I'm terrible at assembly, but it looks like in ext/opcache/jit/zend_jit_x86.dasc,

returning 1 in zend_jit_switch and adding a todo might be an option for opline->opcode == ZEND_MATCH_LONG || opline->opcode == ZEND_MATCH_STRING.

I can't read what it's doing, but from the output of tests, it looks like it'd be emitting assembly to fall through to the next opcode in some cases, which works for ZEND_SWITCH_* but not ZEND_MATCH_*

If zend_jit_switch returned 1, it looks like opcache would just be prevented from JITing functions containing match if that was done, and people more familiar with internals could write the JIT code based on the zend_vm_def.h implementation.

e.g. 013.phpt

Stack trace
041+ string(1) "a"
042+ string(1) "a"
043+ string(1) "a"
044+ string(1) "a"
045+ string(1) "a"
046+ string(1) "a"
047+ string(1) "a"
048+ string(1) "a"
041- string(4) "b, c"
042- string(4) "b, c"
043- string(1) "d"
044- string(4) "e, f"
045- string(4) "e, f"
046- string(1) "g"
047- string(4) "h, i"
048- string(4) "h, i"

Member

nikic left a comment

LG. Please squash when merging, and add a note to the UPGRADING file.

Contributor

Author

iluuu1994 commented on Jul 10

Finally tada Thanks again @TysonAndre and @nikic for all the reviews!

Member

nikic commented on Jul 10

Unfortunately we have some exception handling issues, found by msan:

<?php
var_dump(match(true) {
    1, 2, 3, 4, 5 => 'foo',
});
0000 INIT_FCALL 1 96 string("var_dump")
0001 MATCH bool(true) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("foo")
0003 JMP 0005
0004 MATCH_ERROR bool(true)
0005 SEND_VAL T1 1
0006 DO_ICALL
0007 RETURN int(1)
LIVE RANGES:
     1: 0003 - 0005 (tmp/var)

Note that T1 range goes from 3-5, including MATCH_ERROR, which means that we'll try to free T1 when match throws, even though it has not been initialized.

Member

nikic commented on Jul 10

edited

My initial thought here was to fix this by adding a dummy T1 = MATCH_ERROR return value, that would make sure that the live range doesn't cross the MATCH_ERROR opcode. However, if I do that I get a misoptimization in block_pass (zend_t_usage).

Contributor

TysonAndre commented on Jul 11

My initial thought here was to fix this by adding a dummy T1 = MATCH_ERROR return value, that would make sure that the live range doesn't cross the MATCH_ERROR opcode. However, if I do that I get a misoptimization in block_pass (zend_t_usage).

Note that T1 range goes from 3-5, including MATCH_ERROR, which means that we'll try to free T1 when match throws, even though it has not been initialized.

@nikic - What about moving the default to be before the rest of the arms if it's a MATCH_ERROR? We already allow default => ... to explicitly be the first arm. This would have the benefit of avoiding an extra jump in one of the arms if UnhandledMatchError is assumed to be uncommon.

I.e. instead of this:

0001 MATCH bool(true) 1: 0002, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0004
0002 T1 = QM_ASSIGN string("foo")
0003 JMP 0005
0004 MATCH_ERROR bool(true)
0005 SEND_VAL T1 1

LIVE RANGES:
     1: 0003 - 0005 (tmp/var)

Use the following instead?

0001 MATCH bool(true) 1: 0003, 2: 0002, 3: 0002, 4: 0002, 5: 0002, default: 0002
0002 MATCH_ERROR bool(true)
0003 T1 = QM_ASSIGN string("foo")
0004 JMP 0005 (this opcode can be eliminated for a small performance benefit if default becomes the first arm)
0005 SEND_VAL T1 1

LIVE RANGES:
     1: 0004 - 0005 (tmp/var) (instead of including MATCH_ERROR)

I'm assuming that the solution you proposed above (T1 = MATCH_ERROR) works, it's just unoptimized, and that tooling for php 8.0 can assume that match expressions will be in php 8.0-dev.

Contributor

Author

iluuu1994 commented on Jul 11

@nikic I'm assuming you had the same misoptimization?

Specifically, this is wrong:

- 0045 ECHO string("1")
+ 0045 ECHO T1
...
- 0055 ECHO string("1, 1")
+ 0055 ECHO string("2, 1")

The exact place it is happening is here (for ZEND_QM_ASSIGN):

if (opline->op1_type == IS_CONST) {
literal_dtor(&ZEND_OP1_LITERAL(opline));
}
MAKE_NOP(opline);

Not sure if this is a match specific issue or a general optimization bug.

@TysonAndre Thanks for your suggestion, I'll try it!

Contributor

Author

iluuu1994 commented on Jul 12

Implemented here. Valgrind no longer complains.

iluuu1994

added a commit to iluuu1994/php-src that referenced this pull request

on Jul 12

iluuu1994

added a commit to iluuu1994/php-src that referenced this pull request

on Jul 12

iluuu1994

added a commit to iluuu1994/php-src that referenced this pull request

on Jul 12

Contributor

jrfnl commented on Jul 12

edited

Hi all, I've just been looking at this RFC and the implementation of it with an eye of adding support for match expressions to PHP_CodeSniffer and I was wondering why no new T_MATCH token has been introduced.

Is there anyone here who could enlighten me ?

Scratch that. I suspect that this feature isn't yet available in alpha2 which I was using for testing. Sorry for the noise.

Contributor

Author

iluuu1994 commented on Jul 12

edited

@jrfnl It's not available in alpha 2. It was merged a few days later.

Contributor

jrfnl commented on Jul 12

@iluuu1994 Thanks for confirming. I look forward to a next alpha (for Windows), so I can actually test it.

Contributor

GrahamCampbell commented on Jul 12

edited

You can use my PHP 8.0 docker image (on linux/mac or WSL2): registry.gitlab.com/grahamcampbell/php:8.0. It's based on the official docker PHP image (official as in by the Docker team, rather than the PHP core team), and includes some commonly used extensions, on by default, including Xdebug 3.0-dev.

zmitic commented on Sep 20

edited

First: I know this is not support forum so I am sorry about my question.

Second; I read on RFC that match condition can be any arbitrary expression. Does it mean we can do instance of?

Real-life example; I have base class that is being extended with 2 different classes (will be more) and each child must have matching DTO:

public static function createFromEntity(BaseConfig $config): self
{
    if ($config instanceof ServiceCategoryConfig) {
        return new ServiceCategoryConfigDTO($config->getCategory());
    }
    if ($config instanceof DiscountConfig) {
        return new DiscountConfigDTO($config->getDiscount());
    }

    // do something else here
}

Can match be used in cleaning this? I know I can put a method on entity class that will create DTO but I would rather avoid it.


If it can't, what about not requiring param and be something like:

return match() {
    $config instance of ServiceCategoryConfig => new ServiceCategoryConfigDTO($config->getCategory()), 
    $config instance of DiscountConfig => new DiscountConfigDTO($config->getDiscount()),
    default => // do something else here 
}

Contributor

Author

iluuu1994 commented on Sep 20

@zmitic Yes, but as with switch you have to pass true as the expression.

https://3v4l.org/stkDM

zmitic commented on Sep 20

@iluuu1994 Thank you, this is truly amazing feature!

spiritinlife commented on Oct 2

edited

@iluuu1994

With the addition of union types wouldn't it make sense for match to handle this.
( I may be missing something obvious here )

<?php

class ValidationError {

   public function __construct(
       public string $message
   ) {}
    
}

function validate(string $errorsOnPoupanka): bool|ValidationError {
    
    if ( $errorsOnPoupanka === 'poupanka')
        return new ValidationError("Poupanka error");
    
    return true;
}

var_dump(match(validate("poupanka")) {
        ValidationError::class => $validationResult->message,
        true => "Nice",
});

Or even better something like this

var_dump(match(validate("poupanka")) {
        (ValidationError $error) => $error,
        (bool $v) => "Nice",
});

https://3v4l.org/sB1kM

Contributor

Author

iluuu1994 commented on Oct 2

@spiritinlife Matching by type would be part of pattern matching which is being discussed for 8.1. The syntax is not clear yet.

@iluuu1994 I see thank you, is there a discussion i can follow for this ?

Contributor

Author

iluuu1994 commented on Oct 2

@spiritinlife No not at the moment. We're working on an RFC for enums/ADTs which would also make use of pattern matching. But nothing specific for type pattern matching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Assignees

No one assigned

Labels
Projects

None yet

Milestone

PHP 8.0

Linked issues

Successfully merging this pull request may close these issues.

None yet

13 participants

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK