
Late-bound argument defaults for Python

source link: https://lwn.net/Articles/875441/

Python supports default values for arguments to functions, but those defaults are evaluated at function-definition time. A proposal to add defaults that are evaluated when the function is called has been discussed at some length on the python-ideas mailing list. The idea came about, in part, due to yet another resurrection of the proposal for None-aware operators in Python. Late-bound defaults would help with one use case for those operators, but there are other, stronger reasons to consider their addition to the language.

In Python, the defaults for arguments to a function can be specified in the function definition, but, importantly, they are evaluated in the scope where the function is defined. So default arguments cannot refer to other arguments to the function, as those are only available in the scope of the function itself when it gets called. For example:

    def foo(a, b=None, c=len(a)):
        ...

That definition specifies that a has no default, b defaults to None if no argument gets passed for it, and c defaults to the value of len(a). But that expression will not refer to the a in the argument list; it will instead look for an a in the scope where the function is defined. That is probably not what the programmer intended. If no a is found, the function definition will fail with a NameError.
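
A short sketch (not from the article) makes the pitfall concrete: if a name a does happen to exist in the enclosing scope, the default silently binds to that outer value at definition time rather than to the argument:

# Defaults are evaluated once, when the def statement runs, in the defining scope.
a = "module-level"

def foo(a, b=None, c=len(a)):
    return c

print(foo("hi"))   # prints 12: c was bound to len("module-level") at definition
                   # time, not to len("hi") at call time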

On October 24, Chris Angelico introduced his proposal for late-bound arguments. He used an example function, derived from the bisect.bisect_right() function in the standard library, to demonstrate the idea. The function's arguments are specified as follows:

def bisect(a, x, lo=0, hi=None):

He noted that there is a disparity between lo and hi: "It's clear what value lo gets if you omit it. It's less clear what hi gets." Early in his example function, hi is actually set to len(a) if it is None. Effectively, None is being used as a placeholder (or sentinel value) because Python has no way to directly express the idea that hi should default to the length of a. He proposed new syntax to identify hi as a late-bound argument:

def bisect(a, x, lo=0, hi=:len(a)):

The "=:" would indicate that if no argument is passed for hi, the expression would be evaluated in the context of the call and assigned to hi before any of the function's code is run. It is interesting to note that the documentation for bisect.bisect_right() linked above looks fairly similar to Angelico's idea (just lacking the colon) even though the actual code in the library uses a default value of None. It is obviously useful to know what the default will be without having to dig into the code.

In his post, Angelico said that in cases where None is a legitimate value, there is another way to handle the default, but it also obscures what the default will be:

And the situation only gets uglier if None is a valid argument, and a unique sentinel is needed; this standard idiom makes help() rather unhelpful:

_missing = object()
def spaminate(thing, count=_missing):
    if count is _missing: count = thing.getdefault()

Proposal: Proper syntax and support for late-bound argument defaults.

def spaminate(thing, count=:thing.getdefault()):
    ...

[...]

The purpose of this change is to have the function header define, as fully as possible, the function's arguments. Burying part of that definition inside the function is arbitrary and unnecessary.
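
That help() complaint is easy to reproduce; in this small sketch, the reported signature shows only the sentinel object's repr (the exact address will vary) rather than anything meaningful about the default:

import inspect

_missing = object()

def spaminate(thing, count=_missing):
    # The real default is computed here, invisible to introspection tools.
    if count is _missing:
        count = thing.getdefault()
    return count

print(inspect.signature(spaminate))
# prints something like: (thing, count=<object object at 0x...>)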

The first order of business in these kinds of discussions is the inevitable bikeshedding about how the operator is spelled. Angelico chose a "deliberately subtle" syntax, noting that in many cases it will not matter when the argument is bound. It is visually similar to the walrus operator (":="), but that is not legal in a function definition, so there should be no ambiguity, he said.

Ethan Furman liked the idea but would rather see a different operator (perhaps "?=") used because of the potential confusion with the walrus operator. Guido van Rossum was also in favor of the feature, but had his own spelling suggestion as well:

I like that you're trying to fix this wart! I think that using a different syntax may be the only way out. My own bikeshed color to try would be `=>`, assuming we'll introduce `(x) => x+1` as the new lambda syntax, but I can see problems with both as well :-).

New syntax for lambda expressions has also been discussed, with most settling on "=>" as the best choice, in part because "->" is already used for return-type annotations; some kind of "arrow" operator is commonly used in other languages for defining anonymous functions. Several others were similarly in favor of late-bound defaults and many seemed to be happy with Van Rossum's spelling, but Brendan Barnwell was opposed to both; he was concerned that it would "encourage people to cram complex expressions into the function definition". Since it would only truly be useful—readable—for a simpler subset of defaults, it should not be added, he said. Furthermore:

To me, this is definitely not worth adding special syntax for. I seem to be the only person around here who detests "ASCII art" "arrow" operators but, well, I do, and I'd hate to see them used for this. The colon or alternatives like ? or @ are less offensive but still too inscrutable to be used for something that can already be handled in a more explicit way.

But Steven D'Aprano did not think that the addition of late-bound defaults would "cause a large increase in the amount of overly complex default values". Angelico was also skeptical that the feature was some sort of bad-code attractant. "It's like writing a list comprehension; technically you can put any expression into the body of it, but it's normally going to be short enough to not get unwieldy." In truth, any feature can be abused; this one does not look to them to be particularly worse in that regard.

PEP 671

Later that same day, Angelico posted a draft of PEP 671 ("Syntax for late-bound function argument defaults"). In it, he adopted the "=>" syntax, though he noted a half-dozen other possibilities. He also fleshed out the specification of the default expression and some corner cases:

The expression is saved in its source code form for the purpose of inspection, and bytecode to evaluate it is prepended to the function's body.

Notably, the expression is evaluated in the function's run-time scope, NOT the scope in which the function was defined (as are early-bound defaults). This allows the expression to refer to other arguments.

Self-referential expressions will result in UnboundLocalError:

    def spam(eggs=>eggs): # Nope

Multiple late-bound arguments are evaluated from left to right, and can refer to previously-calculated values. Order is defined by the function, regardless of the order in which keyword arguments may be passed.
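
In current Python, those left-to-right semantics can only be approximated by hand. A rough sketch of the emulation (the function name is made up for illustration) might look like this:

_unset = object()

# Hypothetical emulation of: def frobnicate(a, b=>len(a), c=>b + 1)
def frobnicate(a, b=_unset, c=_unset):
    if b is _unset:
        b = len(a)    # may refer to earlier parameters
    if c is _unset:
        c = b + 1     # may refer to the just-computed b
    return a, b, c

print(frobnicate("spam"))          # ('spam', 4, 5)
print(frobnicate("spam", b=10))    # ('spam', 10, 11)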

But one case, which had been raised by Ricky Teachey in the initial thread, was discussed at some length when Jonathan Fine asked about the following function definition:

def puzzle(*, a=>b+1, b=>a+1):
    return a, b

Angelico was inclined to treat that as a syntax error, "since permitting it would open up some hard-to-track-down bugs". Instead, it could be some kind of run-time error in the case where neither argument is passed, he said. He was concerned that allowing "forward references" to arguments that have yet to be specified (e.g. b in a=>b+1 above) would be confusing and hard to explain. D'Aprano suggested handling early-bound argument defaults before their late-bound counterparts and laid out a new process for argument handling that was "consistent and understandable". In particular, he saw no reason to make some kinds of late-bound defaults into a special case:

Note that step 4 (evaluating the late-bound defaults) can raise *any* exception at all (it's an arbitrary expression, so it can fail in arbitrary ways). I see no good reason for trying to single out UnboundLocalError for extra protection by turning it into a syntax error.

Angelico noted that it was still somewhat difficult for even experienced Python programmers to keep straight, but, in addition, he had yet to hear of a real use case. Erik Demaine offered two examples, "though they are a bit artificial"; he said that simply evaluating the defaults in left-to-right order (based on the function definition) was reasonably easy to understand. Angelico said that any kind of reordering of the evaluation was not being considered; as he sees it:

The two options on the table are:

1) Allow references to any value that has been provided in any way
2) Allow references only to parameters to the left

Option 2 is a simple SyntaxError on compilation (you won't even get as far as the def statement). Option 1 allows everything all up to the point where you call it, but then might raise UnboundLocalError if you refer to something that wasn't passed.

The permissive option allows mutual references as long as one of the arguments is provided, but will give a peculiar error if you pass neither. I think this is bad API design.
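
A rough emulation of the permissive option with today's sentinel idiom (not code from the thread) shows that peculiar failure mode:

_unset = object()

# Emulating def puzzle(*, a=>b+1, b=>a+1) with defaults evaluated left to right.
def puzzle(*, a=_unset, b=_unset):
    if a is _unset:
        a = b + 1    # only works if b was passed
    if b is _unset:
        b = a + 1
    return a, b

print(puzzle(a=1))   # (1, 2)
print(puzzle(b=1))   # (2, 1)
# puzzle() fails: the proposal would raise UnboundLocalError here, while this
# emulation raises a TypeError on the sentinel object instead.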

Van Rossum pointed out that the syntax-error option would break new ground: "Everywhere else in Python, undefined names are runtime errors (NameError or UnboundLocalError)." Angelico sees the error in different terms, though, noting that mismatches in global and local scope are a syntax error; he gave an example:

>>> def spam():
...     ham
...     global ham
...
  File "<stdin>", line 3
SyntaxError: name 'ham' is used prior to global declaration

He also gave a handful of function definitions using the new feature that differed only subtly from one another; he was concerned about the "bizarre inconsistencies" that can arise, because they "are difficult to explain unless you know exactly how everything is implemented internally". He would prefer to see real-world use cases for the feature to decide whether it should be supported at all, but was adamant that the strict left-to-right interpretation was easier to understand:

If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:

1) Arguments are defined left-to-right, each one independently of each other
2) Early-bound arguments and those given values are defined first, then late-bound arguments

The first option is much easier to explain [...]

D'Aprano explained that the examples cited were not particularly hard to understand and fell far short of the "bizarre inconsistencies" bar. There is a clear need to treat the early-bound and late-bound defaults differently:

However there is a real, and necessary, difference in behaviour which I think you missed:

    def func(x=x, y=>x)  # or func(x=x, @y=x)

The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:

    def method(self, x=>self.attr)  # @x=self.attr

    def bisect(a, x, lo=0, hi=>len(a))  # @hi=len(a)

Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.
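
The method case is where the hand-written equivalent is most familiar today; a minimal sketch (the class and attribute are made up for illustration) of what x=>self.attr replaces:

class Widget:
    def __init__(self, attr):
        self.attr = attr

    # Today's equivalent of: def method(self, x=>self.attr)
    def method(self, x=None):
        if x is None:
            x = self.attr    # must use the *local* self, not anything global
        return x

print(Widget(42).method())      # 42
print(Widget(42).method(x=7))   # 7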

As can be seen, D'Aprano prefers a different color for the bikeshed: using "@" to prepend late-bound default arguments. He also said that Angelico had perfectly explained the "harder to explain" option in a single sentence; both are equally easy to explain, D'Aprano said. Beyond that, it does not make sense to "prohibit something as a syntax error because it *might* fail at runtime". In a followup message, he spelled that out further:

We don't do this:

    y = x+1  # Syntax error, because x might be undefined

and we shouldn't make this a syntax error

    def func(@spam=eggs+1, @eggs=spam-1):

either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses.

Winding down

While many of the participants in the threads seem reasonably happy—or at least neutral—on the idea, there is some difference of opinion on the details as noted above. But several thread participants are looking for a more general "deferred evaluation" feature, and are concerned that late-bound argument defaults will preclude the possibility of adding such a feature down the road. Beyond that, Eric V. Smith wondered about how late-bound defaults would mesh with Python's function-introspection features. Those parts of the discussion got a little further afield from Angelico's proposal, so they merit further coverage down the road.

At first blush, Angelico's idea to fix this "wart" in Python seems fairly straightforward, but the discussion has shown that there are multiple facets to consider. It is not quite as simple as "let's add a way to evaluate default arguments when the function is called"—likely how it was seen at the outset. That is often the case when looking at new features for an established language like Python; there is a huge body of code that needs to stay working, but there are also, sometimes conflicting, aspirations for features that could be added. It is a tricky balancing act.

As with many python-ideas conversations, there were multiple interesting sub-threads, touching on language design, how to teach Python (and this feature), how other languages handle similar features (including some discussion of ALGOL thunks), the overall complexity of Python as it accretes more and more features, and, of course, additional bikeshedding over the spelling. Meanwhile, Angelico has been working on a proof-of-concept implementation, so PEP 671 (et al.) seems likely to be under discussion for some time to come.

