How can we compare expressive power between two Turing-complete languages?

source link: https://langdev.stackexchange.com/questions/2015/how-can-we-compare-expressive-power-between-two-turing-complete-languages

Is this possible?

Is there an accepted (and unambiguous) notion of "expressive power" that could differ between two different Turing-complete languages?

It seems like, for example, Python is intuitively much more "expressive" (in some sense) than an assembly language since Python's high-level constructs (like classes, functions and even for loops) often make it much easier to organize and think about code compared with assembly languages.

On the other hand, any program you can write in Python you can also write in an assembly language since both languages are Turing-complete (ignoring library availability, etc).

But do you really want to implement complex algorithms in FRACTRAN or as a Magic: the Gathering deck? Why not?

Is there a way to give a real, unambiguous definition to this intuitive notion of "expressiveness?"

1 Answer

Surprisingly yes, this is possible!

This is actually pretty important for people who work on optimizing compilers: if I add this new feature, will it break any existing optimization pass? I will show why this is a concern later in the answer.

And that's a good thing, since all of us here are interested in programming languages. Surely we should have at least one clear-cut way to distinguish between them, beyond the names of keywords and such! Most people would rather write a large system in [insert your favorite language here] than, say, Befunge or INTERCAL. But those languages are Turing-complete too.

So, what's going on here?

Not only that, the concept we need here has significance in formal semantics beyond this question. I'll briefly mention this at the end of the answer.

Can you ever distinguish between 2 * 3 and 3 + 3 in a programming language? Are you sure? Are there any "reasonable" languages in which you can distinguish between these? What, exactly, would it mean to be able to distinguish between them and how can we give a definition for "this language distinguishes between these things?" Read on to find out more!

The concept we need is called "observational equivalence."

A roadmap:

  • First I'll give an intuitive description of how you could understand the statement "feature X adds expressive power to language L".
  • Next, I'll need to talk about programs that have a hole in them. These are called "one-hole contexts". Sometimes this is abbreviated to just "context."
  • Then I'll use one-hole contexts to define "observational equivalence." This is one way to talk about certain programs being "equivalent."
  • Finally, we will see that we can say adding feature X to language L makes the language more expressive if, by adding feature X, we end up with fewer observational equivalences in the resulting language L+X. That might sound backward. I'll get to this, though.

The key idea is this: consider taking a Turing-complete language L and adding feature X. Now think about how we could translate code written in L+X back into L. If feature X makes the language "more expressive," then this translation will require a large-scale, "global" transformation. On the other hand, if L and L+X are "equally expressive," then L+X can be translated into L using only a "local" transformation.

Inspiration

I will state upfront that this answer (and the way I framed the original question) is inspired by Shriram Krishnamurthi's excellent and approachable 2019 talk about Matthias Felleisen's paper "On the Expressive Power of Programming Languages." Anyone who is interested in this question should absolutely watch it. The corresponding paper is listed under Further reading below.

I actually think that talk would be interesting to everyone here, in fact. There are many questions that have variations on "... what if I add feature X to a language ..." and this talk gives a way to understand one aspect of that kind of question!

Incidentally, I will be giving spoilers for the talk in this answer!

Intuition

Let's say we start with a language that only has functions, conditionals, integer literals, addition, subtraction, and multiplication. This is Turing-complete if we allow recursion. Let's also say that functions are first-class. This won't come up right now, though.

If we add, say, a unary negation operator to our language, this does not intuitively "increase expressive power." However, if we add exceptions to the language, well, that does seem to intuitively "increase expressive power."

Why? Well, for the first feature, whenever we see a unary negation -x we can just translate it, right then and there, into 0 - x. For the second feature, however, when we see an exception being thrown or a try-catch block, we actually have to do a large-scale transformation of the program.
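To make the "local transformation" idea concrete, here is a minimal sketch in Python (the AST encoding and names are mine, not part of any real compiler) that desugars unary negation over a toy expression tree:

```python
# Toy AST: an expression is an int literal, ("neg", operand),
# or (op, lhs, rhs) for binary operators.
def desugar(expr):
    """Rewrite every unary negation -x into 0 - x, purely locally."""
    if isinstance(expr, tuple) and expr[0] == "neg":
        return ("-", 0, desugar(expr[1]))
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return (op, desugar(lhs), desugar(rhs))
    return expr

print(desugar(("neg", ("+", 1, 2))))  # ('-', 0, ('+', 1, 2)), i.e. 0 - (1 + 2)
```

Each rewrite only touches the exact node where "neg" occurs; the rest of the tree passes through unchanged. That locality is what makes this a macro-style transformation.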

I chose exceptions since they are of greater interest, but let's consider a simpler example of a feature that "adds expressiveness." How about a keyword halt that immediately terminates the program? You certainly cannot implement this as a local transformation to our language! You actually have to apply a transformation that starts at the beginning of the program!

With the negation feature, we only had to transform the exact part of the code where the unary negation is used. For the halt feature, we had to transform the code far beyond where that keyword is used.
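To see the contrast in code, here is a sketch (the encoding is entirely mine) of one way you might compile halt away by hand: make every function return a (halted, value) pair and check the flag at every call site. Notice how code that never mentions halt still has to change:

```python
# Hypothetical source, in the language *with* halt:
#
#     def f(x):
#         if x == 0:
#             halt
#         return g(x) + 1
#
# Translated into the language *without* halt:

def g(x):
    return (False, x * 2)  # stub callee, already in translated form

def f(x):
    if x == 0:
        return (True, None)   # "halt": signal the flag upward
    halted, v = g(x)          # this call site never mentioned halt...
    if halted:                # ...but it still had to change
        return (True, None)
    return (False, v + 1)

print(f(3))  # (False, 7)
print(f(0))  # (True, None)
```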

This distinction between "feature only requiring local transformation" and "feature requiring global transformation" is what we want to formalize.

We can get closer to a formal statement by saying: a feature does not "add expressive power" if we can implement it as a macro that turns it into something in the original language. Even if the language doesn't have macros, you could imagine using some kind of preprocessor to implement a local transformation as a kind of macro.

This is not quite a formal description yet, but we're getting there! For one thing, how do we prove that a macro cannot exist for some feature? To do this, we still need a formal description of the problem.

What's tricky about equality of programs?

First, we need a way to compare programs for equality.

You might wonder: why not just use the equality built into the language? Well, there are a few reasons:

  • The equality operation might not cover everything. For example, it's actually impossible to compare arbitrary functions for equality in a Turing-complete programming language (in a way that is always correct and always halts); see the sketch after this list.
  • The language might not even have an equality operation, even though it's Turing-complete. This is the case for many esoteric languages, like FRACTRAN.
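As a quick illustration of the first point: in Python, == on functions falls back to identity comparison, so two functions that agree on every input still compare unequal, and deciding true behavioral equality is impossible in general:

```python
double_a = lambda x: x * 2
double_b = lambda x: x + x

print(double_a == double_b)        # False: compares identity, not behavior
print(double_a(21), double_b(21))  # 42 42: yet they agree on every input
```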

So we really want something that's somehow "more fundamental" than that and more broadly applicable. We actually do want something that's inside the language that can distinguish between two things, though. But how do we do this?

What it really boils down to is this: given two things x and y in our language, is there any way to write a program in our language that behaves differently depending on whether you use x or y?

A specific question: Is it possible to distinguish between 2 * 3 and 3 + 3? I will return to this particular question in the Expressiveness section.

Formalizing all this is where we get observational equivalence.

But first, we need to talk about what happens when you take a program and cut a hole out of it.

Programs with a hole

Imagine you take a valid program in your language and print it out. Then you physically cut out exactly one subterm and throw it away. What you have left is a "program with a hole in it," or a printout of a "one-hole context" (sometimes just called "a context").

If you take a slip of paper, write a term on it, and tape it into the hole, you now have a program again (as long as the result of this process is well-typed). This is the primary thing you can do with a one-hole context: you can obtain a program by combining it with a term that fills the hole.

If we call our context C and the new term we're taping into it t then the complete program we obtain when we tape t into C is called C[t].
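One convenient way to picture this (just a sketch; representing programs as strings is my choice here, not the answer's) is a Python function that takes the text for the hole and returns a complete program:

```python
# A one-hole context C, represented as "program text with a gap":
def C(t):
    return f"print(1 + {t})"

print(C("2 * 3"))  # C[t] with t = "2 * 3"  ->  print(1 + 2 * 3)
print(C("9"))      # C[t] with t = "9"      ->  print(1 + 9)
```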

Note that this definition actually always makes sense no matter what programming language we are looking at! I've made no assumptions about the language.

Our definition of observational equivalence will center around these contexts (that is, programs with exactly one hole in them).

Observational equivalence

One way to define what it means for two programs to be "equivalent" is that every observation you can make about them is the same. There is actually a decent amount of nuance to "what counts as an observation?" but that's outside the scope of this answer.

We will define observational equivalence like this (first attempt):

  • Two expressions e1 and e2 are called "observationally equivalent" if, for every context C, we have C[e1] = C[e2]

This is the same as asking: given those two expressions, does the language give you any way to tell them apart? Are the two expressions interchangeable in every context?

"Uh, wait a minute..." I hear you say. "But there's still an equals sign! What does that mean? The whole point was to avoid using an existing notion of equality!"

That is very true! We need to improve our definition. We need to be a bit more clever (this new definition is due to James Morris's 1969 PhD thesis):

  • Two expressions e1 and e2 are called "observationally equivalent" if, for every context C, we have: C[e1] halts if and only if C[e2] halts
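In symbols (the notation here is mine, reading $\downarrow$ as "halts"):

$$e_1 \cong e_2 \quad\iff\quad \forall C.\ \big(C[e_1]\downarrow \iff C[e_2]\downarrow\big)$$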

This might seem a bit weird. Think about this for a minute. Does this seem, somehow, wrong?

Take a moment to consider this question: "Are we saying that 1 and 2 are 'observationally equivalent'? Both 1 and 2 very straightforwardly terminate!" (I've taken this from this part of Shriram's talk. Spoilers for the talk ahead!)

I am not saying they are observationally equivalent! An example: can you write a Python program with one hole in it that will halt if the hole is filled with 1 but will loop forever if it is filled with 2 (you could just write <hole> for the hole and fill it in as appropriate)? If so, you've proven that 1 and 2 are observationally distinct in Python! To show that they are distinct, all you have to do is find one context C where C[1] halts but C[2] does not (or vice versa). If they were equivalent, then every context would behave the same (with regard to halting) regardless of whether you plugged in 1 or 2.
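Here is one such context (spoiler for the puzzle above), written in Python with <hole> as the placeholder:

```python
# C: a Python program with one hole.
while <hole> == 2:
    pass

# C[1]:  while 1 == 2: pass   -- condition is False, halts immediately
# C[2]:  while 2 == 2: pass   -- condition is always True, loops forever
```

One such context is enough: 1 and 2 are observationally distinct in Python.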

What if we only have an isZero(x) built-in function, but no built-in equality operation? Can we still tell 1 apart from 2? Well, how about the context while (isZero(1 - <hole>)) { }? This is why the definition works so well: we need to consider all possible contexts!

Okay, but what if there is actually no way to tell 1 and 2 apart? Then we would indeed say that they are observationally equivalent! Observational equivalence captures what it means for two things to be indistinguishable from inside a programming language.

Expressiveness

Now how does all of this relate to expressiveness? I'm going to start by way of example. I'll return to the question: Is it possible to distinguish between 2 * 3 and 3 + 3?

My answer: It depends on the features of the language! It certainly is not possible in the language which only has basic arithmetic, functions and recursion. Can you think of a feature we could add where it is possible to distinguish them?

Operator overloading! Say we have operator overloading and the ability to redefine existing functions. Suppose we overload * to do something weird, like return its first argument, but we don't overload +. We can now distinguish between those two expressions!

By adding that feature, we broke an observational equivalence. The expressions 2 * 3 and 3 + 3 used to be observationally equivalent. Then we added operator overloading and now they are observationally distinct.
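Python won't let you redefine * for built-in ints, so here is the idea as a sketch using a toy evaluator in which operators are looked up in an environment (the encoding is mine):

```python
# Expressions: an int literal or (op, lhs, rhs).
def evaluate(expr, ops):
    if isinstance(expr, int):
        return expr
    op, lhs, rhs = expr
    return ops[op](evaluate(lhs, ops), evaluate(rhs, ops))

default_ops = {"*": lambda a, b: a * b, "+": lambda a, b: a + b}
weird_ops   = {"*": lambda a, b: a,     "+": lambda a, b: a + b}  # * overloaded

e1 = ("*", 2, 3)  # 2 * 3
e2 = ("+", 3, 3)  # 3 + 3

print(evaluate(e1, default_ops), evaluate(e2, default_ops))  # 6 6
print(evaluate(e1, weird_ops),   evaluate(e2, weird_ops))    # 2 6: distinguished!
```

A context that loops forever on the result 2 but halts on the result 6 then witnesses that the two expressions are observationally distinct in the extended language.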

Now we need to address how to tell if a feature requires a "local" transformation vs "global" transformation.

The key theorem

If feature X can be implemented for language L as a local transformation to obtain the language L+X, then any two expressions e1 and e2 that are observationally equivalent in L are also observationally equivalent in L+X.

The theorem is proven in the paper.

This is saying that if feature X only required a local transformation to implement, then it did not "break" any observational equivalences. All of the observational equivalences from L are still true in L+X.

Note that this was not the case for operator overloading. Adding operator overloading did break some observational equivalences. On the other hand, when we added unary negation we did not break any observational equivalences.

Now we can say that when we add expressiveness to a language, we break some observational equivalences.

One practical implication is that some optimization passes might be broken when you add a new feature. Optimizations assume that certain expressions are interchangeable, and when you add a new "expressive" feature, you break some of those equivalences!
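For instance, here is a toy constant-folding pass (my own sketch, reusing the tuple-based AST from the evaluator above). It assumes that 2 * 3 can always be replaced by 6, which is exactly the equivalence that operator overloading broke:

```python
def fold(expr):
    """Replace constant subexpressions with their values."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr[0], fold(expr[1]), fold(expr[2])
    if isinstance(lhs, int) and isinstance(rhs, int):
        return {"*": lhs * rhs, "+": lhs + rhs}[op]
    return (op, lhs, rhs)

print(fold(("*", 2, 3)))  # 6: sound in the base language, but with the
                          # "weird" overloaded * above, the original program
                          # computes 2 while the folded one computes 6
```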

Exercise for the reader: I mentioned adding halt to a simple language and I said that this is an expressive new feature for that language. Can you give an example of an observational equivalence that holds in the original language, but not the new language with halt?

Bonus: Connecting a denotational semantics to an operational semantics

Even though this is just a small part of the answer and only tangentially relevant to my question, it is what motivated me to write the question and this answer: in particular, to partially clear up some confusion about denotational semantics from the comments of my previous question and answer. This is only one component of that, but it is an important one.

Up until now, we have seen one useful application of observational equivalence, which allowed me to define and motivate it. Now we can briefly see another application that is relevant to defining a denotational semantics. Addressing this in more detail would really be a different answer to a different question, however, so I will gloss over some things.

There are two crucial properties you want from your denotation function when you create a denotational semantics: one is called "compositionality" and the other is called "adequacy." "Adequacy" is defined using observational equivalence, which in turn is derived from an operational semantics.

  • "Adequacy": If two programs are observationally distinct, then they must have distinct denotations.

Consider what it would be like if our denotation function were not "adequate." We would have two programs that we can observe to be different, yet they would be mapped to the same mathematical object. That's not a very useful way to denote programs!

Further reading

Papers

  • Matthias Felleisen, "On the Expressive Power of Programming Languages," Science of Computer Programming, 1991. (The source of the key theorem above.)
  • James H. Morris, "Lambda-Calculus Models of Programming Languages," PhD thesis, MIT. (The source of the halting-based definition of observational equivalence.)
