Operator precedence by textual substitution

A technique from early Fortran

Operator precedence is a traditional feature of mathematical notation where, absent explicit parentheses, some operators bind tighter than others ("have higher precedence"). For example, a + b × c would be interpreted as a + (b × c), not as (a + b) × c, because multiplication has higher precedence than addition.

This is a bit annoying to implement in programming languages. As a result, some early languages just didn't do it. Lisp (developed in the late 1950s) sidesteps the issue by not having infix operator notation in the first place, instead writing the above expression in fully parenthesized prefix notation as (+ a (* b c)). An early compiler from IBM, the Internal Translator (IT) (1956) [pdf] had more traditional infix notation, but no operator precedence: all operators had the same precedence, and were right-associative (see pp. 1.19–1.20). Users were expected to add parentheses where needed.

Alas for this simple solution, in a 1962 paper, "A History of Writing Compilers" (Computers and Automation, Dec. 1962 [pdf], pp. 8–18), Donald Knuth claims that "The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler".

Fortran (also from IBM, released 1957) gave in and did implement operator precedence. But it did so using a method that seems to me equal parts huge hack and clever solution, a raw textual search-and-replace scheme. As Knuth summarizes:

An ingenious idea used in the first FORTRAN compiler was to surround binary operators with peculiar-looking parentheses:
+ and - were replaced by )))+((( and )))-(((
* and / were replaced by ))*(( and ))/((
** was replaced by )**(
and then an extra ((( at the left and ))) at the right were tacked on. The resulting formula is properly parenthesized, believe it or not. For example, if we consider "(X + Y) + W/Z," we obtain
((((X))) + (((Y)))) + (((W))/((Z))).
This is admittedly highly redundant, but extra parentheses need not affect the resulting machine language code. After the above replacements are made, a parenthesis-level method can be applied to the resulting formulas.

I had to work out a few examples to convince myself that this scheme really works (even Knuth seems a bit surprised). It always produces syntactically correct expressions with balanced parentheses, and the resulting expressions follow the operators' precedence. Try it out!

Input expression:

Result:

The original scheme has three precedence levels. Lowest precedence are addition (+) and subtraction (-); intermediate precedence are multiplication (*) and division (/); and highest precedence is exponentiation (**). But the same pattern can be extended to an arbitrary number of levels by just adding more parentheses. The more outward-facing parentheses you add around an operator, the lower its precedence.

Nowadays, there are standard ways to handle operator precedence, such as by stratifying a grammar or making use of the precedence declarations that some parser generators offer. But if you find yourself wanting to implement an expression parser and don't want to bother handling precedence, you can do some quick-and-dirty string substitution instead! Since this is purely a textual preprocessing step, the rest of the language implementation doesn't need to know about operator precedence.

Operator precedence by textual substitution

Operator precedence by textual substitution

Recommend

Intel's New Chimera: Alder Lake

苹果不再是全球市值最高的公司被沙特阿美取代|苹果|市值|沙特阿美_业界_新浪科技_新...

Streaming Server Side Render in React 18

Covid: Have the lockdown success stories lost their shine?

How to Get Started With Blockchain Adoption

Getting latest tag on git repository

The 10 Best AI Writers & Content Generators Compared

Pretty Vagrant – the easiest way to get Oracle XE21c running on Ubuntu

Useful Git Alias

Proven Ways To Improve Organic Marketing Workflow Efficiency

About Joyk