Operator precedence by textual substitution

 3 months ago
source link: https://www.kmjn.org/notes/operator_precedence_fortran.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Operator precedence by textual substitution

A technique from early Fortran

Operator precedence is a traditional feature of mathematical notation where, absent explicit parentheses, some operators bind tighter than others ("have higher precedence"). For example, a + b × c would be interpreted as a + (b × c), not as (a + b) × c, because multiplication has higher precedence than addition.

This is a bit annoying to implement in programming languages. As a result, some early languages just didn't do it. Lisp (developed in the late 1950s) sidesteps the issue by not having infix operator notation in the first place, instead writing the above expression in fully parenthesized prefix notation as (+ a (* b c)). An early compiler from IBM, the Internal Translator (IT) (1956) [pdf] had more traditional infix notation, but no operator precedence: all operators had the same precedence, and were right-associative (see pp. 1.19–1.20). Users were expected to add parentheses where needed.

Alas for this simple solution, in a 1962 paper, "A History of Writing Compilers" (Computers and Automation, Dec. 1962 [pdf], pp. 8–18), Donald Knuth claims that "The lack of operator priority (often called precedence or hierarchy) in the IT language was the most frequent single cause of errors by the users of that compiler".

Fortran (also from IBM, released 1957) gave in and did implement operator precedence. But it did so using a method that seems to me equal parts huge hack and clever solution, a raw textual search-and-replace scheme. As Knuth summarizes:

An ingenious idea used in the first FORTRAN compiler was to surround binary operators with peculiar-looking parentheses:

+ and - were replaced by )))+((( and )))-(((
* and / were replaced by ))*(( and ))/((
** was replaced by )**(

and then an extra ((( at the left and ))) at the right were tacked on. The resulting formula is properly parenthesized, believe it or not. For example, if we consider "(X + Y) + W/Z," we obtain

((((X))) + (((Y)))) + (((W))/((Z))).

This is admittedly highly redundant, but extra parentheses need not affect the resulting machine language code. After the above replacements are made, a parenthesis-level method can be applied to the resulting formulas.

I had to work out a few examples to convince myself that this scheme really works (even Knuth seems a bit surprised). It always produces syntactically correct expressions with balanced parentheses, and the resulting expressions follow the operators' precedence. Try it out!

Input expression:


The original scheme has three precedence levels. Lowest precedence are addition (+) and subtraction (-); intermediate precedence are multiplication (*) and division (/); and highest precedence is exponentiation (**). But the same pattern can be extended to an arbitrary number of levels by just adding more parentheses. The more outward-facing parentheses you add around an operator, the lower its precedence.

Nowadays, there are standard ways to handle operator precedence, such as by stratifying a grammar or making use of the precedence declarations that some parser generators offer. But if you find yourself wanting to implement an expression parser and don't want to bother handling precedence, you can do some quick-and-dirty string substitution instead! Since this is purely a textual preprocessing step, the rest of the language implementation doesn't need to know about operator precedence.

About Joyk

Aggregate valuable and interesting links.
Joyk means Joy of geeK