Pretty-print syntax trees with this one simple trick

prettyprint

I want to share a simple trick for pretty-printing syntax trees with the correct precedence that I’ve been using in my own interpreter projects. I believe this trick has been shared before, but I don’t know what the name of it is, so I wasn’t able to easily search for prior art. If somebody knows where this idea originated from then I can update this post to credit the original.

To illustrate the trick, I’d like to begin from the following Haskell type for a lambda calculus expression:

data Expression
    = Lambda Text Expression
    | Application Expression Expression
    | Variable Text

… and a matching grammar for parsing such an expression (using the same notation that the happy package uses):

Expression
  : '\\' label '->' Expression                { Lambda $2 $4      }
  | ApplicationExpression                     { $1                }

ApplicationExpression
  : ApplicationExpression VariableExpression  { Application $1 $2 }
  | VariableExpression                        { $1                }

VariableExpression
  : label                                     { Variable $1       }
  | '(' Expression ')'                        { $2                }

The trick

We can pretty-print that Expression type with the correct precedence without having to keep track of any precedence level. Instead, all we have to do is to write the pretty-printer to match the shape of the grammar, like this:

prettyExpression :: Expression -> Text
prettyExpression (Lambda x e) =
    "\\" <> x <> " -> " <> prettyExpression e
prettyExpression other =
    prettyApplicationExpression other

prettyApplicationExpression :: Expression -> Text
prettyApplicationExpression (Application f x) =
    prettyApplicationExpression f <> " " <> prettyVariableExpression x
prettyApplicationExpression other =
    prettyVariableExpression other

prettyVariableExpression :: Expression -> Text
prettyVariableExpression (Variable x) =
    x
prettyVariableExpression other =
    "(" <> prettyExpression other <> ")"

The pretty-printing logic closely follows the grammar

Create one pretty… function for each nonterminal symbol in the grammar

For example, since we have a nonterminal symbol named ApplicationExpression, we create a matching pretty-printing function named prettyApplicationExpression.
Each pretty… function matches one pattern per alternative in the grammar

In other words, if the production rule for ApplicationExpression has two alternatives, then the prettyApplicationExpression matches two patterns corresponding to each of the two alternatives, respectively.
Pretty-print non-terminal symbols using the matching pretty-printing function

For example, we pretty-print a function’s argument using prettyVariableExpression since we used the VariableExpression nonterminal symbol to parse that argument.
Pretty-print terminal symbols in the obvious way

… with any necessary whitespace

That’s the entire trick! If you follow those simple rules then the prettyprinter will automatically respect precedence, inserting parentheses in the right places and eliding parentheses when they are not necessary.

Conclusion

There is one major downside to this trick: if you add a new constructor to your syntax tree and you forget to update the pretty-printer then your pretty-printer will infinitely loop. This is pretty annoying as you might imagine.

The main upside to this trick is that pretty-printer logic is very simple to write, so mechanical that you could probably automate it (although I’m not sure if somebody has done so, yet).

Pretty-print syntax trees with this one simple trick