32

What is an Abstract Syntax Tree

 4 years ago
source link: https://www.tuicool.com/articles/BJvY7rF
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

A ST is an acronym for Abstract Syntax Tree. It is a representation of tokens generated from statements and expressions in a programming language. With the AST, the interpreter or the compiler can generate machine code or evaluate an instruction.

Tip: Using Bit you can turn any JS code into an API you can share, use and sync across projects and apps to build faster and reuse more code. Give it a try.

If we have a simple expression like this:

1 + 2

This could be represented in AST like this:

+ BinaryExpression
 - type: +
 - left_value: 
  LiteralExpr:
   value: 1
 - right_vaue:
  LiteralExpr:
   value: 2

Statements like an if statement could be represented like this:

if(2 > 6) {
    var d  = 90
    console.log(d)
}
IfStatement
 - condition
  + BinaryExpression
   - type: >
   - left_value: 2
   - right_value: 6
 - body
  [
    - Assign
        - left: 'd';
        - right: 
            LiteralExpr:
            - value: 90
    - MethodCall:
         - instanceName: console
         - methodName: log
         - args: [
]
  ]

This tells the interpreter how to interpret the statements and tells the compiler how to generate the code for the statements.

Look at the expression: 1 + 2 our brain registers that this is an addition operation, that we are going to add the left value to the right value. Now, to make the computer work the same way we do, we’ll have to represent it in a form that resembles how our brain sees it.

We represented in a class, with properties that tell the interpreter what the operation is all about, the left value and the right value, we call the class Binary because a binary operation involves two values:

class Binary {
    constructor(left, operator, right) {
        this.left = left
        this.operator = operator
        this.right = right
    }
}

During instantiation, we will pass 1 in the left property, 'ADD' in the operator property and 2 in the right property:

new Binary('1', 'ADD', '2')

When we pass this to the interpreter, the interpreter sees it is a binary operation, then it checks the operator, sees it as an Addition operation, then it goes forward to request the left value and the right value from the Binary instance and add the two together:

const binExpr = new Binary('1', 'ADD', '2')
if(binExpr.operator == 'ADD') {
    return binExpr.left + binExpr.right
}
// `3` is returned

See, AST makes it possible to execute expressions and statements just like how the brain does.

Individual numbers, strings, Booleans, etc are expressions and they are represented in AST and evaluated to get their values.

23343
false
true
"nnamdi"

For example, 1 like this:

is represented in AST in a Literal class, a literal is a single word or number that stands alone. The class Literal will have a property that holds the word:

class Literal {
    constructor(value) {
        this.value = value
    }
}

We can represent the 1 in class Literal like this:

new Literal(1)

When being evaluated by an interpreter, it requests for the value of the value property in the Literal instance:

const oneLit = new Literal(1)
oneLit.value
// `1`

In our Binary expression, we passed in the values direct

new Binary('1', 'ADD', '2')

This shouldn’t be so because 1 and 2 as we saw above is an expression, a primary expression. They are also to be evaluated because they are literals and will be represented by the Literal class.

const oneLit = new Literal('1')
const twoLit = new Literal('2')

So the Binary expression would take in oneLit and twoLit as left and right properties.

// ...
new Binary(oneLit, 'ADD', twoLit)

On evaluation, the left and right properties would be evalueted also to get theri values:

const oneLit = new Literal('1')
const twoLit = new Literal('2')
const binExpr = new Binary(oneLit, 'ADD', twoLit)
if(binExpr.operator == 'ADD') {
    return binExpr.left.value + binExpr.right.value
}
// `3` is returned

Other statements are represented much the Binary in AST, for example, the if statement.

We know that in if statement comes with a condition that must resolve to true before the body of the if statement is executed.

if(9 > 7) {
    log('Yay!!')
}

The above if statement has a condition that 9 must be greater than 7 before the body is executed, we see "Yay!!" in our terminal.

To make an interpreter or compiler executes this, we will represent this in a class that has properties condition , body . the condition holds the condition that must resolve to true, body is an array that will contain the statements in the body of the if statement, the interpreter will loop thru the array and execute the statements there.

class IfStmt {
    constructor(condition, body) {
        this.condition = condition
        this.body = body
    }
}

Now let’s represent the below in the IfStmt class

if(9 > 7) {
    log('Yay!!')
}

The condition is a Binary operation, which will be represented like this:

const cond = new Binary(new Literal(9), "GREATER", new Literal(7))

like we did before, hope you remember? This time a GREATER operation.

The body of the if statement has only one statement: a function call. Function calls can be represented in a class with properties: name, denoting the name of the function to call and args the arguments passed to the function:

class FuncCall {
    constructor(name, args) {
        this.name = name
        this.args = args
    }
}

So the log(“Yay!!”) call will be like this:

const logFuncCall = new FuncCall('log', [])

Now putting everything together, our if statement will be represented like this:

const cond = new Binary(new Literal(9), "GREATER", new Literal(7));
const logFuncCall = new FuncCall('log', []);
const ifStmt = new IfStmt(cond, [
    logFuncCall
])

Interpreter can interpret the if statement like this:

const ifStmt = new IfStmt(cond, [
    logFuncCall
])
function interpretIfStatement(ifStmt) {
    if(evalExpr(ifStmt.conditon)) {
        for(const stmt of ifStmt.body) {
            evalStmt(stmt)
        }
    }
}
interpretIfStatement(ifStmt)

Output:

Yay!!

Since 9 > 7 :)

We interpreted the if statement by checking if the condition resolves to true if then we loop thru the body array and executes the statements in there.

Executing the AST

AST is evaluated using the Visitor pattern, the visitor pattern is a design pattern that allows an algorithm of a group of objects to be implemented in one place.

The ASts, Literal, Binary, IfStmnt are a related group of classes each of them need to carry methods that will enable the interpreter to get their values or evaluate them.

Visitor pattern enables us to create a single class and write the implementation of the ASTs in the class and feed the class to the ASTs. Each of the AST will have a common method that the interpreter will call with the implementing class instance, then the AST class would each call the corresponding method in the passed in implementing class that evaluates its AST.

class Literal {
    constructor(value) {
        this.value = value
    }
visit(visitor) {
        return visitor.visitLiteral(this)
    }
}
class Binary {
    constructor(left, operator, right) {
        this.left = left
        this.operator = operator
        this.right = right
    }
visit(visitor) {
        return visitor.visitBinary(this)
    }
}

See, the ASTs Literal and Binary have both the visit method but inside the method, they call the methods in the visitor instance that evaluates them, Literal calls visitLiteral and Binary calls visitBinary .

Now, let’s take the implementing class to be Visitor, it would implement the methods visitLiteral and visitBinary:

class Visitor {
visitBinary(binExpr) {
        // ...
        log('not yet implemented')
    }
visitLiteral(litExpr) {
        // ...
        log('not yet implemented')
    }
}

The visitBinary and visitLiteral will have their implementations there in the Visitor class. So when an interpreter wants to interpret a Binary expression it would call the visit method of the Binary expr passing in the instance of the Visitor class:

const binExpr = new Binary(...)
const visitor = new Visitor()
binExpr.visit(visitor)

The visit method would call the visitBinary of visitor passing in itself to the method, then the not yet implemented would be printed. This is called double dispatch.

  1. The visit method of the Binary was called.
  2. It ( Binary ) in turn called the visitBinary of the Visitor instance.

Let’s write the full code for the visitLiteral. We know that the Literal instance holds the value in the value property, so we simply return the value:

class Visitor {
visitBinary(binExpr) {
        // ...
        log('not yet implemented')
    }
visitLiteral(litExpr) {
        return litExpr.value
    }
}

For visitBinary, we know the Binary class has operator, left and right props. The operator holds the kind of operation to be performed on the left and right. We can write the implementation like this:

class Visitor {
visitBinary(binExpr) {
        switch(binExpr.operator) {
            case 'ADD':
            // ...
        }
    }
visitLiteral(litExpr) {
        return litExpr.value
    }
}

Note the left and right values are expressions they might be a Literal expression or a Binary expression or a Call expression or any other expression. We can’t assume that left and right of binary operations are always literals. Each expression must have a visit method that we use to evaluate the expression so in the visitBinary method above we will evaluate the left prop of Binary and the right prop by calling their visit method to get their values:

class Visitor {
visitBinary(binExpr) {
        switch(binExpr.operator) {
            case 'ADD':
                return binExpr.left.visit(this) + binExpr.right.visit(this)
        }
    }
visitLiteral(litExpr) {
        return litExpr.value
    }
}

So this will pass no matter the kind of expression the left and right values hold:

So if we have this:

const oneLit = new Literal('1')
const twoLit = new Literal('2')
const binExpr = new Binary(oneLit, 'ADD', twoLit)
const visitor = new Visitor()
binExpr.visit(visitor)

The binary operation, in this case, holds literals

The visitBinary of the visitor will be called passing in binExpr. The visitBinary in the Visitor will have left as oneLit and right as twoLit. oneLit is an instance of Literal and twoLit is an instance of Literal. So their visit methods are called passing in the Visitor class. Inside the Literal class for oneLit the visitLiteral method of Visitor is called passing in oneLit , the visitLiteral in Visitor returns the value property of the Literal class which is 1 . Same goes for twoLit which returns 2 .

The returned values are added together because the case 'ADD' of the switch statement is executed and 3 is returned.

If we pass in binExpr.visit(visitor) to console.log it will print 3

console.log(binExpr.visit(visitor))
// 3

Let’s pass binary operation that has 3 branches like this:

1 + 2 + 3

First, we pick 1 + 2 , the result will form the left value for + 3 .

Representing the above in Binary will like this:

new Binary (new Literal(1), 'ADD', new Binary(new Literal(2), 'ADD', new Literal(3)))

See the right value isn’t a literal but a Binary. So before it can carry out its add operation it must evaluate the Binary expr the result will be used as the right value for its evaluation.

const oneLit = new Literal(1)
const threeLit =new Literal(3)
const twoLit = new Literal(2)
const binExpr2 = new Binary(twoLit, 'ADD', threeLit)
const binExpr1 = new Binary (oneLit, 'ADD', binExpr2)
const visitor = new Visitor()
log(binExpr1.visit(visitor))

Adding if statement

Bringing if statement to the equation. To evaluate an if statement, we will add a visit method to IfStmt class, it would call visitIfStmt method:

class IfStmt {
    constructor(condition, body) {
        this.condition = condition
        this.body = body
    }
visit(visitor) {
        return visitor.visitIfStmt(this)
    }
}

See the power of the Visitor pattern? we added a new class to the group and we simply added the same visit method to call its corresponding method in the Visitor class. This will cause no breakage or touching of the other related classes, the Visitor pattern makes us obey the Closed and Open principle.

So we implement visitIfStmt in the Visitor class:

class Visitor {
    // ...
visitIfStmt(ifStmt) {
        if(ifStmt.condition.visit(this)) {
            for(const stmt of ifStmt.body) {
                stmt.visit(this)
            }
        }
    }
}

We evaluate the condition because it is an expression, so we call its visit method to evaluate it. We use the if statement in JS to check the returned value is truthy if truthy, the body of statements ifStmt.body is looped and each statement in the array is evaluated by calling its visit method and passing in the Visitor.

So we can translate this:

if(67 > 90)

Adding function call and function declaration

Let’s add a function call. We have an already class for function call:

class FuncCall {
    constructor(name, args) {
        this.name = name
        this.args = args
    }
}

Let’s add a visit method:

class FuncCall {
    constructor(name, args) {
        this.name = name
        this.args = args
    }
visit(visitor) {
        return visitor.visitFuncCall(this)
    }
}

Let’s add visitFuncCall to Visitor class:

class Visitor {
    // ...
visitFuncCall(funcCall) {
        const funcName = funcCall.name
        const args = []
        for(const expr of funcCall.args)
            args.push(expr.visit(this))
        // ...
    }
}

We have a problem here, there are in-built functions and user-defined functions. We need to create a holder for user-defined where function declarations will be stored and referenced by their name.

const FuncStore = (
    class FuncStore {
constructor() {
            this.map = new Map()
        }
setFunc(name, body) {
            this.map.set(name, body)
        }
getFunc(name) {
            return this.map.get(name)
        }
    }
    return new FuncStore()
)()

This FuncStore stores functions and retrieve them from a Map instance.

class Visitor {
    // ...
visitFuncCall(funcCall) {
        const funcName = funcCall.name
        const args = []
        for(const expr of funcCall.args)
            args.push(expr.visit(this))
        if(funcName == "log")
            console.log(...args)
        if(FuncStore.getFunc(funcName))
            FuncStore.getFunc(funcName).forEach(stmt => stmt.visit(this))
    }
}

See what we did, if the function name funcName (Remember the FuncCall class stores the name of the function in the name property) is log we run the JS console.log(...) and pass the arguments to it. If we find the function in the function store we loop through the body and visit each of them, that executes the function body.

Now let’s see how to put our function declaration to the function store.

Function declarations starts with the word fucntion . The general function structure is this:

function function_name(params) {
    // function body
}

So we can represent a function declaration in a class with properties: name, which holds the name of the function and body, an array that holds the statements in the function body:

class FunctionDeclaration {
    constructor(name, body) {
        this.name = name
        this.body = body
    }
}

We add a visit method that will call visitFunctionDeclaration in the Visitor:

class FunctionDeclaration {
    constructor(name, body) {
        this.name = name
        this.body = body
    }
visit(visitor) {
        return visitor.visitFunctionDeclaration(this)
    }
}

In Visitor:

class Visitor {
    // ...
visitFunctionDeclaration(funcDecl) {
        FuncStore.setFunc(funcDecl.name, funcDecl.body)
    }
}

We simply store the function using its name as the key.

Now if we have a function like this:

function addNumbers(a, b) {
    log(a + b)
}
function logNumbers() {
    log(5)
    log(6)
}

It will be represented like this:

const funcDecl = new FunctionDeclaration('logNumbers', [
    new FuncCall('log', [new Literal(5)]),
    new FuncCall('log', [new Literal(6)])
])
visitor.visitFunctionDeclaration(funcDecl)

Now lets call our function logNumbers :

const funcCall = new FuncCall('logNumbers', [])
visitor.visitFuncCall(funcCall)

We will have this in our console:

Conclusion

Understanding ASTs can be very daunting and brain-tasking. To write the simplest of parsers requires a lot of line of code.

Note, we didn’t cover scanners and parsers, we jumped the gun to ASTs to show you how they work. If you can deeply understand AST and what it entails, you will be half done when you start writing your own programming languages.

Practice makes perfect, go ahead and add other programming language features like:

for-of
while
for-in

If you have any questions regarding this or anything I should add, correct or remove, feel free to comment, email or DM me.

Thanks !!!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK