34

Build a JS Interpreter in JavaScript Using Acorn as a Parser

 4 years ago
source link: https://www.tuicool.com/articles/3IbIzaU
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Learn how the JS engine works by building your own JS interpreter — in JS.

iIvYRri.png!web “What I cannot create, I do not understand” — Richard Feynman

In this post, we will learn how the JS engine works by building our own JS interpreter in JS! It’s kinda weird, build a language interpreter using the language, isn’t it? All the same, I did it because you are more conversant with JS. You can translate the JS code we will write here to another language of your choice.

It might not be an absolute must but knowing how your everyday tool works will surely get you much better at using it.

Tip: Share, Reuse and Collaborate on JS modules with Bit

Use Bit to encapsulate modules/components with all their dependencies, tests and setup. Easily share them on Bit’s cloud , collaborate with your team and use them anywhere.

What is an Interpreter?

An interpreter is a language evaluator that runs in runtime executing the program’s source on the fly. It is different from a compiler. Compilers translate a language source code to machine code.

An interpreter parses the source code, generates an AST (Abstract Syntax Tree) from the source, iterates through the ASTs and evaluates them one after the other.

An interpreter has three stages:

  • Tokenization
  • Parsing
  • Evaluating

Tokenization

This is the breakage and organization of the source text into a group of meaningful words, called tokens . In English, when we try to process a sentence like this:

Obi is a boy

We will sub-consciously break the sentence into words:

+--------------------+
| Obi | is | a | boy |
+--------------------+

This is the first stage in analyzing and understanding a sentence. So, in programming, we will break the below source:

let a = 90;
const b = "nnamdi";
const c = a*5;

into tokens

+-------------------------------------------------------------+
| let | a | = | 90 | ; | const | b | = | "nnamdi" | ; | const |
+-------------------------------------------------------------++-----------------------+
| c | = | a | * | 5 | ; |
+-----------------------+

This tokenization is done by the lexer . This token is passed to the parser, with the text broken into tokens, this work is made easier for the parser.

parsing

This is the transformation of the tokens generated by the lexer into Abstract Syntax Tree. AST is the representation of the flow of our code in a tree-like form.

In our English sentence, Obi is a boy was broken down into words:

+--------------------+
| Obi | is | a | boy |
+--------------------+

With this we can pick the words and form a grammar structure:

"Obi": Subject
"is a boy": Predicate
"boy": Object

Obi is a Subject Noun in grammar, the rest is a less meaningful sentence that is called Predicate, the boy is the receiver of the action which is the Object. So the structure goes like this:

Subject(Noun) -> Predicate -> Object

So in our source text, we can transform it in a structure called AST:

let a = 90;
const b = "nnamdi";
const c = a*5;|
    |
    v+-------------------------------------------------------------+
| let | a | = | 90 | ; | const | b | = | "nnamdi" | ; | const |
+-------------------------------------------------------------++-----------------------+
| c | = | a | * | 5 | ; |
+-----------------------+|
    |
    v{
    Program: {
        body: [
                {
                    type: 'VariableDeclaration',
                    declarations:[ 
                        {
                            type: 'VariableDeclarator',
                            id: { 
                                type: 'Identifier', 
                                name: 'a' 
                            },
                            init: { 
                                type: 'Literal', 
                                value: 90, 
                                raw: '90' 
                            }
                        } 
                    ],
                    kind: 'let' 
                },
                {
                    type: 'VariableDeclarator',
                    id: { 
                        type: 'Identifier',
                        name: 'c' 
                    },
                    init: {
                        type: 'BinaryExpression',
                        left: [Node],
                        operator: '*',
                        right: [Node] 
                    } 
                }                 // ...
        ]
    }
}

The above is the structural representation of the source text:

let a = 90;
const b = "nnamdi";
const c = a*5;

evaluating

This stage the interpreter goes through the AST and evaluates each node. If for example, we have this:

5 + 7|
    |
    v+---+  +---+
| 5 |  | 7 |
+---+  +---+
  \     /
   \   /
    \ /
   +---+
   | + |
   +---+{
    rhs: 5,
    op: '+'.
    lhs: 7
}

The interpreter will parse through the ASt, here it will take the LHS node, then it collects the op node there it sees it needs to do an addition, then there must be a second node, it takes the RHS node, it collects there values and performs addition on both nodes and gives out the result, 14 .

We see that each node of an AST denotes a construct occurring in the source code. As we denoted an addition, in a node with three branches, so we can do with the if-else expression:

if(2) {
 //...
} else {
    //...
}|
    |
    v+----+
            | if |
            +----+
              |
              |
            +----+
            |cond|
            +----+
              /\
             /  \
            /    \
           /      \
          /        \
         /          \
      +----+       +----+
      |then|       |else|
      +----+       +----+{
    IfStmnt: {
        condition: 2,
        then: {
            //...
        },
        else: {
            //...
        }
    }
}

Now, we have known the stages of an interpreter, in the next section, we will see the practical application. Though we would not build a parser, we will be using the acorn library. acorn is a tiny, fast JavaScript parser, written completely in JavaScript. Apart from acorn, there is esprima, also fast but not as acorn.

Project Setup

First, we will scaffold a new Node project:

mkdir js-interpreter
cd js-interpreter
npm init -y

We install the acorn library:

npm i acorn

We create an index.js file:

touch index.js

We will import the acorn lib so we can call its parse function, but before that let's create a test folder that will contain a test.js file:

mkdir test
touch test/test.js

Here we will add JS code for testing our interpreter. Let’s just add sample code to the test.js file

// test/test.js
let a = 90;
const b = "nnamdi";
const c = a*5;

Now, we flesh out our index.js file:

// src/index.js
const l = console.log
const acorn = require('acorn')
const fs = require('fs')// pull in the cmd line args
const args = process.argv[2]
const buffer = fs.readFileSync(args).toString()const body = acorn.parse(buffer).body
l(body)

We imported the acorn lib and the filesystem lib fs . Then, we retrieved the user's command, then read the file passed to the command args and converted it to a string buffer. Next, we called the acorn#parse function passing in the file string buffer.

The parse is used to parse a JavaScript program. The input parameter is a string, options can be undefined or an object setting some of the options listed below. The return value will be an abstract syntax tree object as specified by the ESTree spec.

The returned AST object is held in the body variable. The AST object follows the ESTree spec, an organization dedicated to standardizing JS APIs.

ESTree Nodes

This gives us guidelines on how JS parsers should produce AST. It instructs that all nodes be a Node object:

Node {
    type,
    loc
}

The type property denotes the type of node the object represents:

  1. Literal
  2. Identifier
  3. FunctionDeclaration
  4. ClassDeclaration
  5. VariableDeclaration
  6. and so on

The loc holds the position of the Node object in the source text.

A Literal object will be like this:

Node {
    type: "Literal",
    value: string | number | boolean,
    loc
}

The type prop indicates its a Literal, the value prop is the value of a literal either a string, number or a boolean.

A string literal like "nnamdi" would be in AST like this:

Node {
    type: "Literal",
    value: "nnamdi",
    loc
}

A number literal like 100 would be like this:

Node {
    type: "Literal",
    value: 100,
    loc
}

Identifier holds names of variables, functions, and classes that are not keywords.

let a = 89

a is an identifier holds the name of the variable declarator, the above is a VariableDeclarator the id is an Identifier while the init is a Literal. It would be like below in AST:

Node {
    type: 'VariableDeclaration',
    declarations:[ 
        {
            type: 'VariableDeclarator',
            id: { 
                type: 'Identifier', 
                name: 'a' 
            },
            init: { 
                type: 'Literal', 
                value: 89, 
                raw: '89' 
            }
        } 
    ],
    kind: 'let' 
}

The starting source of a source text begins with the Program node type.

Node {
    type: "Program",
    declarations: [
        [Node]
    ]
}

It has a declarations array property, this array contains the AST of the program body. So to execute a script we would reference a Program’s declarations property, loop through it and execute the Nodes.

for(const node of body.declarations) {
    // ...
}

Interpreting Statements and Expressions

What are Statements? What are Expressions? Do both differ?

Yes, both differ. A Statement is an instruction that is executed by the interpreter, An Expression evaluates to a value.

Statements are:

if
while
for
do-while

Expressions are binary expressions like:

  • unary expression
  • comparison expression
  • addition/multiplication/division/subtraction expression

Our test.js file have this:

// test/test.js
let a = 90;
const b = "nnamdi";
const c = a*5;

Looking at the source code, we have VaraibelDeclarations, Literals, and BinaryExpression. We will build our interpreter to support these Nodes.

Our index.js file:

// src/index.js
const l = console.log
const acorn = require('acorn')
const fs = require('fs')// pull in the cmd line args
const args = process.argv[2]
const buffer = fs.readFileSync(args).toString()const body = acorn.parse(buffer).body
l(body)

body variable would hold the Node tree in an array. The above would log:

$ node . test/test.js[ Node {
    type: 'VariableDeclaration',
    start: 0,
    end: 11,
    declarations: [ Node {
    type: 'VariableDeclarator',
    start: 4,
    end: 10,
    id: Node { 
        type: 'Identifier', 
        start: 4, 
        end: 5, 
        name: 'a' },
    init: Node { 
        type: 'Literal', 
        start: 8, 
        end: 10, 
        value: 90, 
        raw: '90' } 
    } ],
    kind: 'let' 
  },
  Node {
    type: 'VariableDeclaration',
    start: 13,
    end: 32,
    declarations: [ Node {
    type: 'VariableDeclarator',
    start: 19,
    end: 31,
    id: Node { 
        type: 'Identifier', 
        start: 19, 
        end: 20, 
        name: 'b' },
    init:
     Node {
       type: 'Literal',
       start: 23,
       end: 31,
       value: 'nnamdi',
       raw: '"nnamdi"' } } ],
    kind: 'const' },  Node {
    type: 'VariableDeclaration',
    start: 34,
    end: 48,
    declarations: [ Node {
    type: 'VariableDeclarator',
    start: 40,
    end: 47,
    id: Node { 
        type: 'Identifier', 
        start: 40, 
        end: 41, 
        name: 'c' },
    init:
     Node {
       type: 'BinaryExpression',
       start: 44,
       end: 47,
       left: Node { 
           type: 'Identifier', 
           start: 44, 
           end: 45, 
           name: 'a' },
       operator: '*',
       right: right: Node { 
           type: 'Literal', 
           start: 46, 
           end: 47, 
           value: 5, 
           raw: '5' } } 
    } ],
    kind: 'const' } ]

See, we have

  • VariableDeclarations
  • Literals
  • BinaryExpression

To interpreter AST we are going to use the Visitor pattern.

Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operates.

We will create a Visitor class, this will have method that operates on an ES Node:

touch visitor.js

Now we will create an Interpreter that would run the ES Node tree.

touch interpreter.js

We will add an interpret method to Interpreter class

class Interpreter {
    interpret(nodes) {    }
}

Now, Interpreter class should have a Visitor instance, it will use the instance to run the ESTree nodes

class Interpreter {
    constructor(visitor) {
        this.visitor = visitor
    }
    interpret(nodes) {
        return this.visitor.run(nodes)
    }
}

Our Interpreter class is complete. Now, we move over to our Visitor class. First, we will add a run method,

class Visitor {
    visitNodes(nodes) {
        for (const node of nodes) {
            this.visitNode(node)
        }
    }    run(nodes) {
        return this.visitNodes(body)
    }
}

this will loop the node tree and visit each of them. The visitNodes would loop through the nodes and call the visitNode on each of them.

In the visitNode method:

class Visitor {
    ...
    visitNode(node) {
        switch (node.type) {
        }
    }
}

This method would use a switch case to know the type of node node.type and call a method dedicated to the Node. For example, if the node type is a "Literal", we would have visitNode like this

class Visitor {
    ...
    visitNode(node) {
        switch (node.type) {
            case "Literal":
                return this.visitLiteral(node)
        }
    }
}

See, a case test for “Literal”, there a method visitLiteral is called passing the node, this method is a special method that handles only Literals. If we add cases for other Node trees, we will create special methods visitXXX that handles only the node.

Supporting Literals

We implement the visitLiteral :

class Visitor {
    ...    visitLiteral(node) {
        return node.value
    }    visitNode(node) {
        switch (node.type) {
            case "Literal":
                return this.visitLiteral(node)
        }
    }
}

We just return the value of the Node value property, that is where the value of a Literal node is held.

Supporting Identifiers

Let’s support Identifier. Now, we know that an identifier represents variable names, class names, method names and function names.

For this post, we will only be handling identifiers for variables. Classes, methods and function names will be supported in the next post in the series.

So we add an “Identifier” case in the visitNode switch statement.

case 'Identifier':
                return this.visitIdentifier(node)

Then, we create visitIdentifier method,

let globalScope = new Map()
class Visitor {    visitIdentifier(node) {
        const name = node.name
        if (globalScope.get(name))
            return globalScope.get(name)
        else
            return name
    }
}

We created a Map instance globalScope this is where all variables will be stored, with their name as the key. So in the visitIdentifier, we retrieve the name of the identifier, then check if it is already in the globalScope, if defined there we return the value, if not we return the identifier.

Supporting variable declarations

Variable declarations like this

const v, b = 90
let f = 88

are held in VariableDeclaration Node:

VariableDeclaration {
    type: "VariableDeclaration",
    declarations: [VariableDeclarator],
    kind: "var"
}

The VariableDelarations holds info about the kind of the variable, then the declarations holds an array of VariableDeclarators

The VariableDeclarator has Node:

VariableDeclarator {
    type: "VariableDeclarator",
    id: Pattern,
    init: Expression
}

Lets add case for “VariableDeclaration” and “VariableDeclarator”:

visitNode(node) {
        switch (node.type) {
            case 'VariableDeclaration':
                return this.visitVariableDeclaration(node)
            case 'VariableDeclarator':
                return this.visitVariableDeclarator(node)

Then we add methods visitVariableDeclaration and visitVariableDeclarator :

visitVariableDeclaration(node) {
        const nodeKind = node.kind
        return this.visitNodes(node.declarations)
    }    visitVariableDeclarator(node) {
        const id = this.visitNode(node.id)
        const init = this.visitNode(node.init)
        globalScope.set(id, init)
        return init
    }

For now, we are not doing anything with the kind property, later we will use to add support for let , var and const . We called visitNodes passing in the declarations .

In the visitVariableDeclarator , we know the id holds the name of the variable and init holds the value or expression of the id . After visiting the nodes, we use the id as key to set the value of the init in the globalScope Map instance.

Supporting Calls

Now we will be supporting calls. The only call will be supporting is print . This call would print its arguments to the terminal.

calls are represented in CallExpression node:

CallExpression {
    type: "CallExpression",
    callee: Expression,
    arguments: [Expression]
}

obviously, the type holds the type of expression, callee holds the name of the function and the arguments holds the arguments in an array.

So

print(90)

will be:

CallExpression {
    type: "CallExpression",
    callee: {
        name: "print"
    },
    arguments: [
        {
            raw: 90
        }
    ]
}

in EStree.

To interpret this: we will add a CallExpression case in the visitNode switch case:

case "CallExpression":
                return this.visitCallExpression(node)

Then a visitCallExpression.

evalArgs(nodeArgs) {
        let g = []
        for (const nodeArg of nodeArgs) {
            g.push(this.visitNode(nodeArg))
        }
        return g
    }    visitCallExpression(node) {
        const callee = this.visitIdentifier(node.callee)
        const _arguments = this.evalArgs(node.arguments)        if (callee == "print")
            console.log(..._arguments)
    }

We got the name of the function, then we gathered the argument’s value in an array _arguments , the evalArguments function did that. For now, we just checked the function name callee is print, if yes, we use console.log to print the _agruments. That is it for supporting calls for now.

Supporting Binary expressions

Binary expressions are operations like addition, subtraction, division, multiplication, etc.

The ESNode for binaryexpressions is:

BinaryExpression {
    type: "BinaryExpression",
    operator: BinaryOperator,
    left: Expression,
    right: Expression
}

so

9 * 8

will be represented in BinaryExpression node like this:

BinaryExpression {
    type: "BinaryExpression",
    operator: "*",
    left: {
        raw: 9
    },
    right: {
        raw: 8
    }
}

We will add a case type for “BinaryExpression” in visitNode:

case 'BinaryExpression':
                return this.visitBinaryExpression(node)

Then we add the visitBinaryExpression:

visitBinaryExpression(node) {
        const leftNode = this.visitNode(node.left)
        const operator = node.operator
        const rightNode = this.visitNode(node.right)        switch (operator) {
            case ops.ADD:
                return leftNode + rightNode
            case ops.SUB:
                return leftNode - rightNode
            case ops.DIV:
                return leftNode / rightNode
            case ops.MUL:
                return leftNode * rightNode
        }
    }

The left and right properties are Expressions so we called the visitNode on them. We used a switch case on the operator type, see cases for addition, subtraction, division, and multiplication.

For addition, we applied the + on the left node and the right node. Same with other operator types, then the result of the operation is returned.

Testing

Now, we are going to test our interpreter. We need to add print calls in our test.js file to let us see how the execution goes:

// test/test.js
let a = 90;
const b = "nnamdi";
const c = a*5;print(a)
print(b)
print(c)

Then in the index.js file:

// src/index.js
const l = console.log
const acorn = require('acorn')
const Interpreter = require("./interpreter.js")
const Visitor = require("./visitor.js")
const fs = require('fs')// pull in the cmd line args
const args = process.argv[2]
const buffer = fs.readFileSync(args).toString()
const jsInterpreter = new Interpreter(new Visitor())const body = acorn.parse(buffer).bodyjsInterpreter.interpret(body)

interpreter.js :

class Interpreter {
    constructor(visitor) {
        this.visitor = visitor
    }
    interpret(nodes) {
        return this.visitor.run(nodes)
    }
}
module.exports = Interpreter

visitor.js :

const l = console.logconst ops = {
    ADD: '+',
    SUB: '-',
    MUL: '*',
    DIV: '/'
}
let globalScope = new Map()class Visitor {    visitVariableDeclaration(node) {
        const nodeKind = node.kind
        return this.visitNodes(node.declarations)
    }    visitVariableDeclarator(node) {
        const id = this.visitNode(node.id)
        const init = this.visitNode(node.init)
        globalScope.set(id, init)
        return init
    }    visitIdentifier(node) {
        const name = node.name
        if (globalScope.get(name))
            return globalScope.get(name)
        else
            return name
    }    visitLiteral(node) {
        return node.raw
    }    visitBinaryExpression(node) {
        const leftNode = this.visitNode(node.left)
        const operator = node.operator
        const rightNode = this.visitNode(node.right)        switch (operator) {
            case ops.ADD:
                return leftNode + rightNode
            case ops.SUB:
                return leftNode - rightNode
            case ops.DIV:
                return leftNode / rightNode
            case ops.MUL:
                return leftNode * rightNode
        }
    }    evalArgs(nodeArgs) {
        let g = []
        for (const nodeArg of nodeArgs) {
            g.push(this.visitNode(nodeArg))
        }
        return g
    }    visitCallExpression(node) {
        const callee = this.visitIdentifier(node.callee)
        const _arguments = this.evalArgs(node.arguments)        if (callee == "print")
            l(..._arguments)
    }    visitNodes(nodes) {
        for (const node of nodes) {
            this.visitNode(node)
        }
    }    visitNode(node) {
        switch (node.type) {
            case 'VariableDeclaration':
                return this.visitVariableDeclaration(node)
            case 'VariableDeclarator':
                return this.visitVariableDeclarator(node)
            case 'Literal':
                return this.visitLiteral(node)
            case 'Identifier':
                return this.visitIdentifier(node)
            case 'BinaryExpression':
                return this.visitBinaryExpression(node)
            case "CallExpression":
                return this.visitCallExpression(node)
        }
    }    run(nodes) {
        return this.visitNodes(nodes)
    }
}module.exports = Visitor

Now let’s run the interpreter:

$ node . test/test.js90
"nnamdi"
450

Boom!! we just interpreted JS code !!

Conclusion

This is just the tip of the iceberg. There are so many features in JS to interpret, we just did the basics here.

We will cover the topics in the next post in the series,

  • we will support the remaining operators: -=, +=, ==, ===, !=, !==, <, >, <=, >=, <<, >>, >>>, %, |, ^, &, in, instanceof.
  • Add support for functions, classes, builtin functions and objects.
  • We will add support for for-of , for-in , for statements.

Till next time.

If you have any question regarding this or anything I should add, correct or remove, feel free to comment, email or DM me

Thanks !!!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK