17

Architecture for a JS to C compiler

 5 years ago
source link: https://www.tuicool.com/articles/hit/eYFryiB
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The basic data-flow of compilation looks like this:

  1. parse: turn the source text into something we can use as input for our compiler, usually an AST (abstract syntax tree)
  2. optimisation (optional): transform the input program into an equivalent one that's more efficient in time or space
  3. code generation: output a program in our target language, e.g machine code

My JS to C compiler will implement this as follows

  1. parse: use esprima - this project is about compilation not parsing
  2. & 3. code generation + optimisation: a compiler written in TypeScript that outputs C

C as a target language

C is lower-level than JavaScript, but it still gives us a lot! We can use the following features in C to implement similar features in JavaScript:

  1. control structures: if , for , while etc will map well between languages
  2. function calls: we can compile JS functions to C functions

However, there is lots we're going to have to write from scratch. For instance C lacks data-structures beyond statically sized arrays. We'll fill this in with what's called a 'runtime library' - a library we'll call from our compiled program. Lots of the behaviour we'll need to implement JavaScript cannot be implemented without runtime support:

  1. garbage collection
  2. exception handling
  3. data-structures: dynamic arrays and objects
  4. prototype-based object system

A sketch

Lets manually walk the following JS code through our compiler tool-chain:

function fact(n) {
    return n < 3 ? n : n * fact(n - 1);
}
console.log(fact(5));

First we run esprima to parse the source and give us a syntax-tree:

{
  "type": "Program",
  "body": [
      {
          "type": "FunctionDeclaration",
          "id": {
              "type": "Identifier",
              "name": "fact"
          },
          "params": [
              {
                  "type": "Identifier",
                  "name": "n"
              }
          ],
          "body": {
              "type": "BlockStatement",

This will be input to js-to-c.ts , which would output a target C program. We have a ternary expression in our input program, here's a snippet from the function in the compiler that handles it. You can see we're implementing JS's ternary expression with C's if :

function compileConditionalExpression(node: ConditionalExpression, state: CompileTimeState) {
    // ...
    return `${testSrc};
            JsValue* ${resultTarget.id};
            if(isTruthy(${testResultTarget.id})) {
                ${consequentSrc}
            } else {
                ${alternateSrc}
            }`
}

Once our compiler has run we're left with a valid C program (hopefully). The below is the compiled fact function. Some interesting things to look for: the recursive call, and how the result value from the ternary is threaded through the multiplication and return:

static JsValue *fact_1(Env *env) {
  JsValue *return_3;

  JsValue *left_6 = envGet(env, interned_2 /* n */);
  JsValue *right_7 = jsValueCreateNumber(3);
  JsValue *conditionalPredicate_5 = LTOperator(left_6, right_7);
  JsValue *conditionalValue_4;
  if (isTruthy(conditionalPredicate_5)) {
    return_3 = envGet(env, interned_2 /* n */);
  } else {
    JsValue *left_8 = envGet(env, interned_2 /* n */);
    JsValue *callee_10 = envGet(env, interned_11 /* fact */);
    JsValue *left_12 = envGet(env, interned_2 /* n */);
    JsValue *right_13 = jsValueCreateNumber(1);
    JsValue *call10Arg_0 = subtractOperator(left_12, right_13);
    JsValue *args_10[] = {call10Arg_0};
    JsValue *right_9 = functionRunWithArguments(callee_10, args_10, 1, NULL);

    return_3 = multiplyOperator(left_8, right_9);
  }
  return return_3;
  return getUndefined();
}

You can also see calls out to our runtime library - e.g envGet() . This is part of the runtime as we can't rely on the C call-stack here. In JS, functions are closures. Therefore they need the environment they close over to live well beyond the lifetime of the call-stack that defined them - think of function values stored as global properties or as event-listeners.

To use the runtime library our output program use C's #include macro to include our runtime library's header files, and get access to the definitions of the functions our library has defined elsewhere:

#include "../../runtime/environments.h"
#include "../../runtime/exceptions.h"
#include "../../runtime/functions.h"

In my next post, we'll look at our how our js-to-c.ts compiler takes a syntax tree and outputs C source code.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK