Pragmatic compiling of C++ to WebAssembly. A Guide

Most C++ programmers I know have already heard of WebAssembly, but many have had trouble getting started. This guide takes you beyond a simple “Hello World” to a stateful application with interactions between C++ and JavaScript.

I did not find a single article that covered more than the bare minimum. It took a lot of effort to get from the simplest “Hello World” to a system that can solve actual real-world problems. This post delivers that.

Yes, there is a GitHub repo with the final code (https://github.com/tom-010/webassembly-example), but it makes much more sense to follow this tutorial.

Note: This article does not show how to pass arrays from JS to WebAssembly, but this article does.


This article is not an intro to WebAssembly itself or to why you should use it, so there is no big motivational speech at the beginning. Nevertheless, here is the definition from https://webassembly.org/:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

The homepage lists four significant reasons to use it: efficient and fast; safe; open and debuggable; and part of the open web platform.

However, let’s be honest. The only real reason is number one: “Efficient and fast.” Everything else can be achieved with JavaScript as well.

So, let’s get started!

The Operating System

I am using Ubuntu 18.10 with just the standard build tools for C++:

$ apt install build-essential cmake python git

All tooling supports Windows and macOS as well.

Compile the Toolchain

First of all, we need the toolchain (based on Clang). The best place to start is the Getting Started Guide (it also covers the steps for the other operating systems).

$ mkdir ~/tmp && cd ~/tmp
$ git clone https://github.com/juj/emsdk.git
$ cd emsdk
$ ./emsdk install --build=Release sdk-incoming-64bit binaryen-master-64bit
$ ./emsdk activate --build=Release sdk-incoming-64bit binaryen-master-64bit

This takes a while (among other things, Clang gets compiled) and requires some disk space.

Feel free to use the time to play around with WebAssembly in the “WebAssembly Explorer” (similar to Compiler Explorer).

First Compilation: “Hello World”

It is time to compile “Hello World”:

$ cat hello.cpp
#include <iostream>
int main() {
  std::cout << "Hello World" << std::endl;
  return 0;
}

Before compiling, we have to source our compiled toolchain. Run the following command in the directory which you cloned before:

$ source ./emsdk_env.sh --build=Release

It is even better to append the following line to your ‘.bashrc’ (also run this in the cloned folder!):

$ echo "source $(pwd)/emsdk_env.sh --build=Release > /dev/null" >> ~/.bashrc

Compile it

$ em++ hello.cpp -s WASM=1 -o hello.html

The result is a hello.html and, more importantly, the hello.wasm file. The latter contains the compiled code.

Run The Code

If you open hello.html directly from the file system in the browser, you run into CORS problems. You have to serve the files via a web server. Emscripten ships one:

$ emrun --port 8080 .

This starts a web server, opens a browser and shows the current directory. Just click the newly created hello.html. And voilà:


emscripten result

Emscripten provides a console automatically and executes your program.

Call By Yourself (JavaScript)

The console in the web browser is nice but not very useful in production. Let’s write a minimal script to call our “Hello World.”

Emscripten has created a hello.html and a hello.js as well. The HTML file is quite bloated and contains nothing particularly useful for our further steps.

The ‘hello.js’ is very helpful. It loads and instantiates our WebAssembly code and provides a JavaScript interface to it. Therefore we keep it and replace the HTML-File by:

$ cat index.html
<html>
  <body>
    <script src="hello.js"></script>
  </body>
</html>

Refresh the browser and open the web console (Ctrl+Shift+I, then the ‘Console’ tab). And here we have:

“Hello World” in the web-console

I can’t make this HTML any more minimal, so this is a good point to move on to more complex things. Most tutorials end here, but no real project consists of a single file!

Two Or More Files

Just throw-away-code (Fibonacci numbers):

$ cat fib.cpp
int fib(int x) {
  if (x < 1)
    return 0;
  if (x == 1)
    return 1;
  return fib(x-1)+fib(x-2);
}
$ cat hello.cpp
#include <iostream>
#include "fib.cpp"
int main() {
 std::cout << "fib(5) = " << fib(5) << std::endl;
 return 0;
}

Compiled by:

$ em++ hello.cpp -s WASM=1 -o hello.html

My web-server is still running, so after a refresh:


fib(5) = 5

The good news: it works. The bad news: it has overwritten my hello.html. The trick is to specify ‘hello.js’ as the output instead of ‘hello.html’.

$ em++ hello.cpp -s WASM=1 -o hello.js

This regenerates just ‘hello.wasm’ and ‘hello.js’, but not ‘hello.html’. To add a little automation, here is a build script together with an appropriate folder structure:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp -s WASM=1 -o hello.js
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

In the newly created directory:

$ tree .
.
├── build
├── build.sh
├── cpp
│   ├── fib.cpp
│   └── hello.cpp
├── serve.sh
└── web
    ├── gen
    │   ├── hello.js
    │   └── hello.wasm
    └── index.html

With a little run-script for typing-convenience:

$ cat serve.sh
emrun --port 8080 web/

And an adjusted index.html:

$ cat web/index.html
<html>
  <body>
    <script src="gen/hello.js"></script>
  </body>
</html>

Nice. Now we can develop the web app independently and keep the generated sources in a separate folder, which makes it easy to remember that you shouldn’t modify them (as with any generated source).

Header Files

To claim that the previous example has multiple files is cheating (no headers, etc.). So we split fib into a header and an implementation:

$ cat fib.h
#ifndef FIB
#define FIB
int fib(int x);
#endif
$ cat fib.cpp
#include "fib.h"
int fib(int x) {
  if (x < 1)
    return 0;
  if (x == 1)
    return 1;
  return fib(x-1)+fib(x-2);
}
$ cat hello.cpp
#include <iostream>
#include "fib.h"
int main() {
  std::cout << "fib(6) = " << fib(6) << std::endl;
  return 0;
}

Note that hello.cpp no longer includes fib.cpp, only the header. The definition of fib therefore has to be supplied at link time. Since build.sh still compiles only hello.cpp, the build fails:

$ ./build.sh
error: undefined symbol: _Z3fibi
warning: To disable errors for undefined symbols use `-s ERROR_ON_UNDEFINED_SYMBOLS=0`
Error: Aborting compilation due to previous errors
shared:ERROR: '/home/thomas/tmp/emsdk/node/8.9.1_64bit/bin/node /home/thomas/tmp/emsdk/emscripten/incoming/src/compiler.js /tmp/tmpDR9qjf.txt /home/thomas/tmp/emsdk/emscripten/incoming/src/library_pthread_stub.js' failed (1)

Adding the fib.cpp to the build-script fixes the problem:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

Note: ‘|| exit 1’ causes the script to stop if the build fails!

$ ./build.sh
$ ./serve.sh

The build passes now. Please note that I changed the argument of fib to 6:

int main() {
 std::cout << "fib(6) = " << fib(6) << std::endl;
 return 0;
}

So we can see an actual difference now: the console shows “fib(6) = 8”.

Compiling multiple files works! As long as you are able to extend the simple build script, this approach is fine. We will look at build systems (CMake) for more complex projects later. But let’s look at argument passing first!

Disassembling

Sometimes (as in the next section) it is useful to disassemble your code into S-expressions. You can do this with

$ wasm-dis hello.wasm -o hello.wast

Here you can easily find functions, globals and so on. S-expressions are the textual representation of WebAssembly. To understand the building blocks, check out the great guide by MDN.

Disassembling becomes particularly useful when the C++ code was compiled with the flag ‘emcc … -s ONLY_MY_CODE=1 …’. Then the result is only a few lines long and you can analyze it carefully.

wast-file as part of the build

As C. Gerard Gallant suggested in the comments, emscripten can also generate the wast-file directly while compiling the wasm with the flag ‘-g’.

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -g -s WASM=1 -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

You can now find the always-up-to-date ‘wast’-file in the build folder (build/hello.wast).

Function Calls & Passing Arguments

The “fib(6) = 8” in the console comes from the cout in the main function of the C++ program, which is executed after loading. Now I want to call fib from JavaScript:

$ cat index.html
<html>
  <body>
    <script src="gen/hello.js"></script>
    <script>
      console.log(fib(10));
    </script>
  </body>
</html>

Now, I am facing two problems:

The program is not yet loaded when I want to execute fib(10)
The function fib is not exported from C++ by emcc and is therefore not available to JS

Export functions from C++

Note: To see all exported functions, you can disassemble the wasm-file via ‘wasm-dis hello.wasm -o hello.wast’ and search the wast-file for “(export”. Because of C++, the function names are mangled (prefixed). The file is written in S-expressions.

Without modification, only ‘main’ is exported. We have to change the build-script:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s EXPORT_ALL=1 -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

With ‘-s EXPORT_ALL=1’ the fib function gets exported as well, but under the name ‘__Z3fibi’, as modified by C++. I found the name by looking at the disassembled code (no fear, it is easy).

Control what to compile

As you may have seen, the resulting wasm-file is several KB big (9.6 in my case), just for a very simple algorithm. This comes from ‘iostream’. When you remove the include and the call, and set the flag in build.sh so that only our own code ends up in the resulting wasm-file, the file gets much smaller:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s ONLY_MY_CODE=1 -s EXPORT_ALL=1 -O3 -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/
$ cat hello.cpp
// #include <iostream>
#include "fib.h"
int main() {
  fib(10);
  return 0;
}

Now the file is only 96 bytes big. This is more appropriate. With this, we can call our C++ function:

$ cat hello.html
<html>
<body>
<script>
function loadWasm(fileName) { 
  return fetch(fileName)
    .then(response => response.arrayBuffer())
    .then(bits => WebAssembly.compile(bits))
    .then(module => { return new WebAssembly.Instance(module) });
};

loadWasm('gen/hello.wasm')
  .then(instance => {
    let fib = instance.exports.__Z3fibi;
    console.log(fib(1));
    console.log(fib(20));
  });

</script>
</body>
</html>

Note that we don’t include ‘gen/hello.js’ anymore. Here we do the minimal work to load the wasm ourselves. In the first block, we load, compile and instantiate the program; in the second, we run it. Not that complicated. Later in the article, I will deal with memory, but for now, this works.

Nice. The first C++ code called by JavaScript

Load more than 4 kB

Imagine our code gets bigger. I simulate this by including ‘iostream’ again and removing the corresponding flag from the build script:

$ cat build.sh
...
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s EXPORT_ALL=1 -O3  ...
$ cat hello.cpp
#include <iostream>
#include "fib.h"
int main() {
  std::cout << fib(10) << std::endl;
  return 0;
}

We are back at a file size of 165.4 kB for ‘hello.wasm’. This is realistic enough to simulate some lines of code. We didn’t change anything about the binding or the algorithm, so it should work. Instead, an error occurs:

RangeError: WebAssembly.Instance is disallowed on the main thread, if the buffer size is larger than 4KB

This makes sense. We do not want to block our main thread with loading, compiling and instantiating a WebAssembly file.

Google provides good docs on how to handle WebAssembly efficiently.

In fact, from here on it gets very nasty, because we would have to define bindings for every exported function. Otherwise, we get many, many weird errors. Defining all the bindings for a few functions would be okay, but remember, we exported all functions, which includes all of ‘iostream’. I tried for some hours and realized that the code generated by Emscripten is the easiest way for now. Therefore:

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
Module.onRuntimeInitialized = function() {
  console.log(Module.__Z3fibi(30));
}
</script>
</body>
</html>

I still use the mangled name of the fib function. I register my function as ‘Module.onRuntimeInitialized’, which makes sure that it is executed only after our (big) program has been loaded, compiled and instantiated. It works:

Not the best solution, but it works. WebAssembly is still very brittle (at least the tooling around it), so we have to live with this. A first step would be to whitelist the exported functions.

Stateful C++ Code

Calling stateless functions makes sense just in rare cases. Therefore, I want to design a simple class:

$ cat fib.h
#ifndef FIB
#define FIB
class Fib {
public:
  Fib();
  int next();
private:
  int curr = 1;
  int prev = 1;
};
#endif
$ cat fib.cpp
#include "fib.h"
Fib::Fib() {}
int Fib::next() {
  int next = curr + prev;
  prev = curr;
  curr = next;
  return next;
}
$ cat hello.cpp
#include "fib.h"
#include <iostream>
int main() {
  Fib fib{};
  std::cout << fib.next() << std::endl;
  std::cout << fib.next() << std::endl;
  std::cout << fib.next() << std::endl;
  std::cout << fib.next() << std::endl;
  std::cout << fib.next() << std::endl;
  return 0;
}
$ g++ hello.cpp fib.cpp -o fib && ./fib
2
3
5
8
13

Nothing special here. Just a dumb, stateful class. Let’s use it from JavaScript. I start small and do the instantiation in C++:

$ cat hello.cpp
#include "fib.h"
int fib() {
  static Fib fib = Fib();
  return fib.next();
}
int main() {
  fib();
  return 0;
}

Note: I call fib in main so that the compiler does not optimize away my fib function.

The name of my ‘fib’-binding changed, therefore my JS-code looks like:

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
Module.onRuntimeInitialized = function() {
  console.log(Module.__Z3fibv());
  console.log(Module.__Z3fibv());
  console.log(Module.__Z3fibv());
  console.log(Module.__Z3fibv());
}
</script>
</body>
</html>

This works out of the box:

Multiple Objects/States via Dispatching

One object (state) per public function is not enough. The next easiest way is to do dispatching:

$ cat hello.cpp
#include "fib.h"
#include <vector>
auto instances = std::vector<Fib>();
int next_val(int fib_instance) {
  return instances[fib_instance].next();
}
int new_fib() {
  instances.push_back(Fib());
  return instances.size() - 1;
}
int main() {
  int fib1 = new_fib();
  next_val(fib1);
  return 0;
}

The idea is simple. I took it from functional programming. ‘new_fib’ is our constructor and an integer is its ‘address.’ Maybe not the most elegant solution, but it works and is easy to understand and therefore easy to change. These are the two names of the required functions:

__Z7new_fibv
__Z8next_vali

Calling is easy:

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
Module.onRuntimeInitialized = function() {
  let fib1 = Module.__Z7new_fibv();
  let fib2 = Module.__Z7new_fibv();
  console.log(Module.__Z8next_vali(fib1));
  console.log(Module.__Z8next_vali(fib1));
  console.log(Module.__Z8next_vali(fib1));
  console.log(Module.__Z8next_vali(fib2));
  console.log(Module.__Z8next_vali(fib2));
  console.log(Module.__Z8next_vali(fib2));
}
</script>
</body>
</html>

Encapsulate C++: Build a Facade

It is time to abstract away the ugly C++ interface:

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
class Fib {
  constructor() {
    this.cppInstance = Module.__Z7new_fibv();
  }
  next() {
   return Module.__Z8next_vali(this.cppInstance);
  }
}
Module.onRuntimeInitialized = function() {
  let fib1 = new Fib();
  let fib2 = new Fib();
  console.log(fib1.next());
  console.log(fib1.next());
  console.log(fib1.next());
  console.log(fib2.next());
  console.log(fib2.next());
  console.log(fib2.next());
}
</script>
</body>
</html>

The output is still the same, but now we have a very nice JS interface that encapsulates the C++ part.

More instantiation of C++ Objects

Sure, the next step could be to actually call the constructor of the ‘Fib’ class from JS. However, for me, the current approach makes sense (for now). ‘new_fib’ is a specialized constructor optimized for JS, and it keeps us language-agnostic: replacing C++ with C would require no conceptual change in the instantiation.

My next step would be to replace the vector with a map and provide a delete method to get rid of no longer needed objects.
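
The map-based step described above could look roughly like the following sketch. The names ‘delete_fib’ and ‘next_handle’ and the error value -1 are assumptions for illustration and not part of the original code; the ‘Fib’ class is repeated inline (with the same logic as in fib.h) only so that the block compiles on its own.

```cpp
#include <unordered_map>

// Same logic as the Fib class from fib.h, repeated to keep the sketch self-contained.
class Fib {
public:
  int next() {
    int next = curr + prev;
    prev = curr;
    curr = next;
    return next;
  }
private:
  int curr = 1;
  int prev = 1;
};

// Handle -> instance. Unlike a vector index, a deleted handle is never reused.
static std::unordered_map<int, Fib> instances;
static int next_handle = 0;  // hypothetical counter, not in the original code

// Creates a new instance and returns its handle.
int new_fib() {
  int handle = next_handle++;
  instances.emplace(handle, Fib());
  return handle;
}

// Returns the next value, or -1 for an unknown (e.g. already deleted) handle.
int next_val(int fib_instance) {
  auto it = instances.find(fib_instance);
  if (it == instances.end())
    return -1;
  return it->second.next();
}

// Hypothetical addition: frees an instance.
// Returns 1 if an instance was deleted, 0 if the handle was unknown.
int delete_fib(int fib_instance) {
  return static_cast<int>(instances.erase(fib_instance));
}
```

On the JS side, the facade class would then need a matching method (for example a ‘delete()’ that calls the exported ‘delete_fib’), because JavaScript's garbage collector knows nothing about the C++ objects behind the handles.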

A stable interface between C++ and JavaScript

As you have noticed, the name of our function changed after each refactoring, which caused our integration to fail. The weird names come from C++’s name mangling.
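
To make names such as ‘__Z3fibi’ less mysterious, here is a small sketch of how the mangled name of a simple free function is put together under the Itanium C++ ABI, which Clang follows: the prefix ‘_Z’, the length of the function name, the name itself, and one code per parameter type (‘i’ for int, ‘v’ for an empty parameter list). The helper ‘mangle_free_function’ is hypothetical and only covers global functions with built-in parameter types; real mangling has many more rules (namespaces, classes, templates). The additional leading underscore seen in ‘Module.__Z3fibi’ is the same underscore prefix that also shows up in front of plain C names.

```cpp
#include <string>

// Hypothetical helper: builds the Itanium-ABI mangled name of a free function
// in the global namespace. param_codes holds the already encoded parameter
// types, e.g. "i" for (int) or "id" for (int, double); pass "" for no parameters.
std::string mangle_free_function(const std::string& name,
                                 const std::string& param_codes) {
  // An empty parameter list is encoded as "v" (void).
  const std::string params = param_codes.empty() ? "v" : param_codes;
  return "_Z" + std::to_string(name.size()) + name + params;
}
```

This explains why every refactoring so far broke the JS side: changing the parameter list from ‘fib(int)’ to ‘fib()’ changes the suffix from ‘i’ to ‘v’, and renaming the function changes both the length and the name.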

Consistent names

To prevent this, we export their signatures as C-code.

$ cat hello.cpp
#include "fib.h"
#include <vector>
extern "C" {
 int new_fib();
 int next_val(int fib_instance);
}
auto instances = std::vector<Fib>();
int next_val(int fib_instance) {
 return instances[fib_instance].next();
}
int new_fib() {
 instances.push_back(Fib());
 return instances.size() - 1;
}
int main() {
 int fib1 = new_fib();
 next_val(fib1);
 return 0;
}

This is nice because I wanted to list the exported functions anyway. Now we have more consistent names (just prefixed by an underscore):

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
class Fib {
  constructor() {
    this.cppInstance = Module._new_fib();
  }
  next() {
    return Module._next_val(this.cppInstance);
  }
}
Module.onRuntimeInitialized = function() {
  // ...
}
</script>
</body>
</html>

Export only functions that are actually used

You may have noticed the size of ‘hello.js’, the large number of exported functions in the disassembled ‘wast’-file, and the resulting size of ‘Module’ in the JS context. All of this comes from ‘EXPORT_ALL’:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s EXPORT_ALL=1 -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

Emscripten exports all the functions of all included packages and generates bindings for them. With the consistent names, we can export only what we need.

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s EXPORTED_FUNCTIONS="[_new_fib, _next_val]" -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

With ‘EXPORTED_FUNCTIONS’ we can specify exactly what we want to export. The generated ‘hello.js’ is now much smaller, and we no longer leak internals (check it out).
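
As an alternative to maintaining the list on the command line, Emscripten also provides the ‘EMSCRIPTEN_KEEPALIVE’ macro in ‘emscripten/emscripten.h’, which marks a function as exported directly in the source. The following is a minimal sketch, assuming the same two functions as above. The macro name ‘WASM_EXPORT’ is my own invention, and the ‘#ifdef’ guard only exists so that the file still compiles with a native compiler such as g++. The ‘Fib’ class is repeated inline to keep the block self-contained.

```cpp
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
// Keeps the function alive and exports it without an EXPORTED_FUNCTIONS entry.
#define WASM_EXPORT EMSCRIPTEN_KEEPALIVE
#else
// Native build (e.g. a quick check with g++): the macro expands to nothing.
#define WASM_EXPORT
#endif

// Same logic as the Fib class from fib.h, repeated to keep the sketch self-contained.
class Fib {
public:
  int next() {
    int next = curr + prev;
    prev = curr;
    curr = next;
    return next;
  }
private:
  int curr = 1;
  int prev = 1;
};

static std::vector<Fib> instances;

extern "C" {

// Creates a new instance and returns its index as the handle.
WASM_EXPORT int new_fib() {
  instances.push_back(Fib());
  return static_cast<int>(instances.size()) - 1;
}

// Advances the instance behind the handle (no validation, as in the article).
WASM_EXPORT int next_val(int fib_instance) {
  return instances[fib_instance].next();
}

}  // extern "C"
```

The trade-off is a matter of taste: the command-line list keeps the C++ sources free of Emscripten specifics, while the macro keeps the export decision next to the function it concerns.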

Integration

So far I am satisfied with the integration of C++ and JavaScript. However, it has many moving parts, like the exporting of the C-functions, the build-script, the Facade, and the usage of the facade. This complexity screams for an integration test.

Integration Test

Remember that an integration test should not break if there is a flaw in the logic, but only if the integration of two components itself no longer works.

This article is not a tutorial on JS testing frameworks. Therefore I just write vanilla JavaScript without a test runner etc. Feel free to integrate the logic into the framework of your choice!

$ cat index.html
<html>
<body>
<script src="gen/hello.js"></script>
<script>
class Fib {
  constructor() {
    this.cppInstance = Module._new_fib();
  }
  next() {
    return Module._next_val(this.cppInstance);
  }
}
function functionExists(f) {
  return f && typeof f === "function";
}
function isNumber(n) {
  return typeof n === "number";
}
function testFunctionBinding() {
  assert(functionExists(Module._new_fib));
  assert(functionExists(Module._next_val));
}
// int is part of the interface
function testNextValReturnsInt() {
  assert(isNumber(new Fib().next()));
}
Module.onRuntimeInitialized = function() {
  testFunctionBinding();
  testNextValReturnsInt();
}
</script>
</body>
</html>

This checks whether the functions are available and whether next returns a number, which is part of the interface. With this, I can refactor steps in the pipeline with the confidence that I don’t break anything, for example in the build system (which is still very bad).

CMake Integration

The current “build-system” is, well let’s say, not optimal:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
em++ ../cpp/hello.cpp ../cpp/fib.cpp -s WASM=1 -s EXPORTED_FUNCTIONS="[_new_fib, _next_val]" -o hello.js || exit 1
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

Therefore I created the following ‘CMakeLists.txt’ in the cpp-directory:

$ cat cpp/CMakeLists.txt
# list(JOIN ...) below needs CMake 3.12 or newer
cmake_minimum_required(VERSION 3.12)
set(project "hello")
project(${project})
set(src
   hello.cpp
   fib.cpp
 )
set(exports
   _new_fib
   _next_val
 )
# process exported functions
set(exports_string "")
list(JOIN exports "," exports_string)
# set compiler and flags
set(CMAKE_C_COMPILER emcc)
set(CMAKE_CXX_COMPILER em++)
set(CMAKE_CXX_FLAGS "-s EXPORTED_FUNCTIONS=\"[${exports_string}]\"")
# specify the project
add_executable(${project}.html ${src})

I can now modify my build script:

$ cat build.sh
rm build/ -rf
mkdir build
cd build
cmake ../cpp
make
mv hello.js ../web/gen/
mv hello.wasm ../web/gen/

It would be possible to pull the rest into CMake as well, but I don’t see the point, because these are project and platform specifics that I will likely not modify anymore.

Resources & Tutorials

Random Thoughts and Experiences

Here is a collection of random thoughts of mine regarding WebAssembly. I will extend the collection as I get new insights. Feel free to ignore it or suggest some points.

You cannot access the DOM in WebAssembly. Therefore a natural boundary between logic (C++) and UI (JavaScript) is enforced. It also means that you have to define your modules carefully because you cannot easily refactor from one side of the boundary to the other: The languages are different.
Wouldn’t it be nice if you could compile JavaScript to WebAssembly and an engine decided which parts get compiled? Does that even make sense?
If you do a ‘printf()’ without a “\n”, you get a warning in the Chrome dev console that the content did not get flushed, and there is no output. Add “\n” to fix this.
The flag ‘emcc … -s SIDE_MODULE=1 …’ prevents the generation of the HTML- and JS-file.
The flag ‘emcc … -s ONLY_MY_CODE=1 …’ prevents the generation of the HTML- and JS-file and also compiles only the self-written code. Not even ‘stdio’ and the stdlib-stuff is compiled. This makes the resulting wasm-file way smaller.
To decompile a wasm-file you can use ‘wasm-dis hello.wasm -o hello.wast’. This brings the file into the WebAssembly Text-Format, encoded in S-expressions. You find details on the structure of the file in this great Guide on MDN.

Lessons learned

Just a collection of things that I noticed while applying C++ with WebAssembly in real projects.

Whenever your browser just hangs and Chrome says that the website crashed, it is possibly WebAssembly-related. try { … } catch (Exception e) { … } helps most of the time.
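
The try/catch hint above does not say on which side of the boundary the handler belongs. One possible reading, sketched here on the C++ side, is to catch everything inside the exported function so that no exception escapes into JavaScript. The function name ‘next_val_safe’ and the error value -1 are assumptions for illustration. Also note that, as far as I know, Emscripten disables C++ exception catching by default and needs a flag such as ‘-fexceptions’ for the handlers to take effect, so treat this as a sketch rather than a recipe.

```cpp
#include <cstddef>
#include <exception>
#include <vector>

// Same logic as the Fib class from fib.h, repeated to keep the sketch self-contained.
class Fib {
public:
  int next() {
    int next = curr + prev;
    prev = curr;
    curr = next;
    return next;
  }
private:
  int curr = 1;
  int prev = 1;
};

static std::vector<Fib> instances;

extern "C" {

// Creates a new instance and returns its index as the handle.
int new_fib() {
  instances.push_back(Fib());
  return static_cast<int>(instances.size()) - 1;
}

// Hypothetical guarded variant of next_val: returns -1 instead of letting
// an exception cross the C++/JavaScript boundary.
int next_val_safe(int fib_instance) {
  try {
    // at() throws std::out_of_range for an invalid handle; a negative
    // handle becomes a huge unsigned index and is rejected as well.
    return instances.at(static_cast<std::size_t>(fib_instance)).next();
  } catch (const std::exception&) {
    return -1;
  } catch (...) {
    return -1;
  }
}

}  // extern "C"
```

The JS facade can then check for -1 and raise a regular JavaScript error, which is much easier to debug than a crashed tab.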