Microvium Modules: Design thoughts

 3 years ago
source link: https://coder-mike.com/2020/05/microvium-modules/
Today, I completed the first working version of the module system for Microvium, and I want to share some of the design considerations and the conclusions I came to. The design as it stands is the result of quite a few iterations. It is likely to change again in the future as we iterate on Microvium before its first proper release. Any feedback and suggestions are welcome.

This will be a long and technical post. Most of my readers will probably want to skip this one. There’s also a more succinct overview of the module API available here:

https://github.com/coder-mike/microvium/blob/master/doc/js-host/module-api.md

There is some overlap between what Microvium achieves and what SES Compartments achieve, in the sense that both create isolated environments that allow JavaScript code[1] running inside the respective environments to gain access to other modules through configurable mechanisms (e.g. hooks) specified from outside the environment (e.g. in the host). So naturally, some of the inspiration for the Microvium API has come from the Compartment API. But I’ve also diverged somewhat from some of the decisions made for the Compartment API to date and made different trade-offs, and I wanted to informally document my thought process.

Note that modules are only a consideration when running Microvium on a desktop host, for example in node.js, so the API I’m referring to here is the JavaScript host Microvium API. On the microcontroller host, there is neither the ability to import modules nor parse source text, and interaction with the firmware host is not done through modules for efficiency reasons.

Design Objectives

I want the design of the Microvium API to be clean and simple to understand and use. Bear in mind that, at least initially, the target users for Microvium are probably firmware developers who are looking at potentially adopting Microvium as a new and foreign third-party tool. This is different from the target users for Compartments, who are likely to already be entrenched in the JavaScript ecosystem, and when confronted with SES and Compartments will be learning about a standard feature of JavaScript rather than a foreign, third-party tool.

Simplicity, simplicity, simplicity!

Modules are a complicated topic when you start digging into the details of circular dependencies, caching, relative specifiers, resolvers, etc. So it’s been quite a challenge to create an API that abstracts all of that while still remaining true to the specification!

As part of keeping things simple, I want the API to require the user to understand as few new concepts as possible when Microvium is explained (e.g. in the documentation). As I go through the design in the rest of this post, I’ll highlight some of the areas where I’ve removed the need for the user to understand various concepts by removing those concepts from the API design completely. My favorite kind of design work is all about removing and simplifying.

Some other objectives:

  • To remain independent of the file system or any particular module specifier scheme
  • To provide complete and perfect isolation and sandboxing of the app running inside the Microvium virtual machine.
  • To have interoperability between node modules and Microvium modules, where desired
  • To remain independent of the particular caching (memoization) policy used for modules, while remaining correct according to ECMAScript specification requirements for module caching.

The best way to lead you through the design is by example. I’ll start with the simplest examples and then build them up in complexity, explaining the thinking and design choices as I go.

Examples

The following are working[2] examples, but bear in mind that the Microvium implementation is, in general, probably still less than half complete, so don’t expect to generalize too much.

Example: No Modules (One Module)

I’ll start with an example without any imports or exports. For a user that doesn’t need support for multiple modules, the module system should not add any cognitive load.

A simple hello-world in Microvium is as follows:

import Microvium from 'microvium';

// Let's say we have this source text
const sourceText = `print('Hello, World!')`;

// Create a new virtual machine
const vm = Microvium.create();

// The virtual machine is going to need a "print" function
vm.globalThis.print = console.log;

// Import the source text into the virtual machine
vm.evaluateModule({ sourceText });

(Be aware that the above code is evaluated prior to running on the target device)

There are already a couple of interesting things to note here:

  • Globals are provided by mutating globalThis. Internally this is implemented as a proxy whose properties represent the global variables of the virtual machine. For users who don’t care about dividing code into modules, providing things through globals is quite powerful. I’ve kept the name consistent with the Compartment API.
  • In contrast with the SES Compartment API, the Microvium create function does not accept a set of endowments. This is for the practical reason that many globals in Microvium are likely to refer to objects created inside the VM itself (otherwise they wouldn’t appear in the snapshot), and so can’t be created until the VM exists to create them in.
  • The SES Compartment API has an evaluate function which could be used to evaluate JavaScript “script” text (as opposed to module text). In Microvium, for simplicity, I decided that all source text should be interpreted using module semantics, so there is no evaluate function. This eliminates one point of confusion for Microvium users since there is no need to understand “script” vs “module” (limiting the number of new concepts for non-JS users to absorb). This is only possible because Microvium is a completely new tool, and the language it supports is already a subset of full ECMAScript.
  • Observant readers will notice that Microvium.evaluateModule() actually accepts an object and not a string of source code. The reason for this will become more clear later, but it doesn’t add a significant amount of cognitive overhead in simple cases like this.
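
To make the first bullet more concrete, here is a minimal sketch of how a host-facing globalThis could be backed by a proxy. It is not Microvium's actual implementation: createGlobalsProxy and the Map used as a store are hypothetical stand-ins for the VM's internal table of global variables.

```javascript
// Hypothetical sketch: a proxy whose properties mirror a table of VM globals.
// The Map stands in for the VM's internal storage; it is not Microvium code.
function createGlobalsProxy(store) {
  return new Proxy({}, {
    get: (_target, name) => store.get(name),
    set: (_target, name, value) => {
      store.set(name, value);
      return true; // report that the assignment succeeded
    },
    has: (_target, name) => store.has(name),
    ownKeys: () => [...store.keys()],
    getOwnPropertyDescriptor: (_target, name) =>
      store.has(name)
        ? { value: store.get(name), writable: true, enumerable: true, configurable: true }
        : undefined,
  });
}

const globals = new Map();
const globalThisProxy = createGlobalsProxy(globals);

// Assigning through the proxy lands in the backing table.
globalThisProxy.print = (message) => `printed: ${message}`;
console.log(Object.keys(globalThisProxy));
```

The point of the sketch is only that the host mutates what looks like an ordinary object, while the storage behind it can be whatever the VM needs.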

I’m not sure what the evaluateModule method should be named. I may change this in future, and I’ve changed it about 10 times since I started doing the design. It accepts source text and returns a module object (i.e. module namespace object) representing the exports of the module source text. This is in contrast to any of the Compartment methods which only accept module specifiers[3].

There are multiple reasons I chose not to expose a method that receives a module specifier. I’ll elaborate as we go through more examples, but here are two reasons:

Firstly, it would mean that the user would need to provide multiple implementation points, just to get a single module running in the VM. The Compartment API requires that the user provide an importHook and a resolveHook to the compartment constructor prior to being able to process imports, so that the internal Compartment machinery knows what to do with the specifier. This may be more acceptable in the context of Compartments, since the evaluate function at least provides a starting point for new users to get familiar with Compartments without needing to understand all the concepts like “what is a hook?”, “what is resolving?”, “why is resolving different to importing?” etc. As I say, in Microvium, I wanted to introduce as few new concepts as possible, while still retaining the same generality if possible.

The signature for the evaluateModule method is as follows:

interface Microvium {
  /**
   * Imports the given source text as a module.
   *
   * Returns the module namespace object for the imported 
   * module: an object whose properties are the exports of 
   * the module.
   */
  evaluateModule(moduleSource: ModuleSource): ModuleObject;

  // ...
}

(I’ve used the name ModuleObject because it is an object representing the module. The Node.js documentation just calls the analogous thing “the module” — the thing returned by require — but I wanted to disambiguate the instantiated module from the source text of the module, both of which are in some respects “modules”).

I find this aspect of the design reasonably elegant — it accepts source code as input, and gives you back a module as output. All the intermediate steps are abstracted away, which I think is beautiful. The only concepts that the user needs to understand here are the existence of the source code and that of the resulting module, which are both well-understood concepts. In particular, I’ll note that this example (so far) has no dependence on the following concepts:

  • Hooks
  • Specifiers
  • Resolvers, loaders, etc.
  • Files and file systems

It’s important to me that this is the case, since the demonstration of Microvium with a “Hello, World” example should not overwhelm the user with new concepts.

The name evaluateModule is similar in principle to the Compartment API’s evaluate method, but works on module source text instead of script source text.

Example: Single Export

It follows naturally, and probably without need for explanation, that a module can have exports which are accessible to the host:

const sourceText = `export const x = 5`;
const { x } = vm.evaluateModule({ sourceText });
console.log(x); // 5

Again, this is all very natural, and we haven’t yet needed to add any new concepts into the user’s mental model.

One point I’ll touch on briefly is the fact that these kinds of direct object references between the host and the virtual machine are severed in a snapshot — a native C host has no way of accessing the module object of a module in a VM restored from a snapshot. Exports to a C host, or any host that resumes a VM from a snapshot (e.g. in node.js using Microvium.restore()), are done through a separate mechanism (see the now-slightly-outdated getting-started guide for examples of native exports).

I did attempt at one stage to consolidate the ES module system with the Microvium native FFI system, but came to the conclusion that it wasn’t reasonable to do, since it would require a lot of complexity and overhead with C hosts.

Example: Single Import

Now let’s say that we have a module that imports another module:

const sourceText = `import x from 'y'`;
// The following will throw "Module not found: 'y'"
vm.evaluateModule({ sourceText });

The above value for sourceText is a string of Microvium code which attempts to import the default export from the module identified by the module specifier 'y' and binds it to the name x. It doesn’t do anything with x, but that’s fine for illustration for the moment.

So far in this example, we haven’t yet provided any mechanism for the VM to actually perform the import, so it will throw to say that it can’t find the module[4]. Remember that Microvium is implemented independently from the file system, or any particular method of understanding module specifiers: they could refer to files, web URLs, or database IDs.

So, how do we provide the module 'y' to the VM so that the script[5] can import it?

A challenge with modules is that the specifiers in general can be relative to the importing module. Consider in node.js that a module could import '../m', which is importing the file named m in the directory that is the parent of the directory containing the module performing the import. So, we can’t just give the VM a global resolution that says that the specifier 'y' maps to some particular module, since it may map to different modules depending on the context and the specifier scheme being used. We need to do it in a way that allows the specifier 'y' to change meaning depending on which module in the VM is doing the import.

This is another point where SES compartments and Microvium diverge. With compartments, the Compartment constructor accepts a resolveHook as one of its options. The resolve hook interprets the meaning of a specifier. For example, it may search the file system to find the appropriate module source text (but not load it, yet).

The resolve hook for a compartment is a function which accepts a module specifier and a referrer string and returns a string that absolutely identifies the module[6]. The referrer string identifies the module doing the importing so that the resolver can produce different results depending on which module is performing the import.

For Microvium, I think it’s unnecessary complexity to have the idea of an intermediate “thing” for users of the API to understand, which is that of the resolved/full module identifier (and also the “referrer”). It’s not easy to explain what this is even. In the case of node.js, it’s probably a normalized, absolute file path to the module source text file. In the case of the web, it might be a URL that identifies where the script should be downloaded from. I’d prefer not to have to explain this in the Microvium docs, so I’ve deleted the concept from the API design.

Example: importDependency

The solution I came up with in Microvium is to pass a callback into the evaluateModule method itself. Let’s look at another example, here with code that doesn’t throw:

const sourceText = `
  import { x } from 'y';
  print(x); // 10
`;
const moduleY = { x : 10 };

vm.evaluateModule({ 
  sourceText,
  importDependency: specifier => moduleY 
});

I’ll start with the disclaimer that the techniques used in these examples would likely not be used in production software. We are still in the realm of “demonstrating the public API of the virtual machine so that it can be understood”, and not “demonstrating the typical/recommended way to use the API”.

In our example, of course, we’re ignoring the specifier and always returning the module moduleY. This would only be correct behavior if you intended all module specifiers to alias the module y, but hopefully it’s obvious how the example could be extended to use the specifier to discriminate between multiple possible module objects, possibly by looking in a table of available modules, or by iterating the file system (more on that in a moment).

To understand this example, we need to introduce two new concepts to the user[7]:

  1. Module specifier: This is not a “new” concept for JavaScript programmers, but might be for C programmers, since it is a concept distinct from “file path”. However, it’s easy enough to explain that the module specifier is “the string text that appears after from in an import statement”.
  2. importDependency

I’m quite conflicted about the name “importDependency” here, and any suggestions for alternative terms are welcome. The importDependency callback[8] is a function that takes a module specifier and returns the corresponding module object, in the context of the importing module.

This feels reasonably elegant and intuitive to me. Whenever we import a module, we are also telling the VM how to import nested dependencies relative to that module. It’s completely abstracted from the location of those nested dependencies, the mapping of module specifiers to the corresponding modules, and the mechanism by which they are retrieved and cached (more on caching later).
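
The earlier suggestion of using the specifier to pick between several module objects can be sketched with a lookup table. The helper importFromTable and the contents of availableModules are made up for illustration. Returning undefined for an unknown specifier is an assumption, based on the ImportHook type shown later in this post.

```javascript
// Hypothetical helper: build an importDependency callback from a lookup table.
function importFromTable(table) {
  return (specifier) =>
    // Only own properties count, so 'toString' etc. are not treated as modules.
    Object.prototype.hasOwnProperty.call(table, specifier)
      ? table[specifier]
      : undefined; // assumption: undefined means "no such module"
}

// Made-up host modules, keyed by the specifier a script would import.
const availableModules = {
  y: { x: 10 },
  z: { greet: (name) => `Hello, ${name}!` },
};

const importDependency = importFromTable(availableModules);

console.log(importDependency('y').x);
console.log(importDependency('unknown'));
```

A callback like this could be passed as the importDependency property alongside sourceText in the call to evaluateModule.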

I’ll note here that the module object returned in the example by the importDependency callback is actually an object residing in the host, not the VM. The VM will interact with the host object through something like a proxy. This kind of module is analogous to the node.js core modules, which the node documentation says are “compiled into the binary” — i.e. they are baked into the host rather than being implemented as part of the application, but appear to the application as local objects. (Tying the analogy to our example, the host is the node.js code and the application is the Microvium script). It would be fine for the importDependency callback to return a VM object as well (of which a VM module is a special case), and the membrane would handle it the natural way (more on this later).

It might be worth highlighting at this point that the module that importDependency returns is just a normal/ordinary object. In the ES spec, module namespace objects are exotic objects with special internal properties that are inaccessible to user code. I don’t know if I’m violating the spec by allowing the reified module namespace object to be any object, but it certainly makes the API a lot easier to understand and use.

I want to emphasize that, so far, we’ve been able to demonstrate the module system of Microvium without requiring multiple files. Each example has just been a few lines of JavaScript code. I personally think that this is a huge plus when it comes to attracting newcomers to a new tool: being able to demonstrate something as complex as multi-module code without actually requiring multiple files in the examples. This is particularly useful for explaining this API with snippets, as is done in this post. Of course, it naturally raises the question in the user’s mind of what to do with actual Microvium source code files on the user’s hard drive, but I like the fact that these concepts can be introduced independently of each other. A user familiar with node.js could quite easily see how to read a source file from disk instead of using a literal string if they so desired (although there are better ways, which I’ll introduce in a moment).

The benefits go beyond communication. Being able to treat arbitrary objects as modules makes it easy to programmatically synthesize modules on demand. An example of this would be an attenuated version of an existing module. Depending on whether the attenuation should exist inside the VM or outside, the attenuator could either produce a new module object or code-generated Microvium source code (which is in turn used to get a new module object).
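
As a rough sketch of attenuation done outside the VM, the following builds a new module object that exposes only an allow-listed subset of an existing one. The names attenuate, fileModule, and the 'files' specifier are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical helper: expose only an allow-listed subset of a module's exports.
function attenuate(moduleObject, allowedExports) {
  const attenuated = {};
  for (const name of allowedExports) {
    if (Object.prototype.hasOwnProperty.call(moduleObject, name)) {
      attenuated[name] = moduleObject[name];
    }
  }
  // Freeze so that later code cannot add the hidden exports back.
  return Object.freeze(attenuated);
}

// A stand-in host module with one harmless and one powerful export.
const fileModule = {
  readConfig: () => 'config contents',
  deleteEverything: () => { throw new Error('should be unreachable'); },
};

const readOnlyFileModule = attenuate(fileModule, ['readConfig']);

// Same shape as an importDependency callback: specifier in, module object out.
const importDependency = (specifier) =>
  specifier === 'files' ? readOnlyFileModule : undefined;
```

Note that this copies export values at the time attenuate is called. It does not track later changes to the original module the way live ES module bindings would.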

I really like the fact that the relative nature of specifiers is captured by the callback being per-module. This eliminates the need for the concept of some kind of “full specifier”, e.g. an absolute file path or URL, or something like that. Of course, in most production projects, there may indeed be full specifiers existing under the hood somewhere, but I like the fact that the surface area of the VM API is independent of what this specifier looks like, if it exists at all. Full specifiers need not exist, or, if they exist, they could exist in any form, such as using a composite object as an absolute specifier (e.g. a tuple composing a database name with a record ID, if the module code is stored in a database). This is not to say that any particular method is advised, but rather to say that the VM is abstracted and decoupled from the choice.
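
One way to picture the per-module callback is as a closure that captures the location of the importing module. In the sketch below, the path-like keys in modulesByPath and the helper importDependencyFor are assumptions for illustration. The "full specifier" exists only in host code and never crosses the VM API.

```javascript
const path = require('path');

// Hypothetical table of host-provided modules, keyed by a "full specifier"
// that only this host code knows about (here, a POSIX-style path).
const modulesByPath = {
  '/app/config': { name: 'app config' },
  '/app/plugins/config': { name: 'plugin config' },
};

// Build the importDependency callback for one particular importing module.
// The importer's location is captured in the closure.
function importDependencyFor(importerPath) {
  const importerDir = path.posix.dirname(importerPath);
  return (specifier) => {
    const fullPath = path.posix.resolve(importerDir, specifier);
    return modulesByPath[fullPath];
  };
}

const importFromMain = importDependencyFor('/app/main');
const importFromPlugin = importDependencyFor('/app/plugins/logger');

// The same specifier means different things to different importers.
console.log(importFromMain('./config').name);
console.log(importFromPlugin('./config').name);
```

Here the specifier './config' yields a different module depending on which closure is asked, with no referrer argument anywhere in the callback's signature.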

Example: Importing Source Text

Importing source text instead of a host module introduces nothing new. It just involves putting together the machinery we’ve already covered so far in the post:

const moduleACode = `
  import { b } from 'moduleB';
  print(b); // 10
`;
const moduleBCode = `export const b = 10`;
// Import moduleACode, which will import module B
vm.evaluateModule({ 
  sourceText: moduleACode,
  importDependency: specifier => {
    if (specifier === 'moduleB') {
      return vm.evaluateModule({
        sourceText: moduleBCode
      });
    }
  }
});

Nothing new here, but the nesting is starting to look ugly. Let’s refactor to clean up the nesting, and we can up the game a bit while we’re at it by adding a circular dependency…

Circular Dependencies and Module Caching

In the following code, I’ve created a modules “table” with two modules — moduleA and moduleB, and an importDependency function which is shared by both the modules and the bootstrap code. So, importDependency here is treated as a universal way of importing modules, with no support for relative paths (after all, we can make up whatever resolution rules we want).

const importDependency = specifier => 
  modules[specifier] && vm.evaluateModule(modules[specifier]);

const modules = {
  'moduleA': {
    sourceText: `
      import { b, printA } from 'moduleB';
      export const a = 5;
      print(b); // 10
      printA(); // 5
    `,
    importDependency
  },
  'moduleB': {
    sourceText: `
      import { a } from 'moduleA';
      export const b = 10;
      export const printA = () => print(a); 
    `,
    importDependency
  },
};

// Import moduleA which will import moduleB
importDependency('moduleA');

You may be surprised to see that this doesn’t result in an infinitely recursive cycle of loading modules. This leads me to the point of this section: module caching.

Node.js talks about this concept of caching modules. The term “caching” may be better described as “memoization”, since it is not an optional cache for performance reasons but rather something that is required in order to implement the described semantics: multiple calls to require will not cause the module code to be executed multiple times, and the same module object will be returned each time[9].

Node.js caches the modules based on the full file path, but the ECMAScript standard doesn’t require caching to the same level. The spec says (emphasis added):

Each time this operation is called with a specific referencingScriptOrModule, specifier pair as arguments it must return the same Module Record instance if it completes normally.

Multiple different referencingScriptOrModule, specifier pairs may map to the same Module Record instance. The actual mapping semantic is implementation-defined but typically a normalization process is applied to specifier as part of the mapping process. A typical normalization process would include actions such as alphabetic case folding and expansion of relative and abbreviated path specifiers.

https://tc39.es/ecma262/#sec-hostresolveimportedmodule

To translate this, my understanding is that multiple import statements within a single importing module must resolve to the same module, and multiple imports across the same or different modules are permitted to resolve to the same module, using some host-defined normalization process such as resolving absolute file paths.

Microvium implements the ECMAScript requirement by essentially consolidating multiple imports of the same specifier within a module into a single physical import entry. This does not require a cache at all.
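
The idea of consolidating repeated specifiers can be illustrated as follows. This is only a sketch of the concept and makes no claim about how Microvium does it internally; linkImports and the sample specifiers are hypothetical.

```javascript
// Hypothetical sketch: resolve each distinct specifier of one module exactly once.
function linkImports(importSpecifiers, importDependency) {
  const importTable = new Map(); // specifier -> module object
  for (const specifier of importSpecifiers) {
    if (!importTable.has(specifier)) {
      importTable.set(specifier, importDependency(specifier));
    }
  }
  return importTable;
}

// Specifiers as they might appear in three import statements of one module.
const specifiersInSource = ['moduleB', 'moduleC', 'moduleB'];

let callCount = 0;
const importDependency = (specifier) => {
  callCount++;
  return { name: specifier }; // deliberately a fresh object on every call
};

const importTable = linkImports(specifiersInSource, importDependency);
console.log(callCount, importTable.size);
```

Even though this importDependency returns a new object on every call, both imports of 'moduleB' share one entry, so the same-specifier requirement holds within the module without any cache.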

Microvium facilitates node-style module caching (e.g. caching based on file paths) by providing the guarantee that if the same exact ModuleSource object (by reference identity) is passed to evaluateModule, then the same module object will be returned.

This doesn’t feel as elegant to me as the rest of the API, but it does make some level of sense:

  • The evaluateModule method takes a ModuleSource object (the one with the sourceText property) and returns a ModuleObject. Iff the caller passes a distinct ModuleSource, they will receive a distinct ModuleObject. Put another way, iff they instantiate a new ModuleSource, they will get a corresponding new instantiation of the ModuleObject (by having the module source code executed again).
  • This solution is still completely decoupled from the representation that the user may choose for full specifiers, if they choose to use full specifiers at all.
  • The user code is free to control the caching however they choose, by simply caching the corresponding ModuleSource objects according to their caching scheme. Later, I’ll show an example with a node-style importer and its corresponding caching system.
  • A special case of the above point is that the user is able to have multiple module instantiations with the same “full specifier” (e.g. multiple instantiations of a module with the same file path), should they choose to do so. This was not a goal of the design, but is a side effect of the fact that the API is decoupled from the notion of “full specifier”.
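
The points above can be sketched with a stand-in VM that memoizes on the identity of the ModuleSource object, plus a host-side cache that hands out one ModuleSource per full specifier. createFakeVm and createSourceCache are hypothetical. The fake VM does not parse or run anything and only mimics the identity guarantee described above.

```javascript
// Stand-in for the VM: memoizes on the identity of the ModuleSource object.
// This is NOT the real Microvium VM; it only mimics the stated guarantee.
function createFakeVm() {
  const instantiated = new WeakMap(); // ModuleSource -> ModuleObject
  let evaluations = 0;
  return {
    evaluateModule(moduleSource) {
      if (!instantiated.has(moduleSource)) {
        evaluations++;
        instantiated.set(moduleSource, { source: moduleSource.sourceText });
      }
      return instantiated.get(moduleSource);
    },
    get evaluations() { return evaluations; },
  };
}

// Host-side policy: one ModuleSource per "full specifier" (hypothetical keys).
function createSourceCache(readSource) {
  const cache = new Map(); // full specifier -> ModuleSource
  return (fullSpecifier) => {
    if (!cache.has(fullSpecifier)) {
      cache.set(fullSpecifier, { sourceText: readSource(fullSpecifier) });
    }
    return cache.get(fullSpecifier);
  };
}

const vm = createFakeVm();
const getModuleSource = createSourceCache((key) => `export const id = '${key}'`);

const first = vm.evaluateModule(getModuleSource('/app/a'));
const second = vm.evaluateModule(getModuleSource('/app/a'));
// Bypassing the cache with a fresh ModuleSource yields a fresh instantiation.
const third = vm.evaluateModule({ sourceText: getModuleSource('/app/a').sourceText });
```

The caching policy lives entirely in createSourceCache. Swapping the Map key for a URL, a database ID, or a composite value would not change the VM-facing calls at all.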

nodeStyleImporter

Writing the dependency importer by hand is fine for advanced users and library writers, and is great for the above examples that explain the Microvium API, but most Microvium users probably just want to know how to use Microvium scripts the same way they would use node.js scripts, and for that, there’s the nodeStyleImporter, which is a function that creates an initial import hook given some options and a Microvium VM to import into:

function nodeStyleImporter(vm, options): ImportHook;

An “import hook” in this sense is just a mapping from specifier strings to module objects, something like the require function in node.js:

type ImportHook = specifier => ModuleObject | undefined;

Here’s an example usage:

const moduleOptions: ModuleOptions = {
  // Allow the importer to access the file system, but only for 
  // subdirectories of the specified project directory.
  accessFromFileSystem: 'subdir-only',

  // Specify the root directory of the project, from which
  // initial imports will be resolved
  basedir: 'my/project/directory',

  // A set of "core" modules: those which can be imported from
  // any Microvium module with the exact same absolute specifier
  coreModules: {
    // Example of core module implemented as VM source text
    'a-core-module': './a-core-module.mvms', 
    // Example of core module implemented by the host
    'another-core-module': require('a-module-in-the-host') 
  },

  // Allow Microvium modules to import `fs`, `http`, etc.
  allowNodeCoreModules: true,
};

const vm = Microvium.create();
const importer = nodeStyleImporter(vm, moduleOptions);
importer('./the-entry-module');

This example delegates the complex work of resolving, importing, and memoization policy to the nodeStyleImporter, which is an importer that uses the resolve npm package alongside node’s fs module to find and read the source files for execution.
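
As a way to picture what a 'subdir-only' restriction might involve, a host-side check along the following lines could reject resolved paths that fall outside the project directory. This is an assumption about the behavior, not the nodeStyleImporter's actual code. isWithinBasedir is a hypothetical helper, and it uses POSIX-style absolute paths to keep the example platform-independent.

```javascript
const path = require('path');

// Hypothetical check: is `candidate` the base directory itself or inside it?
// Both arguments are expected to be absolute POSIX-style paths.
function isWithinBasedir(basedir, candidate) {
  const relative = path.posix.relative(basedir, candidate);
  // A relative path that climbs out of basedir starts with a '..' segment.
  return relative !== '..' && !relative.startsWith('../');
}

const basedir = '/my/project/directory';

console.log(isWithinBasedir(basedir, '/my/project/directory/src/main.mvms'));
console.log(isWithinBasedir(basedir, '/my/project/directory/../secrets.txt'));
```

Comparing on whole path segments, instead of using a plain string prefix test, avoids treating a sibling such as '/my/project/directory-old' as if it were inside the project.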

The full example can be seen as a test case in the github repo. I won’t explain it here because at this point we’re no longer talking about the Microvium API and we’re talking about an optional utility that helps in a specific scenario. I show it here as a demonstration of what can be done using the described Microvium API.

The nodeStyleImporter implements the importing of modules from the file system, as well as allowing Microvium scripts to transparently import node modules by indirectly invoking require, giving the Microvium app access to npm modules, among other things.

Conclusions

I’m pretty pleased with this design. It’s still early in Microvium development, so I’m sure the API will change again before the release, but I’m pleased by where it is at the moment.

I think it achieves all the goals: being simple and easy to use, minimizing the number of unnecessary concepts, and being compliant with the ES spec. It facilitates the handling of circular dependencies and it isn’t coupled to the file system. It doesn’t dictate a particular caching/memoization scheme. It can deal transparently with modules loaded inside the VM as Microvium source text alongside those provided and implemented in the host.


  1. In the case of Microvium, only a subset of JavaScript is supported 

  2. The examples hopefully work, but I haven’t actually tested them verbatim. 

  3. A module specifier is the string that comes after from in an import ... from ... statement 

  4. For the purposes of isolation, the error message is the same whether the module is unavailable or whether no import hook is provided at all, since the script should not be able to distinguish these scenarios 

  5. I use the word “script” generically in this post to refer to source text written in the Microvium language 

  6. I don’t think the SES spec yet specifies whether these identifiers are strings — I’ll be interested to see if this is specified 

  7. As you know, I’m very aware of how many concepts need to be introduced and how quickly we need to introduce them 

  8. I’m calling this a callback and not a hook, because it’s not hooking into existing behavior but rather implementing the required behavior 

  9. This is generally true, but there are some caveats that I won’t discuss here 

