

From Rust to beyond: Prelude
source link: https://www.tuicool.com/articles/hit/eIvem2i

At my work, I had an opportunity to start an experiment: writing a single parser implementation in Rust for the new Gutenberg post format, bound to many platforms and environments.

This series of posts is about those bindings, and explains how to send Rust beyond earth, into many different galaxies. Rust will land in:
- The WebAssembly galaxy,
- The ASM.js galaxy,
- The C galaxy,
- The PHP galaxy, and
- The NodeJS galaxy.
The ship is currently flying into the Java galaxy; this series may continue if the ship does not crash and has enough resources to survive!
The Gutenberg post format
Let’s quickly introduce what Gutenberg is, and why it needs a new post format. If you want an in-depth presentation, I highly recommend reading The Language of Gutenberg. Note that this is not required to understand the Gutenberg post format.
Gutenberg is the next WordPress editor. It is a little revolution on its own. The features it unlocks are very powerful.
The editor will create a new page- and post-building experience that makes writing rich posts effortless, and has “blocks” to make it easy what today might take shortcodes, custom HTML, or “mystery meat” embed discovery. — Matt Mullenweg
The format of a blog post was HTML, and it continues to be. However, another semantic layer is added through annotations. Annotations are written in comments and borrow the XML syntax, e.g.:
<!-- wp:ns/block-name {"attributes": "as JSON"} -->
    <p>phrase</p>
<!-- /wp:ns/block-name -->
The Gutenberg format provides 2 constructions: Block and Phrase. The example above contains both: there is a block wrapping a phrase. A phrase is basically anything that is not a block. Let’s describe the example:
- It starts with an annotation (<!-- … -->),
- The wp: prefix is mandatory to represent a Gutenberg block,
- It is followed by a fully qualified block name, which is a pair of an optional namespace (here set to ns, defaults to core) and a block name (here set to block-name), separated by a slash,
- A block has optional attributes encoded as a JSON object (see RFC 7159, Section 4, Objects),
- Finally, a block has optional children, i.e. a heterogeneous collection of blocks or phrases. In the example above, there is one child, the phrase <p>phrase</p>. The following example shows a block with no child:
<!-- wp:ns/block-name {"attributes": "as JSON"} /-->
The complete grammar can be found in the parser’s documentation.
Finally, the parser is used on the editor side, not on the rendering side. Once rendered, the blog post is a regular HTML file. Some blocks are dynamic though, but this is another topic.
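To make the naming rule above concrete, here is a minimal sketch of how a fully qualified block name could be split into its namespace and name, falling back to core when no namespace is given. The function split_block_name is a hypothetical helper written for illustration; it is not taken from the real parser, which relies on the grammar referenced above.

```rust
/// Splits a fully qualified block name such as `ns/block-name` into
/// `(namespace, name)`. When there is no slash, the namespace defaults
/// to `core`, as described in the list above.
///
/// Hypothetical helper for illustration only; not part of the real parser.
pub fn split_block_name(qualified: &[u8]) -> (&[u8], &[u8]) {
    match qualified.iter().position(|&byte| byte == b'/') {
        // Everything before the first slash is the namespace,
        // everything after it is the block name.
        Some(slash) => (&qualified[..slash], &qualified[slash + 1..]),
        // No slash: assume the default `core` namespace.
        None => (&b"core"[..], qualified),
    }
}
```

Both returned slices borrow from the input (or from a static default), so no bytes are copied.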

The grammar is relatively small. The challenge, however, is to be as performant and memory efficient as possible on many platforms. Some posts can reach megabytes, and we don’t want the parser to be the bottleneck. Even if it is used when creating the post state (cf. the schema above), we have measured several seconds to load some posts, during which the user is blocked and waits, or sees an error. In other scenarios, we have hit the memory limit of the language’s virtual machines.
Hence this experimental project! The current parsers are written in JavaScript (with PEG.js) and in PHP (with phpegjs). This Rust project proposes a parser written in Rust that can run in the JavaScript and PHP virtual machines, and on many other platforms. Let’s try to be very performant and memory efficient!
Why Rust?
That’s an excellent question! Thanks for asking. I can summarize my choice with a bullet list:
- It is fast, and we need speed,
- It is memory safe, and also memory efficient,
- No garbage collector, which simplifies memory management across environments,
- It can expose a C API (with Foreign Function Interface, FFI), which eases the integration into multiple environments,
- It compiles to many targets,
- Because I love it.
One of the goals of the experiment is to maintain a single implementation (maybe the future reference implementation) with multiple bindings.
The parser
The parser is written in Rust. It relies on the fabulous nom library.

The source code is available in the src/ directory in the repository. It is very small and fun to read.
The parser produces an Abstract Syntax Tree (AST) of the grammar, where nodes of the tree are defined as:
pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>
    },
    Phrase(Input<'a>)
}
That’s all! We find again the block name, the attributes, the children, and the phrase. Block children are defined as a collection of nodes, so the definition is recursive. Input<'a> is defined as &'a [u8], i.e. a slice of bytes.
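Because the tree is recursive, consuming it usually means walking it recursively. The sketch below redefines Node and Input with the same shape as above so that it compiles on its own, and adds two hypothetical helpers, count_blocks and block_names, that are not part of the parser and only illustrate one way to traverse the tree.

```rust
/// Same shape as the definitions shown above, repeated here so the
/// sketch is self-contained.
pub type Input<'a> = &'a [u8];

#[derive(Debug, PartialEq)]
pub enum Node<'a> {
    Block {
        name: (Input<'a>, Input<'a>),
        attributes: Option<Input<'a>>,
        children: Vec<Node<'a>>,
    },
    Phrase(Input<'a>),
}

/// Hypothetical helper: counts the blocks in a tree, nested blocks included.
/// Phrases are not counted.
pub fn count_blocks(nodes: &[Node<'_>]) -> usize {
    nodes
        .iter()
        .map(|node| match node {
            Node::Block { children, .. } => 1 + count_blocks(children),
            Node::Phrase(_) => 0,
        })
        .sum()
}

/// Hypothetical helper: collects `namespace/name` for every block,
/// depth first, in document order.
pub fn block_names(nodes: &[Node<'_>]) -> Vec<String> {
    let mut names = Vec::new();
    for node in nodes {
        if let Node::Block { name: (namespace, block), children, .. } = node {
            // The AST stores raw bytes; convert them lossily for display.
            names.push(format!(
                "{}/{}",
                String::from_utf8_lossy(namespace),
                String::from_utf8_lossy(block)
            ));
            names.extend(block_names(children));
        }
    }
    names
}
```

Since every Input<'a> is a slice borrowed from the original post, walking the tree does not copy the content; only block_names allocates, because it builds owned strings for display.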
The main parser entry is the root function. It represents the axiom of the grammar, and is defined as:
pub fn root(input: Input) -> Result<(Input, Vec<ast::Node>), nom::Err<Input>>;
So the parser returns a collection of nodes in the best case. Here is a simple example:
use gutenberg_post_parser::{root, ast::Node};

let input = &b"<!-- wp:foo {\"bar\": true} /-->"[..];
let output = Ok(
    (
        // The remaining data.
        &b""[..],

        // The Abstract Syntax Tree.
        vec![
            Node::Block {
                name: (&b"core"[..], &b"foo"[..]),
                attributes: Some(&b"{\"bar\": true}"[..]),
                children: vec![]
            }
        ]
    )
);

assert_eq!(root(input), output);
The root function and the AST will be the items we are going to use and manipulate in the bindings. The internal items of the parser will stay private.
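The result of root pairs the remaining data with the parsed nodes, so a caller may want to treat leftover input as a failure. The sketch below shows one way to do that for any result of that shape. The helper complete is hypothetical, and whether a real caller of the parser needs such a check is an assumption, not something the parser requires.

```rust
/// Hypothetical helper: returns the parsed value only when parsing
/// succeeded and no input was left over.
///
/// It is generic over the parsed value `T` and the error `E`, so it
/// works with any result shaped like `Result<(remaining, value), error>`.
pub fn complete<T, E>(result: Result<(&[u8], T), E>) -> Option<T> {
    match result {
        Ok((remaining, value)) => {
            if remaining.is_empty() {
                Some(value)
            } else {
                // Some bytes were not consumed by the parser.
                None
            }
        }
        Err(_) => None,
    }
}
```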
Bindings
From now on, our goal is to expose the root function and the Node enum in different platforms or environments. Ready?
3… 2… 1… lift-off!