51

GitHub - modernserf/zebu: A compiler for little languages in tagged template str...

 5 years ago
source link: https://github.com/modernserf/zebu
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

readme.md

Zebu

What is this?

Zebu is a JavaScript library for building little languages with tagged template literals. Here are some examples of little languages built with Zebu:

Ranges

range`1,3 ... (10)` // => yields 1, 3, 5, 7, 9

Data expressions

const obj = { foo: { bar: 3 } }
dx`.foo.bar`.replace(obj, 5) // => { foo: { bar: 5 } }

React PropTypes

const types = propTypes`
  className: string?
  type: ("select" | "datalist")?
  options: [{ id: string, label: string }]
  value: string
  onChange: func
`
/* => {
  className: PropTypes.string,
  type: PropTypes.oneOf(["select", "datalist"]),
  options: PropTypes.arrayOf([
    PropTypes.shape({
      id: PropTypes.string.isRequired,
      label: PropTypes.string.isRequired,
    }).isRequired
  ]).isRequired,
  value: PropTypes.string.isRequired,
  onChange: PropTypes.func.isRequired,
} */

Matrix math operations

const x = [
  [7, 1],
  [-2, 3],
]
matrix`
  [ 2 0 
   -1 3 ] * ${x}
`
// => [[14, 2], [-13, 8]]

Interactive fiction

const leonAdventure = story`
  === begin ===
  "You wake up, unfortunately. There is a treasure chest in the room. You do not know where it came from."
  * "open treasure chest" -> treasure
  * "pet cat" -> cat
  * "go to the bathroom" -> toilet
  === cat ===
  "The cat purrs with approval at your touch."
  * "open treasure chest" -> treasure
  * "pet cat" -> cat
  * "go to the bathroom" -> toilet
`

State machines

const traffic = machine`
  initState: #green
  states: #green | #yellow | #red
  events: #timer
  onTransition: ${(state) => console.log(state)}

  #green  @ #timer -> #yellow
  #yellow @ #timer -> #red
  #red    @ #timer -> #green
`
traffic.start() // log { type: "green" }
traffic.send({ type: "timer" }) // log { type: "yellow" }

Text matching

const joinObjects = (objects) => objects.reduce((l, r) => Object.assign(l, r), {})
const url = text`
  URL       = Protocol Host Path? "/"? Search? Anchor?
              : ${(protocol, host, path, _, search, anchor) => ({ protocol, host, path, search, anchor })}
  Protocol  = ${/[a-z]+/} "://"
  Host      = ${/[A-Za-z0-9-]+/} ++ "."
  Path      = "/" (Component ++ "/")  : ${(_, path) => path}
  Search    = "?" Pair ++ "&"         : ${(_, pairs) => joinObjects(pairs)}
  Pair      = Component "=" Component : ${(key, _, value) => ({[key]: value})}
  Anchor    = "#" Component           : ${(_, target) => target}
  Component = ${/[A-Za-z0-9()_\-~]/}  : ${decodeURIComponent}
`
url.match("https://github.com/modernserf/zebu?foo=bar20baz"/)
/* => { 
  ok: true, 
  value: {
    protocol: "https",
    host: ["github", "com"],
    path: ["modernserf", "zebu"],
    search: { foo: "bar baz" },
    anchor: null,
  },
} */

Installation & Requirements

npm install zebu

Zebu uses ES2018 features -- it works in node v10 and in modern browsers via <script type="module">, but will likely need to be transpiled with Babel for production use.

Writing a language

Zebu exports the function grammar that is used to define the rules for interpreting a language. Let's use grammar to make a yaml-like configuration language, called "spaml":

import { grammar } from "zebu"

const spaml = grammar`
  Block = Pair ** Sep           : ${fromPairs}
  Pair  = Key (":" line?) Expr  : ${(key, _, value) => [key, value]}
  Expr  = #[ Expr ** Sep ]      : ${(xs = []) => xs}
        | #{ Block }
        | value
        | "true"                : ${() => true}
        | "false"               : ${() => false}
        | "null"                : ${() => null}
  Key   = identifier | value
  Sep   = line | ","
`
function fromPairs (pairs) {
  const obj = {}
  for (const [key, value] of pairs) {
    obj[key] = value
  }
  return obj
}

You can use the spaml grammar like this:

const justin = spaml`
  name: "Justin"
  twitter_handle: "modernserf"
  hobbies: ["karaoke", "mixology", "programming"]
`

which results in:

{ 
  name: "Justin", 
  twitter_handle: "modernserf", 
  hobbies: ["karaoke", "mixology", "programming"],
}

Tokenizing

So how does this work? grammar and spaml are both functions that work with tagged template literals. First, the string components and interpolated values are transformed into tokens. This text:

  name: "Justin"
  twitter_handle: "modernserf"
  hobbies: ["karaoke", "mixology", "programming"]

is transformed into:

[
  { type: 'identifier', value: 'name' },
  { type: 'operator', value: ':' },
  { type: 'value', value: 'Justin' },
  { type: 'line' },
  { type: 'identifier', value: 'twitter_handle' },
  { type: 'operator', value: ':' },
  { type: 'value', value: 'modernserf' },
  { type: 'line' },
  { type: 'identifier', value: 'hobbies' },
  { type: 'operator', value: ':' },
  { type: 'startToken', value: '[' },
  { type: 'value', value: 'karaoke' },
  { type: 'operator', value: ',' },
  { type: 'value', value: 'mixology' },
  { type: 'operator', value: ',' },
  { type: 'value', value: 'programming' },
  { type: 'endToken', value: ']' },
]

Some text, like tabs and spaces, are removed altogether. JS-style comments (both // line comments and /* */ block) are also removed. If there are multiple newlines in a row, they are consolidated into a single line token; any opening and closing lines are removed as well.

Numbers and quoted strings become value tokens; all interpolated values are inserted as value tokens as well. Words that match JavaScript's rules for identifiers become identifier tokens. Most punctuation becomes operator tokens.

Grouping symbols ( ) [ ] { } are specially handled and become startToken and endToken, and these have a special purpose in the next stage of processing.

Skeleton syntax trees

Next, the grouping tokens are matched, and the tokens between them are collected into a single structure token:

[
  { type: 'identifier', value: 'name' },
  { type: 'operator', value: ':' },
  { type: 'value', value: 'Justin' },
  { type: 'line' },
  { type: 'identifier', value: 'twitter_handle' },
  { type: 'operator', value: ':' },
  { type: 'value', value: 'modernserf' },
  { type: 'line' },
  { type: 'identifier', value: 'hobbies' },
  { type: 'operator', value: ':' },
  { type: 'structure', structureType: '[]', value: [
    { type: 'value', value: 'karaoke' },
    { type: 'operator', value: ',' },
    { type: 'value', value: 'mixology' },
    { type: 'operator', value: ',' },
    { type: 'value', value: 'programming' },
  ] }
]

This step allows Zebu to target visibly pushdown languages -- essentially this means that Zebu grammars can be as (computationally) simple as regular expressions, but still allow recursive structures in brackets. This also makes it easier to provide good error messages & avoid some performance issues. This technique was adapted from Owl and Dylan's D-Expressions.

Parsing

In the next step, we try to match the tokens to the top rule of the grammar.

const spaml = grammar`
  Block = Pair ** Sep           : ${fromPairs}
  Pair  = Key (":" line?) Expr  : ${(key, _, value) => [key, value]}
  Expr  = #[ Expr ** Sep ]      : ${(xs = []) => xs}
        | #{ Block }
        | value
        | "true"                : ${() => true}
        | "false"               : ${() => false}
        | "null"                : ${() => null}
  Key   = identifier | value
  Sep   = line | ","
`

These parsing expressions match a single token:

  • line, value, operator, identifier - match a token of this type
  • "include", "+" - match an operator or identifier token with this value

These parsing expressions work similarly to regular expressions, and can refer to the rules defined below them:

  • expr1 expr2 - matches expr1 followed by expr2, returning the result of expr2
  • expr1 expr2 : ${func} matches expr1 followed by expr2, returning func(expr1, expr2)
  • expr1 | expr1 - try matching expr1, else match expr2
  • expr+ - match one or more expr, returning a list of expr results
  • expr* - match zero or more expr, returning a list of expr results
  • expr? - match zero or one expr, returning null or expr result

These parsing expressions are useful for matching simple operator expressions, or lists with separators, and can refer to the rules defined below them. They both return lists of expr results.

  • expr ++ separator - match one or more expr separated by separator
  • expr ** separator - match zero or more expr separated by separator, with an optional trailing separator

These parsing expressions match expressions wrapped in bracketing punctuation. Unlike the other parsing expressions, these can also refer to rules defined above them:

  • #( expr ) match expr wrapped in parentheses
  • #[ expr ] match expr wrapped in square brackets
  • #{ expr } match expr wrapped in curly braces

Combining grammars

You can include grammars in each other with include:

const commaSeparatedList = (parent) => grammar`#[ ${parent.Expr} ** "," ]`
const lang = grammar`
  Expr  = #(value)
        | include ${commaSeparatedList}
        | value
`

Operator grammars

Operator precedence is rather tedious to implement in this grammar, so zebu also includes a separate mini-language just for defining operators. Operators in the same clause have the same precedence level; clauses nearer the top of the list are higher in precedence.

const math = op`
  left  "+"   : ${(l, r) => l + r}
        "-"   : ${(l, r) => l - r}
  left  "*"   : ${(l, r) => l * r}
        "/"   : ${(l, r) => l / r}
        "%"   : ${(l, r) => l % r}
  right "**"  : ${(l, r) => l ** r}
  pre   "-"   : ${x => -x}
  post  "++"  : ${x => x + 1}
        "--"  : ${x => x - 1}
`
math`3 * 4 + 5 / 6`
// parses as (3 * 4) + (5 / 6)

You can include operator grammars into "regular" grammars:

const expr = grammar`
  Expr = include ${parent => op`
    left "++" : ${(xs, ys) => xs.concat(ys)}
    root ${parent.RootExpr}
  `}
  RootExpr  = #[ Expr ** "," ]
            | value
`
expr`["foo", "bar"] ++ ["baz"]`
// => ['foo', 'bar', 'baz']

Future work

Better Documentation. It took me a long time to understand how parsing works, and I suspect that Zebu's documentation is quite opaque to someone who is new to parsing. How can Zebu be a good introduction to parsing?

Babel macro support. Zebu is designed such that most of the parsing, both in grammars and in the languages they create, can be done at compile time. Zebu doesn't currently take advantage of this, but it would be considerably more useful if it did.

Interactive parsing workbench. Nearley and Owl both have interactive browser examples that were immensely helpful for understanding how they work. Zebu should also support online/interactive examples.

Better error messages. Parsing tools are notorious for inscrutable error messages. Can Zebu do better?


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK