GitHub - modernserf/zebu: A compiler for little languages in tagged template str...
source link: https://github.com/modernserf/zebu
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
readme.md
Zebu
What is this?
Zebu is a JavaScript library for building little languages with tagged template literals. Here are some examples of little languages built with Zebu:
Ranges
range`1,3 ... (10)` // => yields 1, 3, 5, 7, 9
Data expressions
const obj = { foo: { bar: 3 } } dx`.foo.bar`.replace(obj, 5) // => { foo: { bar: 5 } }
React PropTypes
const types = propTypes` className: string? type: ("select" | "datalist")? options: [{ id: string, label: string }] value: string onChange: func ` /* => { className: PropTypes.string, type: PropTypes.oneOf(["select", "datalist"]), options: PropTypes.arrayOf([ PropTypes.shape({ id: PropTypes.string.isRequired, label: PropTypes.string.isRequired, }).isRequired ]).isRequired, value: PropTypes.string.isRequired, onChange: PropTypes.func.isRequired, } */
Matrix math operations
const x = [ [7, 1], [-2, 3], ] matrix` [ 2 0 -1 3 ] * ${x} ` // => [[14, 2], [-13, 8]]
Interactive fiction
const leonAdventure = story` === begin === "You wake up, unfortunately. There is a treasure chest in the room. You do not know where it came from." * "open treasure chest" -> treasure * "pet cat" -> cat * "go to the bathroom" -> toilet === cat === "The cat purrs with approval at your touch." * "open treasure chest" -> treasure * "pet cat" -> cat * "go to the bathroom" -> toilet `
State machines
const traffic = machine` initState: #green states: #green | #yellow | #red events: #timer onTransition: ${(state) => console.log(state)} #green @ #timer -> #yellow #yellow @ #timer -> #red #red @ #timer -> #green ` traffic.start() // log { type: "green" } traffic.send({ type: "timer" }) // log { type: "yellow" }
Text matching
const joinObjects = (objects) => objects.reduce((l, r) => Object.assign(l, r), {}) const url = text` URL = Protocol Host Path? "/"? Search? Anchor? : ${(protocol, host, path, _, search, anchor) => ({ protocol, host, path, search, anchor })} Protocol = ${/[a-z]+/} "://" Host = ${/[A-Za-z0-9-]+/} ++ "." Path = "/" (Component ++ "/") : ${(_, path) => path} Search = "?" Pair ++ "&" : ${(_, pairs) => joinObjects(pairs)} Pair = Component "=" Component : ${(key, _, value) => ({[key]: value})} Anchor = "#" Component : ${(_, target) => target} Component = ${/[A-Za-z0-9()_\-~]/} : ${decodeURIComponent} ` url.match("https://github.com/modernserf/zebu?foo=bar20baz"/) /* => { ok: true, value: { protocol: "https", host: ["github", "com"], path: ["modernserf", "zebu"], search: { foo: "bar baz" }, anchor: null, }, } */
Installation & Requirements
npm install zebu
Zebu uses ES2018 features -- it works in node v10 and in modern browsers via <script type="module">
, but will likely need to be transpiled with Babel for production use.
Writing a language
Zebu exports the function grammar
that is used to define the rules for interpreting a language. Let's use grammar
to make a yaml-like configuration language, called "spaml":
import { grammar } from "zebu" const spaml = grammar` Block = Pair ** Sep : ${fromPairs} Pair = Key (":" line?) Expr : ${(key, _, value) => [key, value]} Expr = #[ Expr ** Sep ] : ${(xs = []) => xs} | #{ Block } | value | "true" : ${() => true} | "false" : ${() => false} | "null" : ${() => null} Key = identifier | value Sep = line | "," ` function fromPairs (pairs) { const obj = {} for (const [key, value] of pairs) { obj[key] = value } return obj }
You can use the spaml
grammar like this:
const justin = spaml` name: "Justin" twitter_handle: "modernserf" hobbies: ["karaoke", "mixology", "programming"] `
which results in:
{ name: "Justin", twitter_handle: "modernserf", hobbies: ["karaoke", "mixology", "programming"], }
Tokenizing
So how does this work? grammar
and spaml
are both functions that work with tagged template literals. First, the string components and interpolated values are transformed into tokens. This text:
name: "Justin"
twitter_handle: "modernserf"
hobbies: ["karaoke", "mixology", "programming"]
is transformed into:
[ { type: 'identifier', value: 'name' }, { type: 'operator', value: ':' }, { type: 'value', value: 'Justin' }, { type: 'line' }, { type: 'identifier', value: 'twitter_handle' }, { type: 'operator', value: ':' }, { type: 'value', value: 'modernserf' }, { type: 'line' }, { type: 'identifier', value: 'hobbies' }, { type: 'operator', value: ':' }, { type: 'startToken', value: '[' }, { type: 'value', value: 'karaoke' }, { type: 'operator', value: ',' }, { type: 'value', value: 'mixology' }, { type: 'operator', value: ',' }, { type: 'value', value: 'programming' }, { type: 'endToken', value: ']' }, ]
Some text, like tabs and spaces, are removed altogether. JS-style comments (both //
line comments and /* */
block) are also removed. If there are multiple newlines in a row, they are consolidated into a single line
token; any opening and closing lines are removed as well.
Numbers and quoted strings become value
tokens; all interpolated values are inserted as value
tokens as well. Words that match JavaScript's rules for identifiers become identifier
tokens. Most punctuation becomes operator
tokens.
Grouping symbols ( ) [ ] { }
are specially handled and become startToken
and endToken
, and these have a special purpose in the next stage of processing.
Skeleton syntax trees
Next, the grouping tokens are matched, and the tokens between them are collected into a single structure
token:
[ { type: 'identifier', value: 'name' }, { type: 'operator', value: ':' }, { type: 'value', value: 'Justin' }, { type: 'line' }, { type: 'identifier', value: 'twitter_handle' }, { type: 'operator', value: ':' }, { type: 'value', value: 'modernserf' }, { type: 'line' }, { type: 'identifier', value: 'hobbies' }, { type: 'operator', value: ':' }, { type: 'structure', structureType: '[]', value: [ { type: 'value', value: 'karaoke' }, { type: 'operator', value: ',' }, { type: 'value', value: 'mixology' }, { type: 'operator', value: ',' }, { type: 'value', value: 'programming' }, ] } ]
This step allows Zebu to target visibly pushdown languages -- essentially this means that Zebu grammars can be as (computationally) simple as regular expressions, but still allow recursive structures in brackets. This also makes it easier to provide good error messages & avoid some performance issues. This technique was adapted from Owl and Dylan's D-Expressions.
Parsing
In the next step, we try to match the tokens to the top rule of the grammar.
const spaml = grammar` Block = Pair ** Sep : ${fromPairs} Pair = Key (":" line?) Expr : ${(key, _, value) => [key, value]} Expr = #[ Expr ** Sep ] : ${(xs = []) => xs} | #{ Block } | value | "true" : ${() => true} | "false" : ${() => false} | "null" : ${() => null} Key = identifier | value Sep = line | "," `
These parsing expressions match a single token:
line
,value
,operator
,identifier
- match a token of this type"include"
,"+"
- match an operator or identifier token with this value
These parsing expressions work similarly to regular expressions, and can refer to the rules defined below them:
expr1 expr2
- matches expr1 followed byexpr2
, returning the result ofexpr2
expr1 expr2 : ${func}
matchesexpr1
followed byexpr2
, returningfunc(expr1, expr2)
expr1 | expr1
- try matchingexpr1
, else matchexpr2
expr+
- match one or more expr, returning a list ofexpr
resultsexpr*
- match zero or more expr, returning a list ofexpr
resultsexpr?
- match zero or one expr, returning null orexpr
result
These parsing expressions are useful for matching simple operator expressions, or lists with separators, and can refer to the rules defined below them. They both return lists of expr
results.
expr ++ separator
- match one or moreexpr
separated byseparator
expr ** separator
- match zero or moreexpr
separated byseparator
, with an optional trailingseparator
These parsing expressions match expressions wrapped in bracketing punctuation. Unlike the other parsing expressions, these can also refer to rules defined above them:
#( expr )
matchexpr
wrapped in parentheses#[ expr ]
matchexpr
wrapped in square brackets#{ expr }
matchexpr
wrapped in curly braces
Combining grammars
You can include grammars in each other with include
:
const commaSeparatedList = (parent) => grammar`#[ ${parent.Expr} ** "," ]` const lang = grammar` Expr = #(value) | include ${commaSeparatedList} | value `
Operator grammars
Operator precedence is rather tedious to implement in this grammar, so zebu also includes a separate mini-language just for defining operators. Operators in the same clause have the same precedence level; clauses nearer the top of the list are higher in precedence.
const math = op` left "+" : ${(l, r) => l + r} "-" : ${(l, r) => l - r} left "*" : ${(l, r) => l * r} "/" : ${(l, r) => l / r} "%" : ${(l, r) => l % r} right "**" : ${(l, r) => l ** r} pre "-" : ${x => -x} post "++" : ${x => x + 1} "--" : ${x => x - 1} ` math`3 * 4 + 5 / 6` // parses as (3 * 4) + (5 / 6)
You can include
operator grammars into "regular" grammars:
const expr = grammar` Expr = include ${parent => op` left "++" : ${(xs, ys) => xs.concat(ys)} root ${parent.RootExpr} `} RootExpr = #[ Expr ** "," ] | value ` expr`["foo", "bar"] ++ ["baz"]` // => ['foo', 'bar', 'baz']
Future work
Better Documentation. It took me a long time to understand how parsing works, and I suspect that Zebu's documentation is quite opaque to someone who is new to parsing. How can Zebu be a good introduction to parsing?
Babel macro support. Zebu is designed such that most of the parsing, both in grammars and in the languages they create, can be done at compile time. Zebu doesn't currently take advantage of this, but it would be considerably more useful if it did.
Interactive parsing workbench. Nearley and Owl both have interactive browser examples that were immensely helpful for understanding how they work. Zebu should also support online/interactive examples.
Better error messages. Parsing tools are notorious for inscrutable error messages. Can Zebu do better?
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK