

What's in a Parser Combinator? (2016)
source link: https://remusao.github.io/posts/whats-in-a-parser-combinator.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

As part of my ongoing effort to make progress in Haskell (that’s one of my goals for 2016!), I’m following the MOOC on functionnal programming by Erik Meijer on edX .
The first lessons were pretty basic stuff, and I got through them quickly. Lesson 7 is about Functional parsers and M***** (scary). This is where I encountered my first difficulties, and I thought it would be an interesting writing. I already used parser combinators in Haskell before (mainly Parsec and Attoparsec ), but never really understood how they worked, or at least not enough to implement one myself. So here is my take on the subject. Don’t expect really advanced stuff! It’s just an introduction to the basic concepts, on which we could build more complex and useful tools. In particular, I won’t talk about :
- How to report errors.
- How to recover from errors.
- How to write a parser for a concrete grammar.
Instead I’ll focus on :
- What a parser is .
- How to make parsers compose .
- How to use do notation to implement more complex parsers.
One of the interesting facts about writing your own parser combinators library, is that you will learn (or consolidate) other knowledges in the process, like: Functors , Applicatives and, of course, Monads , and more generaly, how to design DSL in Haskell . I already knew about this concepts (at least, that’s what I thought…), but knowing what something is from a high level of abstraction, is not the same as knowing how to implement it on a concrete type (like a Parser)!
So what’s a parser?
We can view a Parser
as something
that consumes some input, and outputs a structured representation of what was consumed. For the sake of simplicity, we’ll only consume strings (Haskell type String
). So that would be something like:
type Parser a = String -> a
Here a
represents the type of what is built
from the stream of characters ( String
). This could be a syntactic tree, or a list of numbers, or anything else. For example a parser that is able to recognize a string like "[1, 2, 3, 4]"
could have the type: Parser [Int]
(expended to String -> [Int]
), which means it takes a String
and output a list
of integers.
But we’re missing two important properties of a Parser :
- It can fail to parse something.
- It can partially consume its input.
To take into account the first point, we could return Maybe a
instead of a
(resulting in Nothing
in case of failure). Note that we could also use a richer type like Either
to handle parsing errors. And for the second point, we can return a tuple of a a
and a String
, which represents the part of the string that wasn’t consumed by the parser. The type would then become:
data Parser a = Parser { runParser :: String -> Maybe (a, String) }
As an example of a parser that would fail, if you take our previous parser
that is able to handle a list of integers, if you give it the string "[1 ,2"
, it will fail, and return Nothing
.
Similarly, if we feed the parser
with "[1, 2, 3, 4]toto"
, it will consume the part of the string that represents the list of integers, and leave "toto"
as a remaining input. Thus the result would be: Just ([1, 2, 3, 4], "toto")
.
Let’s implement some very basic parsers:
-- This parser always fails failure :: Parser a failure = Parser $ \s -> Nothing
-- This parser always succeeds and returns the value given as input -- (leaving the input string intact) return :: a -> Parser a return a = Parser $ \s -> Just (a, s)
-- This parser returns the first char of the input string, and -- fail on empty input oneChar :: Parser Char oneChar = Parser $ \s -> case s of [] -> Nothing (c:xs) -> Just (c, xs)
Let’s test these parsers:
runParser failure "Hello Parser!"
Nothing
runParser (return 42) "Hello Parser!"
Just (42, "Hello Parser!")
runParser oneChar "Hello Parser!"
Just ('H',"ello Parser!")
runParser oneChar ""
Nothing
The basic parsers seem to behave as expected. We get Nothing
in case of failure, and they are able to partially consume the input. So all is good, but what about more complex parsers? We would like to parse strings, or more complex patterns. Let’s try to recognize a string from the input, using our basic parsers:
string :: String -> Parser String string "" = return "" string (c1:xs1) = Parser $ \s -> case runParser oneChar s of Nothing -> Nothing Just (c2, rest) -> if c1 == c2 then case runParser (string xs1) rest of Nothing -> Nothing Just (match, rest2) -> Just (c2:match, rest2) else Nothing
runParser (string "Hello") "Hello Parser!"
Just ("Hello", " Parser!")
runParser (string "Hello") "Foo Bar"
Nothing
runParser (string "") "Hello Parser!"
Just ("","Hello Parser!")
This isn’t very convenient (but it works)… Because we have to write the boilerplate to compose parsers
over and over. Hopefully, we know a famous structure that allows composition in Haskell, and this is called Monad
(and I won’t make yet another tutorial on Monads
, so I will assume you already are familiar with this concept). That means we could avoid all the boilerplate, by making our Parser
type an instance of Monad
. This would allow us to use the do syntax
to cleanly compose our parsers! Sweet!
To do so, we’ll have to make our Parser an instance of: Functor , Applicative and Monad .
Parser is a Functor
First of all, our Parser is an instance of
Functor
, which means we can map
functions over the result of our parsing:
instance Functor Parser where -- fmap :: (a -> b) -> Parser a -> Parser b -- 1. Run parser on input string. -- 2. Apply function on result of parsing. fmap f p = Parser $ \s -> case runParser p s of Nothing -> Nothing Just (a, rest) -> Just (f a, rest)
-- Parse `String` "42" and then convert it to `Int` using `read` parse42 :: Parser Int parse42 = (fmap read $ string "42") runParser parse42 "42 is the answer!"
Just (42, " is the answer!")
Parser is an Applicative
Secondly, we can make our parser an instance of
Applicative
. This part wasn’t obvious for me. All the examples I found were about instances for easy types like Maybe
, but I found a Parser
to be pretty different. But thanks to the types and some use-cases (that you’ll find below), I figured the following implementation (which will hopefully be correct…):
instance Applicative Parser where -- pure :: a -> Parser a -- Wrap a value inside a parser, leaving input unchanged. pure a = Parser $ \s -> Just (a, s) -- (<*>) :: Parser (a -> b) -> Parser a -> Parser b -- 1. Run first parser on input (resulting in a function (a -> b). -- 2. Run second parser on remaining input, left by first parser. -- 3. Apply function (a -> b) on result of second parser. p1 <*> p2 = Parser $ \s -> case runParser p1 s of Nothing -> Nothing Just (f, rest) -> case runParser p2 rest of Nothing -> Nothing Just (a, rest2) -> Just (f a, rest2)
The usefulness of the previous instance might not be obvious, but it allows us to lift
some function inside the realm of parsers. For example if we want to take the result of several parsers and then group their results into a tuple, we can do it using Applicatives
:
parseTuple :: Parser (Char, Char) parseTuple = (,) <$> oneChar <*> oneChar runParser parseTuple "ab"
Just (('a', 'b'), "")
This is the kind of constructs we will use to convert the raw parsed structure into our own types (e.g: an AST).
data AST = Foo String | Bar String | Pair Char Char deriving (Show) parseFoo, parseBar, parsePair :: Parser AST parseFoo = Foo <$> string "foo" parseBar = Bar <$> string "bar" parsePair = Pair <$> oneChar <*> oneChar
runParser parseFoo "foo bar" runParser parseBar "bar baz" runParser parsePair "xyz"
Just (Foo "foo", " bar")
Just(Bar "bar", " baz")
Just (Pair 'x' 'y', "z")
Parser is a Monad
Last but not least, our parser is a
Monad
. Which means it must implement: >>=
, >>
, return
and fail
:
instance Monad Parser where -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b -- 1. Run first parser on input. -- 2. Feed result of parsing to `f`. -- 3. Run second parser (result of `f`) on remaining -- input (left by first parser) p >>= f = Parser $ \s -> case runParser p s of Nothing -> Nothing Just (a, rest) -> runParser (f a) rest -- (>>) :: Parser a -> Parser b -> Parser b -- 1. Run first parser on input. -- 2. Run second parser on remaining input (left by first parser) -- We ignore result of first parser. p1 >> p2 = Parser $ \s -> case runParser p1 s of Nothing -> Nothing Just (_, rest) -> runParser p2 rest -- return :: a -> Parser a return = pure -- fail :: String -> Parser a fail _ = Parser (const Nothing)
Thanks to this definition we can use the do
syntactic sugar, which will ease the implementation of more complex parsers. Let’s see what we can do.
-- Parse a specific `Char` from the input char :: Char -> Parser Char char c = do c1 <- oneChar if c == c1 then return c1 else failure
runParser (char 'H') "Hello!" runParser (char 'e') "Hello!"
Just(‘H’,“ello!”)
Nothing
We can also implement a cleaner version of our string
parser (found above):
-- Parse a specific pattern from the input string' :: String -> Parser String string' [] = return [] string' (c:xs) = do c1 <- char c rest <- string' xs return (c1:rest)
runParser (string' "Hello") "Hello" runParser (string' "Hello") "Foo"
Just("Hello", "")
Nothing
The do
notation makes it very easy to combine parsers! We now have some basic building blocks that we could use to implement more parsing combinators: choice
, many
, option
, etc. But I’ll leave it as an exercise.
Moreover, it would be interesting to implement an error reporting mechanism, as well as position tracking (to locate errors in the input), but I’ll leave it for another blog-post (or as an exercise for the reader!).
What I learned while reinventing the wheel
Implementing (very) basic parsing combinators led me to better understand the foundation of libraries like Parsec or Attoparsec , and to implement not so trivial instances of typeclasses like Applicatives and Monads . Although basic, I think it’s a good way to be more familiar with the DSL -like capabilities of Haskell, and to feel the power that the language offers in term of domain-specific modeling.
Recommend
-
68
-
30
引子 前不久在 CppCon 上看到一个 Talk: constexpr All the things ,这个演讲主题令我非常震撼,在编译期解析 json 字符串,进而提出了编译期构造正则表达式(编...
-
10
Optimizing a Parser Combinator into a memcpy posted by Craig Gidney on July 16, 2013 In this post: I describe how to dy...
-
9
ocaml试玩:写一个parser combinator2016-08-07体验了一把ocaml。好像是看知乎,推荐一门实用的函数式语言提到ocaml。haskell由于是"纯的"函数式,实用性比较差,而ocaml是允许命令式的,既可以体验函数式的精髓,又可以用来实际写东西。很多概...
-
14
C++元编程之Parser CombinatornetcanC++程序猿, 公众号:高级开发者原文地址:
-
10
再探 Parser 和 Parse Combinator软件开发话题下的优秀答主在几年前的文章《Policy Engine 的前世今生》里,我谈到了自己探索如何生成高效的表达式求值的工具的整个过程。我先是使...
-
3
Rust - Writing Parsers With nom Parser Combinator Framework I've been working on my new Rust side-project for several months now, and I've got some learnings to share. The project is cal...
-
12
正则表达式,PEG 以及 parser combinator2022-02-28上次年终总结中提到,cora 接下来计划的其中一个方向,考虑像 babashka 那样子做成日常的脚本来使用。 如果是当作日常脚本使用,其中很重要的一块是文本...
-
3
Code Snippet parser.fsx #!/usr/bin/env -S dotnet fsi --langversion:6.0 --mlcompatibility --optimize --warnaserror+:25,26 (* This construct is for ML compatibility. The syntax '(typ,...,typ)...
-
3
How to parse expression with the parser combinatorWriting parser is a boring and massive job, therefore, people create many ways like parser generators to reduce costs. One of them called parser combinator, parser combinator solves two pr...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK