Dict-parser: Elm package to create a fast parser to match dictionary keys

Dict Parser

Create fast parsers to match dictionary keys.

The parser succeeds with the longest matching key and is stack safe.

The problem

If you need a parser to match strings that you know beforehand, you could use Parser.oneOf.

import Parser exposing (Parser, backtrackable, getChompedString, oneOf, token)

-- Matches one of a fixed set of names and returns the matched text.
friendName : Parser String
friendName =
    oneOf
        [ backtrackable <| token "joe"
        , backtrackable <| token "joey"
        , backtrackable <| token "john"
        ]
        |> getChompedString

Now we can parse the names of our friends. However, this parser has a few problems:

It is slow - In the worst case it tries every option in turn, regardless of what the input looks like.
It is inefficient - Using oneOf with backtrackable is advised against. It means that we will be chomping the same characters over and over again.
Order matters - Small as it is, our example has a bug. It will never be able to parse joey, as joe will always succeed first.
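
The ordering problem can be reproduced with a small sketch. The names naiveFriendName and longestFirstFriendName are hypothetical and exist only for this illustration; the second parser shows the manual workaround of listing longer keys first, which is easy to get wrong as the list grows.

```elm
module FriendNameOrder exposing (longestFirstFriendName, naiveFriendName)

import Parser exposing (Parser, backtrackable, getChompedString, oneOf, token)


-- Same option order as the example above: "joe" is listed before "joey".
naiveFriendName : Parser String
naiveFriendName =
    oneOf
        [ backtrackable <| token "joe"
        , backtrackable <| token "joey"
        , backtrackable <| token "john"
        ]
        |> getChompedString


-- Manual workaround (hypothetical): list the longer key first.
longestFirstFriendName : Parser String
longestFirstFriendName =
    oneOf
        [ backtrackable <| token "joey"
        , backtrackable <| token "joe"
        , backtrackable <| token "john"
        ]
        |> getChompedString



-- Parser.run naiveFriendName "joey"        == Ok "joe"
-- Parser.run longestFirstFriendName "joey" == Ok "joey"
```

With naiveFriendName, oneOf stops at the first option that succeeds, so the trailing y is never consumed.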

The solution

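import Dict
import Parser exposing (Parser)
import Parser.Dict as DictParser

-- The keys of the dictionary are the strings the parser will match.
friendName : Parser String
friendName =
    [ ( "joe", "joe" )
    , ( "joey", "joey" )
    , ( "john", "john" )
    ]
        |> Dict.fromList
        |> DictParser.fromDict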
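
A possible way to try the parser out is sketched below. It uses only Parser.run from elm/parser and the fromDict function shown above; the module name and the results value are hypothetical. If the parser succeeds with the longest matching key as described at the top of this document, the input "joey" should produce Ok "joey" rather than Ok "joe", and "jack" should produce an error.

```elm
module FriendNameUsage exposing (friendName, results)

import Dict
import Parser exposing (Parser)
import Parser.Dict as DictParser


friendName : Parser String
friendName =
    [ ( "joe", "joe" )
    , ( "joey", "joey" )
    , ( "john", "john" )
    ]
        |> Dict.fromList
        |> DictParser.fromDict


-- Run the parser against a few inputs (hypothetical helper for illustration).
results : List (Result (List Parser.DeadEnd) String)
results =
    List.map (Parser.run friendName) [ "joe", "joey", "john", "jack" ]
```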

dict-parser organises the data in a Trie to create a parser that will match strings quickly and efficiently.

            "j"
              \
              "o"
             /  \
     (joe) "e"   "h"
           /       \
  (joey) "y"        "n" (john)

In this example, if the first character being checked is not a j, parsing fails immediately.

Once we get past j and o we can match either e or h. We could try them in sequence, but instead we use a dictionary with the characters at that level, allowing this check to be very fast.
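
To make the idea concrete, here is a minimal sketch of a trie that keeps a Dict Char of children at each level and looks up the longest matching key. It is an illustration under assumptions, not the package's actual implementation: the names Trie, insert and longestMatch are hypothetical, and insert is plainly recursive, so it does not have the stack-safety guarantees described in the next section.

```elm
module TrieSketch exposing (Trie, empty, friends, insert, longestMatch)

import Dict exposing (Dict)


-- A node optionally holds the value of a key that ends here,
-- plus a dictionary of child nodes indexed by the next character.
type Trie a
    = Trie (Maybe a) (Dict Char (Trie a))


empty : Trie a
empty =
    Trie Nothing Dict.empty


insert : String -> a -> Trie a -> Trie a
insert key value trie =
    insertChars (String.toList key) value trie


insertChars : List Char -> a -> Trie a -> Trie a
insertChars chars value (Trie stored children) =
    case chars of
        [] ->
            Trie (Just value) children

        c :: rest ->
            let
                child =
                    Dict.get c children |> Maybe.withDefault empty
            in
            Trie stored (Dict.insert c (insertChars rest value child) children)


-- Walk the input one character at a time, remembering the value of the
-- longest key seen so far. Each step is a single Dict.get on that level.
longestMatch : String -> Trie a -> Maybe a
longestMatch input trie =
    walk (String.toList input) Nothing trie


walk : List Char -> Maybe a -> Trie a -> Maybe a
walk chars best (Trie stored children) =
    let
        newBest =
            case stored of
                Just _ ->
                    stored

                Nothing ->
                    best
    in
    case chars of
        [] ->
            newBest

        c :: rest ->
            case Dict.get c children of
                Nothing ->
                    newBest

                Just child ->
                    walk rest newBest child


friends : Trie String
friends =
    List.foldl (\name trie -> insert name name trie) empty [ "joe", "joey", "john" ]



-- longestMatch "joey!" friends == Just "joey"
-- longestMatch "jack" friends  == Nothing
```

Because each level is a dictionary, an input starting with a character other than j is rejected after a single lookup at the root.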

Stack safety

Great care has been taken to make sure that, no matter how long your dictionary keys are or how many of them you have, the parser will never overflow the stack.

You can read about the techniques used for that in Recursion Patterns - Getting rid of stack overflows.
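
As a general illustration of what stack safety means in Elm (not the package's own code), the sketch below contrasts a naive recursive function with a tail-recursive one that carries an accumulator. The function names are hypothetical. The Elm compiler turns a direct self-call in tail position into a loop, so the second version uses constant stack space.

```elm
module StackSafeSketch exposing (countNaive, countSafe)


-- Naive: every call waits for the recursive call to return before adding 1,
-- so the stack grows with n and a large n can overflow it.
countNaive : Int -> Int
countNaive n =
    if n <= 0 then
        0

    else
        1 + countNaive (n - 1)


-- Stack safe: the running total travels in an accumulator and the
-- recursive call is the last thing the helper does.
countSafe : Int -> Int
countSafe n =
    countHelp n 0


countHelp : Int -> Int -> Int
countHelp n acc =
    if n <= 0 then
        acc

    else
        countHelp (n - 1) (acc + 1)
```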

How fast is it?

Let's imagine that we are trying to match a word with 5 characters and we have 1000 words in our dictionary.

The time complexity of oneOf + backtrackable + token is O(n * l), where l is the length of the word being matched and n is the total number of words. In the worst case our example would require 5000 comparisons with this approach.

The time complexity of using a Trie and matching the possible characters sequentially at each level is O(n + l). In the worst case our example would require 1005 comparisons with this approach.

The time complexity of this package's implementation is O(l * log2(n / l)). We use a Trie with a Dictionary at each level to perform binary search. In the worst case our example would require 39 comparisons with this approach.
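
The three figures can be reproduced with a few lines of arithmetic. This is only a sketch that plugs n = 1000 and l = 5 into the formulas quoted above; the function names are hypothetical, and the last result is rounded up to a whole number of comparisons.

```elm
module ComparisonCounts exposing (dictTrieCost, oneOfCost, sequentialTrieCost)


-- O(n * l): try every word, character by character.
oneOfCost : Int -> Int -> Int
oneOfCost n l =
    n * l


-- O(n + l): a trie whose siblings are checked one after another.
sequentialTrieCost : Int -> Int -> Int
sequentialTrieCost n l =
    n + l


-- O(l * log2 (n / l)): a trie with a dictionary lookup at each level.
dictTrieCost : Int -> Int -> Int
dictTrieCost n l =
    ceiling (toFloat l * logBase 2 (toFloat n / toFloat l))



-- oneOfCost 1000 5          == 5000
-- sequentialTrieCost 1000 5 == 1005
-- dictTrieCost 1000 5       == 39
```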
