Parser generators vs. handwritten parsers: surveying major language implementati...

 2 years ago
source link: https://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.html

Developers often think parser generators are the sole legit way to build programming language frontends, possibly because compiler courses in university teach lex/yacc variants. But do any modern programming languages actually use parser generators anymore?

To find out, this post presents a non-definitive survey of the parsing techniques used by various major programming language implementations.

CPython: PEG parser

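CPython 3.9 made the PEG parser the default, and 3.10 (which hasn't been released yet) removes the old parser entirely. The old parser was not handwritten either: it was generated by pgen, CPython's own LL(1) parser generator. The team thought a PEG parser was a better fit for expressing the language, and PEP 617 reported the new parser to be within 10% of the old one in both speed and memory usage.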

The PEG grammar is defined here. (It is being renamed in 3.10, so check the directory for a file with a similar name if you browse 3.10+.)
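
To make "PEG" a little more concrete, here is a minimal sketch of the ordered-choice behavior that defines PEG grammars: alternatives are tried in order and the first one that matches wins, with no going back later. This is not how CPython's generated parser is implemented; the helper names `literal`, `sequence` and `choice` are made up for illustration.

```python
# Hypothetical PEG-style combinators. Each parser takes (source, pos) and
# returns (value, new_pos) on success or None on failure.

def literal(text):
    """Match an exact string at the current position."""
    def parse(source, pos):
        if source.startswith(text, pos):
            return text, pos + len(text)
        return None
    return parse

def sequence(*parsers):
    """Match every parser in turn; fail if any one of them fails."""
    def parse(source, pos):
        values = []
        for parser in parsers:
            result = parser(source, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def choice(*parsers):
    """Ordered choice: return the result of the first alternative that matches."""
    def parse(source, pos):
        for parser in parsers:
            result = parser(source, pos)
            if result is not None:
                return result
        return None
    return parse

# Order matters: the shorter alternative shadows the longer one if listed first.
short_first = choice(literal("in"), literal("int"))
long_first = choice(literal("int"), literal("in"))

# Once a choice has succeeded it is never revisited, so this fails on "abc":
# the choice commits to "a", and "c" does not match at position 1.
commits_early = sequence(choice(literal("a"), literal("ab")), literal("c"))
```

With these definitions, `short_first("int x", 0)` consumes only `in`, while `long_first("int x", 0)` consumes `int`. That sensitivity to ordering is the main thing a PEG grammar author has to keep in mind compared with a yacc-style grammar.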

GCC: Handwritten

Source code for the C parser is available here. GCC used Bison for parsing C until version 4.1 in 2006. The C++ parser had already switched from Bison to a handwritten parser 2 years earlier.

Clang: Handwritten

Not only handwritten but the same file handles parsing C, Objective-C and C++. Source code is available here.

Ruby: YACC-like Parser Generator

Ruby includes a YACC-like parser generator called racc. The grammar for the language can be found here.

V8 JavaScript: Handwritten

Source code available here.

Zend Engine PHP: Parser Generator

Source code available here.

TypeScript: Handwritten

Source code available here.

Bash: Yacc-based Parser Generator

Source code for the grammar is available here.

Chromium CSS Parser: Handwritten

Source code available here.

OpenJDK: Handwritten

You can find the source code here. Some commentary calls this implementation fragile.

Golang: Handwritten

Until Go 1.6 the compiler used a yacc-based parser. The source code for that grammar is available here.

In Go 1.6 they switched to a handwritten parser. You can find that change here. There was a reported 18% speed increase when parsing files and a reported 3% speed increase in building the compiler itself when switching.

You can find the source code for the compiler's parser here.

Roslyn: Handwritten

The C# parser source code is available here. The Visual Basic parser source code is here.

Lua: Handwritten

Source code available here.

Swift: Handwritten

Source code available here.

R: ???

Not sure; I had a hard time reading its source. If you can point me at its parser source code, that would be great!

Julia: Handwritten ... in Scheme

Julia's parser is handwritten but not in Julia. It's in Scheme! Source code available here.

PostgreSQL: Yacc-based Parser Generator

PostgreSQL uses Bison for parsing queries. Source code for the grammar available here.

MySQL: Yacc Parser Generator

Source code for the grammar available here.

SQLite: Yacc-based Parser Generator

SQLite uses its own parser generator called Lemon. Source code for the grammar is available here.

Summary

Of the 2021 Redmonk top 10 languages, 9 of them have a handwritten parser. Ruby is the only one that does not. (I'm not counting Python since 3.10 still hasn't been released.)

Although parser generators are still used in major language implementations, maybe it's time for universities to start teaching handwritten parsing?
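
For anyone who hasn't seen one, here is a minimal sketch of what a handwritten parser tends to look like: a recursive-descent parser with one function per grammar rule. The toy arithmetic grammar and every name in it are made up for illustration; none of it is taken from the implementations surveyed above.

```python
import re

# Hypothetical toy grammar:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        """Consume and return the next token (None at the end)."""
        token = self.peek()
        self.pos += 1
        return token

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.parse_term())  # left-associative
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.parse_factor())
        return node

    def parse_factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.parse_expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Calling `parse("1 + 2 * 3")` gives the tree `("+", 1, ("*", 2, 3))`. Operator precedence falls out of which rule calls which, and each error message is an ordinary `raise` at the exact point where the input went wrong, which is a large part of why production compilers tend to like this style.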

