
Why D is a good choice for writing a language

 5 years ago
source link: https://www.tuicool.com/articles/hit/Ar2UJre

D is a programming language appreciated for its powerful templates, its metaprogramming features and the parts of its standard library devoted to algorithms and ranges. At first glance, none of this has much to do with writing compilers, which are usually built on OOP or at least on structured programming.

D is a multi-paradigm language, and it allows writing in an OOP fashion even if this doesn't unleash its beauty. I'll explain why D is a viable solution, using my experience writing STYX as an example.

Designing the grammar with a PEG

In the past I had already written an awful scripting language called LEOFUNS (Linear Evaluator Of FUnction Stacks). It was during this project that I first heard of the tools and libraries used to design or automate the lexing and parsing stages of a compiler: BISON, FLEX, YACC, etc. At that time I only wrote in Object Pascal (the Delphi dialect, to be more accurate). There was a YACC port, which I never managed to use.

Fortunately things have changed since then: packrat parsers were born and are now popular. D's collection of third-party packages contains a PEG implementation called Pegged. It is easy to use and well documented, and it has nothing to do with the dusty tools mentioned before. It has allowed me to start writing the real parser by hand while having a concise document as my plan, which I call the formal grammar.

[Image: a PEG is used as the reference document to maintain the parser written in D]
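To illustrate how a hand-written parser can follow a PEG used as a plan, here is a minimal sketch. The two-rule grammar, the Parser struct and the parseAll function are hypothetical and far simpler than the STYX grammar; the only point is that each PEG rule maps to one D function.

```d
import std.ascii : isDigit;

// Hypothetical PEG used as the plan (not the STYX grammar):
//   Sum    <- Number ('+' Number)*
//   Number <- [0-9]+

struct Parser
{
    string src;  // text being parsed
    size_t pos;  // current position in src

    // Number <- [0-9]+
    bool parseNumber(out long value)
    {
        const start = pos;
        while (pos < src.length && isDigit(src[pos]))
        {
            value = value * 10 + (src[pos] - '0');
            ++pos;
        }
        return pos > start; // at least one digit is required
    }

    // Sum <- Number ('+' Number)*
    bool parseSum(out long value)
    {
        if (!parseNumber(value))
            return false;
        while (pos < src.length && src[pos] == '+')
        {
            const save = pos;
            ++pos;
            long rhs;
            if (!parseNumber(rhs))
            {
                pos = save; // PEG-style backtracking: the repetition just stops
                break;
            }
            value += rhs;
        }
        return true;
    }
}

/// Parses a whole input; fails if trailing characters remain.
bool parseAll(string src, out long value)
{
    auto p = Parser(src);
    return p.parseSum(value) && p.pos == src.length;
}
```

Keeping the rule as a comment above each function makes it easy to compare the parser with the formal grammar whenever one of them changes.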

The formal grammar itself can be tested in Pegged. The test consists of feeding some source code to an automatically generated parser. The AST is then displayed in a web browser, which makes it very easy to check whether the grammar is broken.

[Image: HTML produced by Pegged, which quickly shows whether the grammar is broken]

Once I started using the D library Pegged, I didn't want to use a second language in the project, even though the grammar is separated from the compiler, as a sub-project standing in its own folder.

Inline unit tests and coverage

The D language allows unit tests to be defined next to the code they test. A simple compiler switch can be used to run them through features of the D runtime. The compiler's reflection system can also be used to run the tests in a more personalized way. A test is the equivalent of a free function, but with a special syntax:

// The most simple D unit test
unittest
{
    assert(true);    
}

In addition, the code can be instrumented to measure coverage when the unit tests are run. This way, many aspects of my compiler can be tested without a test suite based on external files (although such a suite will be necessary in the future).
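As a sketch of running tests "in a more personalized way", the example below enumerates unittest blocks with the __traits(getUnitTests, ...) reflection trait and calls them by hand. The names twice, TwiceTests and runTwiceTests are hypothetical. With DMD, unittest blocks are only compiled when the -unittest switch is passed, and adding -cov instruments the code for coverage.

```d
/// Hypothetical function under test.
int twice(int x)
{
    return 2 * x;
}

/// The tests are grouped in a struct so that they can be enumerated
/// separately from any other test in the module.
struct TwiceTests
{
    unittest
    {
        assert(twice(2) == 4);
    }

    unittest
    {
        assert(twice(-1) == -2);
    }
}

/// Calls every unittest block of TwiceTests and returns how many were run.
/// Returns 0 when the module is compiled without -unittest, because the
/// blocks are then not compiled at all.
size_t runTwiceTests()
{
    size_t count;
    foreach (test; __traits(getUnitTests, TwiceTests))
    {
        test();
        ++count;
    }
    return count;
}
```

A custom runner of this kind can filter tests, time them or report them differently from the default runner provided by the D runtime.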

Since a compiler is about transforming source code, most of the unit tests actually call a test function with at least a string containing some STYX source code. Depending on the compiler feature being tested, the function also takes optional parameters, for example some other code when the test is about rewriting or formatting the AST. The test functions take advantage of D's special keywords __LINE__ and __FILE_FULL_PATH__ to replace the default assert expression. This gives precise error messages that are relative to the STYX code being tested and not to the compiler code. These messages are even clickable in the IDE widget dedicated to the output streams of the compiler and of the target programs.

[Image: a custom function that replaces D's 'assert'. Expansion of the special keywords allows precise error messages]

[Image: the error message is a STYX error, not a D error]
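The original screenshots are not available here, so the following is only a sketch of the technique, not the actual STYX test function. It relies on the fact that __FILE_FULL_PATH__ and __LINE__, when used as default arguments, expand at the call site. The names firstErrorLine and checkSource are hypothetical, and the stand-in "compiler" simply treats a '?' character as an error.

```d
import std.format : format;

/// Hypothetical stand-in for a compiler phase: returns the 1-based line of
/// the first '?' found in the source, or 0 if the source is "valid".
size_t firstErrorLine(string source)
{
    size_t line = 1;
    foreach (char c; source)
    {
        if (c == '?')
            return line;
        if (c == '\n')
            ++line;
    }
    return 0;
}

/// Returns null on success, otherwise a "file(line): message" diagnostic
/// that points into the D file where the embedded source was written.
/// Assumption: the source literal starts on the same line as the call.
string checkSource(string source,
                   string file = __FILE_FULL_PATH__, // expands at the call site
                   size_t line = __LINE__)           // expands at the call site
{
    const size_t err = firstErrorLine(source);
    if (err == 0)
        return null;
    return format("%s(%d): error in embedded source, line %d",
                  file, line + err - 1, err);
}
```

Because the diagnostic uses the file(line) form, an editor that recognizes compiler messages can jump straight to the offending line of the embedded source.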

Interaction with the editor is also important to reach 100% coverage. The tests for a specific compiler module (e.g. lexer, parser, version processor, etc.) can be run independently from the other modules because the compiler is also available as a static library. Note that STYX as a library is not compiled with the unit tests, otherwise all of them would be run each time.

[Image: interaction with the editor is important in order to 'fight' until the tests reach 100% coverage]

D is handled by the most popular continuous integration services. Typically, a combination of TravisCI + CodeCov is extremely easy to set up. This opens the door to pull requests with a guard against regressions.

While D is somewhat criticized because of its GC, its default memory management system is not a problem in a compiler. Compilers are single-shot programs and memory management, in most cases, doesn't matter. If at some point it does, it will still be possible to turn the GC off before a phase and force a collection once the phase is finished, before starting another one. For now only new is used and we don't care if an instance is no longer used (typically after a parser error due to invalid code).
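Turning the GC off around a phase could look like the sketch below, which uses GC.disable, GC.enable and GC.collect from core.memory. The functions parsePhase and withoutGC are hypothetical; STYX does not necessarily do this.

```d
import core.memory : GC;

/// Hypothetical phase: allocates many short-lived nodes and returns the
/// total number of elements allocated.
size_t parsePhase(size_t nodeCount)
{
    size_t total;
    foreach (i; 0 .. nodeCount)
    {
        auto node = new int[](16); // becomes garbage after this iteration
        total += node.length;
    }
    return total;
}

/// Runs `phase` with automatic collections suspended, then collects once.
size_t withoutGC(size_t delegate() phase)
{
    GC.disable();     // suspend automatic collections (the runtime may still
                      // collect if it runs out of memory)
    scope (exit)
    {
        GC.enable();  // each disable() must be matched by one enable()
        GC.collect(); // reclaim the garbage of the phase in one go
    }
    return phase();
}
```

A phase would then be run as withoutGC(() => parsePhase(1000)), so that the cost of collecting is paid once between phases instead of at unpredictable points inside them.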

In short: so far so good, even if it's true that STYX as a library is only used to run individual tests. More serious uses with an undefined running time, and I'm thinking especially of an auto-completion daemon, may reveal unexpected memory issues.

Future and conclusion

The project has reached a point where D's testing facilities will be used less, but they've been useful to get a solid foundation. In the future I expect that D's C FFI will be useful, for example for the backend. Once again I'll count on third-party packages, such as libfirm-d or LLVM-d.
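As a minimal illustration of the C FFI, the sketch below declares and calls strlen from the C standard library. It is unrelated to libfirm-d or LLVM-d, whose bindings are much larger, and cLength is a hypothetical helper.

```d
import std.string : toStringz;

// Prototype of the C standard library's strlen. The D compiler only needs
// the declaration; the symbol itself is resolved by the linker.
extern (C) size_t strlen(const(char)* s) nothrow @nogc;

/// Length of a D string as seen by C (stops at the first NUL byte).
size_t cLength(string s)
{
    // D strings are not NUL-terminated, so convert before calling C.
    return strlen(s.toStringz);
}
```

Bindings to a C backend library follow the same principle: a D module that contains only extern (C) declarations matching the C headers.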

