
Oil 0.7.pre9 and a Fast Shell Parser

source link: http://www.oilshell.org/blog/2019/12/09.html

This is the latest version of Oil, a compatible Unix shell and an experimental new language:

If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.

To build and run it, follow the instructions in INSTALL.txt. Please try it on your shell scripts and report bugs. I'm also looking for feedback on the Oil language, which you can send through Github or Zulip.

Table of Contents

oil-native Shows How We'll Optimize Oil

Changes and Contributors

0.7.pre6 on November 11th (changelog)

0.7.pre7 on December 2nd (changelog)

0.7.pre8 on December 6th (changelog)

0.7.pre9 on December 8th (changelog)

Appendix A: What happened to OPy, OVM, and OVM2?

Appendix B: Metrics for Release 0.7.pre9

Native Code and Bytecode Metrics

oil-native Shows How We'll Optimize Oil

I've warned that Oil is too slow because it's written in an abstract style with a focus on correctness.

This release takes a major step toward speeding it up. There's a new oil-native tarball:

with a demo you can run in 10-20 seconds:

$ build/mycpp.sh tarball-demo
...
-rwxrwxr-x 1 andy andy 487584 Dec  8 20:47 _bin/osh_parse.opt.stripped

You can now run _bin/osh_parse.opt.stripped.  Example:

+ _bin/osh_parse.opt.stripped -c 'echo "hello $name"'
(command.Simple
  words: [
    (compound_word parts:[
      (word_part.Literal token:(Token id:Id.Lit_Chars val:echo span_id:0))
    ])
    (compound_word
      parts: [
        ...
      ]
    )
  ]
)

What is this?

  • _bin/osh_parse is Oil's principled parser, automatically translated to C++. This listing of the tarball shows that it's pure C++.
  • To build and run it, you need only a C++ compiler and a shell.
  • It produces syntax trees that are identical to those produced by the current parser. I verified this on our wild corpus, which contains 1.2 million lines of shell.
  • It's more than twenty times faster than the current Oil parser, according to the benchmarks published with every release for the last two years.
  • It's 3-4x faster than the zsh parser, but runs at only 40-60% of the speed of the bash parser. The Oil and zsh parsers both do a lot more work than the bash parser, e.g. for interactivity and error messages. More on this later.
  • It's a demo, not a working shell. I'll continue to release it to show progress, but please keep using and testing the oil tarball, not the oil-native one.
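
Checking that two parsers produce identical trees over a corpus is a form of differential testing. The Python sketch below shows only the general shape of such a check. `find_mismatches` and the two toy parsers are hypothetical stand-ins, not Oil's actual test harness, which would run the real parsers over every file in the corpus.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    parse_old: Callable[[str], str],
    parse_new: Callable[[str], str],
    scripts: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return the names of scripts whose printed syntax trees differ.

    Each parser takes shell source and returns a printed tree.
    Both parsers are placeholders supplied by the caller.
    """
    bad = []
    for name, source in scripts:
        if parse_old(source) != parse_new(source):
            bad.append(name)
    return bad


# Toy stand-ins (hypothetical): each "tree" is just the list of words.
def toy_old(source: str) -> str:
    return repr(source.split())  # splits on any whitespace


def toy_new(source: str) -> str:
    return repr([w for w in source.split(" ") if w])  # splits on spaces only


corpus = [
    ("a.sh", "echo hello"),
    ("b.sh", "ls  -l"),
    ("c.sh", "echo\thi"),  # the tab exposes the difference
]
print(find_mismatches(toy_old, toy_new, corpus))  # ['c.sh']
```

An empty result over the whole corpus is what "identical syntax trees" means in practice; any name in the list points at a file worth inspecting by hand.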

The next post will cover:

  • A new tool, mycpp, which I used to create this fast parser. It translates statically-typed Python to C++.
  • Other C++ code generators like ASDL.
  • The six-step translation process. If you want details sooner, check out the many Zulip threads on #oil-dev about it.
  • Details on performance.

Changes and Contributors

The rest of this post summarizes changes in the last four releases. Since the demo of the Oil language in October, most work has been on translation, but there are also some features and bug fixes.

0.7.pre6 on November 11th (changelog)

Contributions:

  • Aaron Sokoloski
    • Fixed Unicode behavior in string operations that take single-character globs. For example, ${s#?} and ${x//?/char}. This algorithm was tricky and I didn't know how to do it myself!
    • Implemented printf %d \'c, which is shell's obscure syntax for the ord(c) function. The Oil language should simply use ord(c).
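
To see why single-character globs are tricky with Unicode: in UTF-8 a character can occupy one to four bytes, so `?` has to consume a whole code point rather than a single byte. The Python sketch below illustrates that idea on byte strings. The function names are hypothetical and this is not the algorithm used in Oil; it only shows the byte-versus-character distinction.

```python
def utf8_char_len(first_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with first_byte."""
    if first_byte < 0x80:
        return 1  # 0xxxxxxx: ASCII
    if first_byte >> 5 == 0b110:
        return 2  # 110xxxxx
    if first_byte >> 4 == 0b1110:
        return 3  # 1110xxxx
    if first_byte >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError("invalid UTF-8 start byte: 0x%02x" % first_byte)


def strip_one_char(s: bytes) -> bytes:
    """Rough analogue of ${s#?}: drop one code point, not one byte."""
    if not s:
        return s
    return s[utf8_char_len(s[0]):]


def replace_each_char(s: bytes, repl: bytes) -> bytes:
    """Rough analogue of ${x//?/repl}: replace every code point with repl."""
    out = []
    i = 0
    while i < len(s):
        i += utf8_char_len(s[i])
        out.append(repl)
    return b"".join(out)


mu_abc = "\u03bcabc".encode("utf-8")      # Greek mu takes two bytes
assert strip_one_char(mu_abc) == b"abc"   # dropping one byte would leave garbage
assert replace_each_char(mu_abc, b"X") == b"XXXX"

# The printf %d \'c idiom corresponds to Python's ord():
assert ord("c") == 99
```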

Other:

  • Reorganize docs and improve the HTML toolchain. The /release/$VERSION/ page has been upgraded, and there's a new documentation index. The contents of most docs are still in progress.
  • Oil language
    • tup(42) is a tuple with one element, instead of Python's confusing 42, (with trailing comma). Singleton tuples are rare.
  • Under the hood — I'll expand on these topics in the next blog post.
    • Development of the mycpp translator.
    • Development of the mylib runtime.
    • Development of the C++ target for ASDL.
    • Refactorings to aid translation.
    • Add type annotations to certain files.
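
The Python behavior that tup(42) avoids is easy to demonstrate. In the sketch below, the `tup` function is a hypothetical Python helper written only to mirror the spelling; it is not Oil's implementation.

```python
def tup(*items):
    """Hypothetical Python helper mirroring the tup(42) spelling."""
    return tuple(items)


not_a_tuple = (42)  # parentheses alone only group: this is the int 42
singleton = 42,     # the trailing comma is what makes a tuple

assert isinstance(not_a_tuple, int)
assert singleton == (42,)
assert tup(42) == (42,)
```

The trailing comma is easy to miss or to add by accident, which is the confusion an explicit function call sidesteps.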

0.7.pre7 on December 2nd (changelog)

Contributions:

  • 조성빈 added an uninstall script to undo what install does.
  • Aaron Sokoloski added type annotations to a few files. This is the first step to making code faster via mycpp.
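
For readers who haven't seen type annotations, here is what statically-typed Python looks like in general. The function is a made-up example, not code from the Oil repository, and the annotation syntax Oil uses may differ. The point is only that every variable has a declared type, which a checker such as MyPy can verify and which gives a translator the information it needs to pick concrete types in a statically-typed target language.

```python
from typing import Dict, List


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count whitespace-separated words across all lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["echo hi", "echo"]))  # {'echo': 2, 'hi': 1}
```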

Other:

  • Implemented bash's ${!prefix*}, which the homebrew package manager uses to unset variables starting with a certain prefix. I'd like more testing of important shell scripts along these lines. See the # should-run-this label on Github, and How To Test OSH.
  • Almost all other changes were related to translation and mycpp. It took two months of full-time work! But the results are encouraging.
  • I made the first oil-native tarball (linked above).

0.7.pre8 on December 6th (changelog)

  • Oil now has JSON support! This is natural because the language has Python/JavaScript-like data structures.
    • Built yajl and its corresponding Python binding py-yajl into the app bundle.
    • oilshell/py-yajl is a simplified fork of py-yajl, with just 862 lines of C, compared to the original's 1578 lines.
  • Improvements to the oil-native release.
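
The reason JSON fits a language with Python-like data structures is that dicts, lists, strings, numbers, and booleans map one-to-one onto JSON values. The sketch below shows that mapping with Python's standard json module. It uses neither yajl nor Oil syntax, and the record contents are made up.

```python
import json

record = {
    "name": "demo",
    "tags": ["shell", "language"],
    "stable": False,
    "count": 3,
}

text = json.dumps(record, sort_keys=True)
print(text)

# Decoding gives back an equal structure, with no schema or mapping layer.
assert json.loads(text) == record
```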

0.7.pre9 on December 8th (changelog)

  • Aaron Sokoloski added more type annotations. We need help, so please join the conversation on Zulip. We'll fill you in on how it works!
  • Oil language: Changed the "pretty print expression" command from pp f(x) to = f(x). It's like an assignment with nothing on the LHS. As in assignments, everything to the right of = is parsed in expression mode.
  • The first word of a command can no longer look like =foo. Most likely you want to add a space like = foo, or quote it like '=foo'.
  • Published benchmarks and metrics for the oil-native tarball.
  • Document JSON support.

What's Next?

The C++ translation isn't done, but the oil-native demo has made me optimistic that this large project is feasible. After all, I wrote more than two years ago that using CPython is the riskiest part of the project!

But now we have a concrete path forward. (Appendix A summarizes the path I tried and abandoned.) The parser is about 40% of the Oil codebase, and it took two months to translate. Given that, I expect the rest to take two to six months of continuous work to translate.
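
The estimate can be sanity-checked with simple proportional arithmetic. The sketch below assumes the remaining code translates at the same rate as the parser did, which is an assumption of this illustration and not a claim from the post.

```python
months_spent = 2   # from the text: translating the parser took two months
percent_done = 40  # from the text: the parser is about 40% of the codebase

# Assumption: the remaining code translates at the same rate as the parser.
percent_left = 100 - percent_done
months_left = months_spent * percent_left / percent_done
print(months_left)  # 3.0
```

Three months sits inside the two-to-six-month range, which leaves room for the rest of the code being easier or harder to translate than the parser.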

However, at the current rate, this will likely happen more than six months from now, because there are many other parts of the project.

  • When I was working on translation, I wasn't working on features or bug fixes, for either OSH or Oil. According to Appendix B, the code has changed only a little in the last two months.
  • When I was working on documentation, I wasn't working on translation, features, or bug fixes. Writing good docs for Oil will also take several months of full-time work.

In short, it would be better if development were more parallel than serial. There are many independent parts of the project.

Help Wanted

The requests in How to Help still stand:

  • Think of a feature that would motivate you to use Oil. Users are more likely to become developers!
    • Better interactive completion? Oil has had good interactive features for almost a year, but development on them has stalled while I work on other things.
    • Speed? This is blocking me, and why I'm working on translation. But I feel like there should be a feature that motivates people to use a (temporarily) slow shell, since there are many popular apps that are slow.
  • Try Oil on your shell scripts and report bugs. Scripts you've written yourself or know well are good candidates at first.
  • Try it interactively and report roadblocks. I think a big hurdle to overcome is assembling a useful oshrc.
  • If you know Python and shell, consider submitting a patch! You don't need to know any C++.
    • The addition of mycpp means that simple Python code can be sped up by an order of magnitude for free!
  • Ask us questions on Zulip! Or feel free to lurk too :-)
    • I don't expect patches to drop out of thin air without help.
    • Here's a thread about the code structure that Aaron started.
    • New patches don't necessarily have to pass type checking. Tests are more important, and we can help you with those too.

I continue to maintain issue labels to help new contributors:

There are several important categories of work:

  • # should-run-this lists shell scripts to test. Running important programs will make Oil appealing to more users.
  • Improving the # osh-language.
  • Improving the # oil-language.
  • # feature. You can prototype them in plain Python!
  • We have many # interactive-shell ideas, but not enough hands to implement them.
  • # documentation.
  • # devtools, like using Guix or Nix as a virtualenv for C and shell.
    • I'll accept any PR that makes build/dev.sh minimal and then bin/osh run in an isolated environment. These commands generate Python source code, build a C extension, and run the resulting plain Python program. I got a couple of PRs but it wasn't clear how to use them. (This is the "dev build", which is different from the release build or the mycpp build.)

Overall, I think oil-native is evidence that Oil will work. Try it and let me know if you disagree!

An abstract shell interpreter can be turned into a production-quality shell. The shell will be compatible with old programs, but it also has powerful Python-like data structures and JSON support.

In other words, Oil is our upgrade path from bash.

If that appeals to you, consider helping out. The most important thing is to try it and figure out what's preventing you from being a user. Users are more likely to be developers!

Appendix A: What happened to OPy, OVM, and OVM2?

  • We're still using the OPy bytecode compiler for oil, but it will be retired when Oil becomes a pure C++ program via mycpp, ASDL, and other translators.
  • OVM is our slice of CPython. We'll keep Python objects like dict, list, tuple, str, int, bool, but we'll remove the bytecode interpreter in favor of native code. This solves the "double interpretation" problem.
  • As an experiment, I wrote 1000 lines of code toward OVM2, a minimal bytecode interpreter to run Oil. This approach was too much work for the small speed benefit. Relying on MyPy for type checking and the C++ compiler for code generation has produced much better results.

Appendix B: Metrics for Release 0.7.pre9

Let's compare the current release with version 0.7.pre5, released two months ago on October 4th.

The parser benchmarks deserve their own post, so here are the metrics we routinely track.

Most of the development work was on translation, which doesn't affect spec tests:

There were a few new features like the ${!prefix@} implementation. And we add failing tests to expose bad behavior, sometimes before we can fix it.

The Oil language now has JSON support:

Even though there were at least a hundred commits to aid translation, the source code didn't get much bigger:

Physical code:

Native Code and Bytecode Metrics

The yajl dependency for JSON support added almost 5K lines of C code:

The native code size increased by a corresponding amount:

The bytecode shrank a bit:

These are minor differences compared to the reductions that mycpp should enable in the coming months. All bytecode will be replaced with native code.

