Oil 0.7.pre9 and a Fast Shell Parser
source link: http://www.oilshell.org/blog/2019/12/09.html
This is the latest version of Oil, a compatible Unix shell and an experimental new language:
If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.
To build and run it, follow the instructions in INSTALL.txt. Please try it on your shell scripts and report bugs. I'm also looking for feedback on the Oil language, which you can send through Github or Zulip.
Table of Contents
oil-native Shows How We'll Optimize Oil
Changes and Contributors
0.7.pre6 on November 11th (changelog)
0.7.pre7 on December 2nd (changelog)
0.7.pre8 on December 6th (changelog)
0.7.pre9 on December 8th (changelog)
Appendix A: What happened to OPy, OVM, and OVM2?
Appendix B: Metrics for Release 0.7.pre9
Native Code and Bytecode Metrics
oil-native Shows How We'll Optimize Oil
I've warned that Oil is too slow because it's written in an abstract style with a focus on correctness.
This release takes a major step toward speeding it up. There's a new oil-native tarball with a demo you can run in 10-20 seconds:
    $ build/mycpp.sh tarball-demo
    ...
    -rwxrwxr-x 1 andy andy 487584 Dec 8 20:47 _bin/osh_parse.opt.stripped

    You can now run _bin/osh_parse.opt.stripped. Example:

    + _bin/osh_parse.opt.stripped -c 'echo "hello $name"'
    (command.Simple
      words: [
        (compound_word
          parts:[
            (word_part.Literal token:(Token id:Id.Lit_Chars val:echo span_id:0))
          ])
        (compound_word parts: [ ... ] )
      ]
    )
What is this?
- _bin/osh_parse is Oil's principled parser automatically translated to C++. This listing of the tarball shows that it's pure C++.
- To build and run it, you need only a C++ compiler and a shell.
- It produces syntax trees that are identical to those produced by the current parser. I verified this on our wild corpus, which contains 1.2 million lines of shell.
- It's more than twenty times faster than the current Oil parser, according to the benchmarks published with every release for the last two years.
- It's 3-4x faster than the zsh parser, but 40-60% the speed of the bash parser. The Oil and zsh parsers both do a lot more work than the bash parser, e.g. for interactivity and error messages. More on this later.
- It's a demo, not a working shell. I'll continue to release it to show progress, but please keep using and testing the oil tarball, not the oil-native one.
The next post will cover:
- A new tool, mycpp, which I used to create this fast parser. It translates statically-typed Python to C++.
- Other C++ code generators like ASDL.
- The six-step translation process. If you want details sooner, check out the many Zulip threads on #oil-dev about it.
- Details on performance.
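For readers who haven't seen statically-typed Python, here is a minimal sketch of the general style. It is not taken from Oil's source, and it makes no claim about which subset of Python mycpp accepts. The `Token` class, the `lex_words` function and the `VarSub` kind are hypothetical names for illustration; only `Lit_Chars` echoes the demo output above.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token type; every field has a declared type."""
    kind: str
    val: str


def lex_words(line: str) -> List[Token]:
    """Split a line on whitespace and classify each word.

    Because argument, return and local types are all declared, a type
    checker can verify the function before any translation step runs.
    """
    tokens: List[Token] = []
    for word in line.split():
        kind = "VarSub" if word.startswith("$") else "Lit_Chars"
        tokens.append(Token(kind, word))
    return tokens


print(lex_words("echo $name"))
```

The point of the annotations is that every variable has one known type, which is the kind of information a C++ compiler also needs.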
Changes and Contributors
The rest of this post summarizes changes in the last four releases. Since the demo of the Oil language in October, most work has been on translation, but there are also some features and bug fixes.
0.7.pre6 on November 11th (changelog)
Contributions:
- Aaron Sokoloski
  - Fixed Unicode behavior in string operations that take single-character globs. For example, ${s#?} and ${x//?/char}. This algorithm was tricky and I didn't know how to do it myself!
  - Implemented printf %d \'c, which is shell's obscure syntax for the ord(c) function. The Oil language should simply use ord(c).
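To make the Unicode fix concrete, here is a small Python sketch of what a single-character glob like ${s#?} has to do on UTF-8 input: remove one character, not one byte. This is an illustration under my own assumptions, not Oil's implementation, and both helper names are hypothetical.

```python
def strip_first_char(s: bytes) -> bytes:
    """Mimic ${s#?} on UTF-8 bytes: drop one code point, not one byte."""
    if not s:
        return s
    # Skip the lead byte, then any continuation bytes (bit pattern 10xxxxxx).
    i = 1
    while i < len(s) and (s[i] & 0xC0) == 0x80:
        i += 1
    return s[i:]


def strip_first_byte(s: bytes) -> bytes:
    """Byte-oriented behavior, which can split a multi-byte character."""
    return s[1:]


s = "μs".encode("utf-8")     # two bytes for the Greek letter, one for 's'
print(strip_first_char(s))   # b's'
print(strip_first_byte(s))   # b'\xbcs', which is no longer valid UTF-8
print(ord("c"))              # 99, the number that printf %d \'c prints
```

The sketch assumes well-formed UTF-8; handling invalid byte sequences is part of what makes the real algorithm tricky.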
Other:
- Reorganize docs and improve the HTML toolchain. The /release/$VERSION/ page has been upgraded, and there's a new documentation index. The contents of most docs are still in progress.
- Oil language
  - tup(42) is a tuple with one element, instead of Python's confusing 42, (with trailing comma). Singleton tuples are rare.
- Under the hood: I'll expand on these topics in the next blog post.
  - Development of the mycpp translator.
  - Development of the mylib runtime.
  - Development of the C++ target for ASDL.
  - Refactorings to aid translation.
  - Add type annotations to certain files.
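The singleton-tuple change is easiest to see from the Python side. The snippet below shows why the trailing comma is confusing; the `tup` function is a hypothetical Python stand-in for Oil's tup(), written only for this comparison.

```python
def tup(*items):
    """Hypothetical stand-in for Oil's tup(): build a tuple explicitly."""
    return tuple(items)


a = (42)    # parentheses only group, so this is the int 42
b = (42,)   # the trailing comma is what makes a one-element tuple
c = tup(42)

print(type(a).__name__)  # int
print(type(b).__name__)  # tuple
print(b == c)            # True
```

An explicit constructor means the difference between a number and a one-element tuple no longer depends on a single comma.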
0.7.pre7 on December 2nd (changelog)
Contributions:
- 조성빈 added an uninstall script to undo what install does.
- Aaron Sokoloski added type annotations to a few files. This is the first step to making code faster via mycpp.
Other:
- Implemented bash's ${!prefix*}, which the homebrew package manager uses to unset variables starting with a certain prefix. I'd like more testing of important shell scripts along these lines. See the #should-run-this label on Github, and How To Test OSH.
- Almost all other changes were related to translation and mycpp. It took two months of full-time work! But the results are encouraging.
- I made the first oil-native tarball (linked above).
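As a rough model of the prefix expansion mentioned above (written ${!prefix*} in bash), here is a Python sketch that treats the shell's variables as a dict. The function names and the sample variables are hypothetical, and sorting the names is my assumption to keep the output deterministic.

```python
from typing import Dict, List


def names_with_prefix(env: Dict[str, str], prefix: str) -> List[str]:
    """Return the variable names that start with prefix."""
    return sorted(name for name in env if name.startswith(prefix))


def unset_with_prefix(env: Dict[str, str], prefix: str) -> None:
    """Remove every variable whose name starts with prefix."""
    for name in names_with_prefix(env, prefix):
        del env[name]


env = {"PKG_CACHE": "/tmp/c", "PKG_DEBUG": "1", "HOME": "/home/andy"}
print(names_with_prefix(env, "PKG_"))  # ['PKG_CACHE', 'PKG_DEBUG']
unset_with_prefix(env, "PKG_")
print(env)                             # {'HOME': '/home/andy'}
```

The expansion yields names rather than values, which is why a script can pass the result straight to unset.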
0.7.pre8 on December 6th (changelog)
- Oil now has JSON support! This is natural because the language has Python/JavaScript-like data structures.
  - Built yajl and its corresponding Python binding py-yajl into the app bundle.
  - oilshell/py-yajl is a simplified fork of py-yajl, with just 862 lines of C, compared to the original's 1578 lines.
- Improvements to the oil-native release.
0.7.pre9 on December 8th (changelog)
- Aaron Sokoloski added more type annotations. We need help, so please join the conversation on Zulip. We'll fill you in on how it works!
- Oil language: Changed the "pretty print expression" command from pp f(x) to = f(x). It's like an assignment with nothing on the LHS. As in assignments, everything to the right of = is parsed in expression mode.
  - The first word of a command can no longer look like =foo. Most likely you want to add a space like = foo, or quote it like '=foo'.
- Published benchmarks and metrics for the oil-native tarball.
- Document JSON support.
What's Next?
The C++ translation isn't done, but the oil-native demo has made me optimistic that this large project is feasible. After all, I wrote more than two years ago that using CPython is the riskiest part of the project!
But now we have a concrete path forward. (Appendix A summarizes the path I tried and abandoned.) The parser is about 40% of the Oil codebase, and it took two months to translate. Given that, I expect the rest to take two to six months of continuous work to translate.
However, at the current rate, this will likely happen more than six months from now, because there are many other parts of the project.
- When I was working on translation, I wasn't working on features or bug fixes, for either OSH or Oil. According to Appendix B, the code has changed only a little in the last two months.
- When I was working on documentation, I wasn't working on translation, features, or bug fixes. Writing good docs for Oil will also take several months of full-time work.
In short, it would be better if development were more parallel than serial. There are many independent parts of the project.
Help Wanted
The requests in How to Help still stand:
- Think of a feature that would motivate you to use Oil. Users are more likely to become developers!
  - Better interactive completion? Oil has had good interactive features for almost a year, but development on them has stalled while I work on other things.
  - Speed? This is blocking me, and why I'm working on translation. But I feel like there should be a feature that motivates people to use a (temporarily) slow shell, since there are many popular apps that are slow.
- Try Oil on your shell scripts and report bugs. Scripts you've written yourself or know well are good candidates at first.
- Try it interactively and report roadblocks. I think a big hurdle to overcome is assembling a useful oshrc.
- If you know Python and shell, consider submitting a patch! You don't need to know any C++.
  - The addition of mycpp means that simple Python code can be sped up by an order of magnitude for free!
- Ask us questions on Zulip! Or feel free to lurk too :-)
  - I don't expect patches to drop out of thin air without help.
  - Here's a thread about the code structure that Aaron started.
  - New patches don't necessarily have to pass type checking. Tests are more important, and we can help you with those too.
I continue to maintain issue labels to help new contributors:
There are several important categories of work:
- #should-run-this lists shell scripts to test. Running important programs will make Oil appealing to more users.
- Improving the #osh-language.
- Improving the #oil-language.
- #feature. You can prototype them in plain Python!
- We have many #interactive-shell ideas, but not enough hands to implement them.
- #documentation.
- #devtools, like using Guix or Nix as a virtualenv for C and shell.
  - I'll accept any PR that makes build/dev.sh minimal and then bin/osh run in an isolated environment. These commands generate Python source code, build a C extension, and run the resulting plain Python program. I got a couple PRs but it wasn't clear how to use them. (This is the "dev build", which is different from the release build or the mycpp build.)
Overall, I think oil-native is evidence that Oil will work. Try it and let me know if you disagree!
An abstract shell interpreter can be turned into a production-quality shell. The shell will be compatible with old programs, but it also has powerful Python-like data structures and JSON support.
In other words, Oil is our upgrade path from bash.
If that appeals to you, consider helping out. The most important thing is to try it and figure out what's preventing you from being a user. Users are more likely to be developers!
Appendix A: What happened to OPy, OVM, and OVM2?
- We're still using the OPy bytecode compiler for oil, but it will be retired when Oil becomes a pure C++ program via mycpp, ASDL, and other translators.
- OVM is our slice of CPython. We'll keep Python objects like dict, list, tuple, str, int, bool, but we'll remove the bytecode interpreter in favor of native code. This solves the "double interpretation" problem.
- As an experiment, I wrote 1000 lines of code toward OVM2, a minimal bytecode interpreter to run Oil. This approach was too much work for the small speed benefit. Relying on MyPy for type checking and the C++ compiler for code generation has produced much better results.
Appendix B: Metrics for Release 0.7.pre9
Let's compare the current release with version 0.7.pre5, released two months ago on October 4th.
The parser benchmarks deserve their own post, so here are the metrics we routinely track.
Most of the development work was on translation, which doesn't affect spec tests:
- OSH spec tests for 0.7.pre5: 1510 tests, 1345 passing, 58 failing
- OSH spec tests for 0.7.pre9: 1536 tests, 1364 passing, 62 failing
There were a few new features like the ${!prefix@} implementation. And we add failing tests to expose bad behavior, sometimes before we can fix it.
The Oil language now has JSON support:
- Oil spec tests for 0.7.pre5: 207 tests, 195 passing, 12 failing
- Oil spec tests for 0.7.pre9: 218 tests, 206 passing, 12 failing
Even though there were at least a hundred commits to aid translation, the source code didn't get much bigger:
- cloc for 0.7.pre5: 13,925 lines of Python and C, 307 lines of ASDL.
- cloc for 0.7.pre9: 14,156 lines of Python and C, 308 lines of ASDL.
Physical code:
- src for 0.7.pre5: 25,785 lines of Python
- src for 0.7.pre9: 26,313 lines of Python
Native Code and Bytecode Metrics
The yajl dependency for JSON support added almost 5K lines of C code:
- nativedeps for 0.7.pre5: 132,793 lines
- nativedeps for 0.7.pre9: 137,145 lines
The native code size increased by a corresponding amount:
- ovm-build for 0.7.pre5: 1,089,784 bytes of native code (under GCC)
- ovm-build for 0.7.pre9: 1,126,968 bytes of native code (under GCC)
The bytecode shrank a bit:
- src-bin-ratio-with-opy for 0.7.pre5: 1,125,012 bytes
- src-bin-ratio-with-opy for 0.7.pre9: 1,124,345 bytes
These are minor differences compared to the reductions that mycpp should enable in the coming months. All bytecode will be replaced with native code.
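The size of each change is easier to judge as a delta. The following Python sketch recomputes the differences from the figures quoted in this appendix; the dictionary keys are my own labels, not names used by the release tooling.

```python
# (0.7.pre5 value, 0.7.pre9 value), copied from the lists above.
metrics = {
    "OSH spec tests passing": (1345, 1364),
    "Oil spec tests passing": (195, 206),
    "cloc lines of Python and C": (13925, 14156),
    "physical lines of Python": (25785, 26313),
    "nativedeps lines": (132793, 137145),
    "native code bytes": (1089784, 1126968),
    "bytecode bytes": (1125012, 1124345),
}

# Absolute change per metric between the two releases.
deltas = {name: new - old for name, (old, new) in metrics.items()}

for name, (old, new) in metrics.items():
    pct = 100.0 * deltas[name] / old
    print(f"{name:28s} {deltas[name]:+8d} ({pct:+.1f}%)")
```

By this arithmetic the native dependencies grew by 4,352 lines and the native code by 37,184 bytes, while the bytecode shrank by 667 bytes.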