

Comments About Shell, Awk, and Make
source link: http://www.oilshell.org/blog/2017/10/25.html
A few forum comments I've made would have made good blog posts. But I declare blog bankruptcy again, so I'll just link to and summarize the comments here.
The comments about shell are immediately useful usage tips. The comments about Awk and Make are more abstract, with one exception.
I'll try to provide the main point inline, but click through to the comment if you'd like details.
Shell Usage Tips
(1) Explaining syntax of time. The time construct in bash is part of the language, not a shell builtin. It's more like a for loop than cd.
- Analogously: [ Is a Builtin, But [[ Is Part of the Language
- This is also true of the rarely used coproc and select keywords (e.g. see help select in bash).
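A quick way to check the distinction is to ask bash how it classifies each word. This is a minimal sketch that assumes bash (type -t is not POSIX); the variable names are only for illustration.

```shell
# `type -t` prints how bash classifies a word: keyword, builtin, file, etc.
time_kind=$(type -t time)
for_kind=$(type -t for)
cd_kind=$(type -t cd)
bracket_kind=$(type -t '[')
dbracket_kind=$(type -t '[[')
select_kind=$(type -t select)

printf '%s\n' "time=$time_kind for=$for_kind cd=$cd_kind"
printf '%s\n' "[=$bracket_kind [[=$dbracket_kind select=$select_kind"

# Because time is parsed as part of the language, it can prefix a compound
# command, which an ordinary builtin such as cd cannot do.
time for i in 1 2 3; do :; done
```

The timing report from the last line goes to the shell's stderr.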
(2) Explaining syntax of find. find is an external command, but it's also an expression language with no lexer.
- In other words, it's similar to the test / [ builtin, and its syntax has similar problems: Problems With the test Builtin: What Does -a Mean?
- You can also think of find as a predicate/action language like Awk.
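To make "expression language with no lexer" concrete, here is a small sketch. The file names and the temporary directory are made up for the example; the point is that each argv entry is already one token of the expression, so find never splits anything itself.

```shell
# Work in a scratch directory so the example does not depend on the cwd.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.log" "$tmp/c.txt"

# Each argument after the start path is one token of the expression.
# The parentheses are escaped so the shell passes them through to find.
# Predicate: regular file AND (name is *.txt OR name is *.md). Action: -print.
matches=$(find "$tmp" -type f \( -name '*.txt' -o -name '*.md' \) -print | sort)
printf '%s\n' "$matches"

# The overall shape is close to an Awk rule:  pattern { action }
rm -r "$tmp"
```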
(3) help-bash: Awkward Behavior of Empty Arrays (September messages).
This long help thread is related to Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays, where I talk about the copy and splice operations for arrays.
It's long and not very readable, but from it, I distilled an extended style guide for using arrays. Here is a list of valid operations:
- copy and splice, mentioned above
- iterate over the strings in an array
- split a string into an array, using an arbitrary delimiter
- join the elements of an array into a string, using an arbitrary delimiter
Using any other operation on arrays risks confusing them with strings.
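The operations in this list might look like the following in bash. This is a sketch of my reading of the list, not a copy of the style guide from the thread; the array names and contents are placeholders, and the split and join shown here only handle a single-character delimiter on one line of input.

```shell
# An array with a space-containing element and an empty element,
# the two cases that unquoted expansions get wrong.
src=('a b' c '' d)

# Copy: quote the [@] expansion so every element survives intact.
copy=("${src[@]}")

# Splice: build a new array from literal strings and another array.
combined=(first "${src[@]}" last)

# Iterate over the strings in the array.
count=0
for item in "${src[@]}"; do
  count=$((count + 1))
done

# Split a string into an array on ':' (first line of input only).
IFS=: read -r -a fields <<< 'usr:local:bin'

# Join on ',': "${fields[*]}" joins on the first character of IFS.
# The command substitution keeps the IFS change from leaking out.
joined=$(IFS=,; printf '%s' "${fields[*]}")

echo "copy=${#copy[@]} combined=${#combined[@]} count=$count joined=$joined"
```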
Use set -u / set -o nounset to avoid out-of-bounds access. However, there is a bug fixed very recently, in bash 4.4: Empty arrays are confused with unset variables.
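A sketch of both behaviors, assuming bash. The ${name[@]+...} form is a commonly used workaround for older releases rather than something taken from the thread, and the variable names are placeholders.

```shell
set -o nounset

arr=(x y)
empty=()

# An index past the end is reported as an unbound variable instead of
# silently expanding to an empty string. The subshell contains the failure.
if ( : "${arr[5]}" ) 2>/dev/null; then
  oob=allowed
else
  oob=caught
fi

# ${empty[@]+"${empty[@]}"} expands to nothing for an empty array, so it
# avoids the error on releases before 4.4 and still works on later ones.
# With a non-empty array it expands to each element, quoted.
n=0
for item in ${empty[@]+"${empty[@]}"}; do
  n=$((n + 1))
done
echo "out-of-bounds: $oob, iterations over empty array: $n"

# Restore the default so later code is unaffected.
set +o nounset
```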
(4) Grouping and Redirect Syntax in Shell
I explain some gotchas about shell syntax, and the semantics of > and < redirects.
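As a generic illustration of grouping and redirects (not necessarily the gotchas covered in the linked comment), this sketch shows where a redirect attaches. The temporary file is an assumption of the example.

```shell
out=$(mktemp)

# A redirect after a brace group applies to the whole group:
# the file is opened once and both lines land in it.
{
  echo one
  echo two
} > "$out"
lines_after_group=$(grep -c . "$out")

# A redirect after a simple command applies only to that command,
# and > truncates the file each time it opens it.
echo three > "$out"

# < opens the file for reading and connects it to the command's stdin.
read -r first < "$out"
echo "group wrote $lines_after_group lines; file now starts with: $first"
rm "$out"
```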
Awk Language Design
(1) Comparing the Syntax and Semantics of Awk and JavaScript. They have surprisingly similar syntax, but different semantics.
Yet another way of putting it is that Awk is a language with a function call stack, but no heap. This of course imposes severe restrictions on the language and its containers.
But if there's no heap, then you don't need garbage collection!
Addendum: I also realized that Awk can't express my solution to the Git log in HTML problem. Python's useful re.sub() API is impossible in Awk, because it doesn't have first-class functions:
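import html
import re
import sys
# Replace each NUL-delimited span with its HTML-escaped contents.
re.sub(
    r"\x00(.*)\x00",
    lambda match: html.escape(match.group(1)),
    sys.stdin.read())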
This indicates to me that Awk is stuck in the 1980s, but the model is useful enough that I still see lively discussions and new documents being written about it.
Make: Automatic Prerequisites and Language Design
(1) Simpler Automatic Prerequisites in GNU Make.
Make has the problem of extracting the dependency graph from C #include statements.
My initial comment here was wrong; I wrote some code to convince myself of that. I had been following the pattern in the GNU Make Manual, which uses a gross piece of sed to massage the output of gcc -M, writing a .d file.
The commenters taught me something. I'm not convinced this is a great solution for future build tools to emulate, but it's worth thinking about.
The gcc -M interface is also pretty maddening, and I've already forgotten the details of it.
As far as I remember, this mad-scientist.net post eventually comes to the same conclusion, although the code there is long and intermingled with other concerns, like using an arbitrary output directory.
TODO: It would be nice to write up A Simpler Method for Automatic C Dependencies in GNU Make.
(2) .PHONY targets are a smell. In my opinion, Make should be treated as a dataflow language. Its purpose is to let you specify a partial order for incremental and parallel builds.
Shell is a better language for imperative actions. I mentioned the "argv dispatch pattern", i.e. using "$@" as the last line of your script. Almost all of the shell scripts in the Oil repo use this pattern.
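A minimal sketch of the pattern, assuming a hypothetical task file called run.sh; the function names and bodies are placeholders.

```shell
# Each task is an ordinary shell function.
build() {
  echo "building $*"
}

unit_test() {
  echo "testing"
}

# Last line: run the arguments as a command, so
#   ./run.sh build release
# calls the build function with the argument 'release'.
"$@"
```

With this layout, adding a task means adding a function, with no flag parsing and no .PHONY target.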
TODO: Write a blog post about it, and also mention the variant with better error checking:
die() { echo "$@" >&2; exit 1; }
action=${1:-}
case $action in
  build|test|deploy) "$@" ;;
  *) die "Invalid action ${action}" ;;
esac
(3) What are Make's weaknesses as a dataflow language?
OK, maybe Make is not actually what I want it to be. I think its evolution has been confused, much like the evolution of shell.
Make is not good for specifying dataflow because of:
- The multiple outputs problem. One commenter suggested the "obvious thing", which is wrong.
- Make doesn't consider the absence of a prerequisite to mean the target is out of date. It has odd special cases for "intermediate files", which don't compose.
- Metaprogramming the dependency graph is clumsy. (This isn't in the thread, but a recent Makefile I wrote for oilshell.org analytics drove this point home.)
The overall problem is that instead of thinking of make like a functional/parallel language, you end up "stepping through" it, like an imperative language.
(4) There are three Turing-complete languages in GNU Make: Make, Shell, and Guile Scheme.
You can write a Lisp in shell and make, and Guile Scheme is already a Lisp.
It's bad enough that when writing a Makefile, you need to know two languages simultaneously, as well as the places where their syntax collides. (What does $$ mean in Make? What does it mean in shell?)
But those two languages aren't expressive enough, so they added a third language!
Conclusion
I linked to observations I've made about shell, awk, and Make. If any of it was useful to you, let me know.
In the next post, I'll link to comments about programming language design and implementation. Depending on the feedback, I'll include more or fewer comments.