Moreutils: A collection of Unix tools that nobody thought to write long ago

source link: https://news.ycombinator.com/item?id=31043655


I find sponge really useful. Have you ever wanted to process a file through a pipeline and write the output back to the file? If you do it the naive way, it won’t work, and you’ll end up losing the contents of your file:
    awk '{do_stuff()}' myfile.txt | sort -u | column --table > myfile.txt
The shell will truncate myfile.txt before awk (or whatever command) has a chance to read it. So you can use sponge instead, which waits until it reads EOF to truncate/write the file.
    awk '{do_stuff()}' myfile.txt | sort -u | column --table | sponge myfile.txt
"Have you ever wanted to process a file through a pipeline and write the output back to the file?"

Personally, no. I prefer to output text processing results to a different file, maybe a temporary one, then replace the first file with the second file after looking at the second file.

  awk '{do_stuff()}' 1.txt | sort -u | column --table > 2.txt
  less 2.txt
  mv 2.txt 1.txt
With this sponge program I cannot check the second file before replacing the first one. The only reason I can think of not to use a second file is if I did not have enough space for the second file. In that case, I would use a fifo as the second file. (I would still look at the fifo first before overwriting the first file.)
   mkfifo 1.fifo
   awk '{do_stuff()}' 1.txt | sort -u | column --table > 1.fifo &
   less -f 1.fifo
   awk '{do_stuff()}' 1.txt | sort -u | column --table > 1.fifo &
   cat 1.fifo > 1.txt
You’re implying that all uses of `sponge` are interactive with a user actively involved.

I use sponge a lot in scripts where I've already checked and validated the behavior. Once I'm confident it works, I can just use sponge for simplicity.

Perhaps the backups are a separate process, run beforehand or after this process once several changes (perhaps to several files) have been made, so not backing up intermediate states?
Real programmers throw their source code out onto the internet.
Backups should be handled by an external process; otherwise, your script also needs to take into account:

- Number of backups to retain, and cleaning up old versions
- Monitoring of backups. What if backing up fails?
- The script itself can tamper with the backups unintentionally.

Speaking from (bad) experience

Yeah they should have called that tool yolocat
If 2.txt is tracked by git, there's no need to go through these hoops. Then, sponge starts to make sense. Apart from that case, I agree totally.
Did git exist when this sponge program was written?
sponge as a shell script for non-interactive use, i.e., where we do not need to check the second file
   #!/bin/sh
   test $# = 1||exec echo usage: $0 file;
   test -f $1||exec echo $1 does not exist;
   test ! -f $1.tmp||exec echo $1.tmp exists;
   cat > $1.tmp;
   mv $1.tmp $1;
The script should probably be using `mktemp` instead of potentially overwriting uninvolved files (even if they have `.tmp` in the name).
When you do, just make sure you point `mktemp` at the same directory as the final file (with GNU mktemp that's the `--tmpdir=DIR` option, or a template like `dir/tmp.XXXXXX`). This makes sure both files are on the same filesystem, so `mv` is atomic (and faster).
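For instance, the script above could grow into something like this (a sketch only, assuming GNU mktemp):
    tmp=$(mktemp --tmpdir="$(dirname "$1")" "$(basename "$1").XXXXXX") || exit 1
    cat > "$tmp" && mv "$tmp" "$1"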
And so I learned of mktemp today.

Just was intrigued, googled it, read about it and know it solves a problem I had elegantly. Some new tool in my belt from now on.

Thank you.

For a simple case like this I'd probably just go with something like tmp.$$.tmp.
You might need a -f on that final mv.

While you're at it, may as well create a backup of the $1 file.

sponge creates no backups

Never have I ever needed -f.

I can see how this is handy, but it's also dangerous and likely to bite you in the ass more than not. I think sponge is great, but I think your example is dangerous. If you make a mistake along the way, unless it's a fatal error, you're going to lose your source data. Typo a grep and you're screwed.
The technique I found works well is to edit the file in vim and use !G (filter lines 1-N through a shell command); emacs can be used in a similar way. It gives you infinite undo and redo until you get the commands right. Then you can view the history and turn it into a shell script like this one using sponge. For example: edit a file, go to line 1 and type !Gsort. The file is run through sort and the results replace the buffer. To undo, use ‘u’; to redo, use CTRL-r.
Huh, never heard of `!G`. How is it different from `%!sort`?
! is also a normal mode command/operator. It accepts a motion and then drops you into the command line with :{range}! pre-filled, where {range} is the range of lines covered by the motion. !G in normal mode is exactly equivalent to :.,$!
Why are we still using filesystems that don't have an "undo" operation, though?

Is the work of programmers less valuable than, say, the work of Google Docs users (where there is an undo operation)?

That's kind of how I use git. I would never use "sponge" or "sed -i" outside of a git repo or with files that haven't been checked in already.

I agree it would be nice to have this in the filesystem; some filesystems support this (e.g. NILFS[1]), but none of the current "mainstream" ones do AFAIK. In the meanwhile, git works well enough.

[1]: https://nilfs.sourceforge.io/en/

Which systems don't have snapshotting?

Mac has Time Machine.

Does it take a snapshot after every shell command?
This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system. IDEs can have undo, vi and emacs have undo. As others in the thread have said, just use multiple files.

Personally I’d be interested in a shell having undo capability, but not a file system.

> This is not a file system I would be interested in. If you’ve ever snooped in on fs activity it is constant and overwhelming on even an average system.

I'm not sure how these sentences are connected. Are you implying that allowing undo would make those problems significantly worse? I'm not sure of that. If you have a CoW filesystem, which you probably want for other reasons, then having a continuous snapshot mode for recent activity would not need much overhead.

If you're saying there's too much activity to allow an undo, well, I assume the undo would be scoped to specific files or directories!

Snapshotting is different. Undo/redo is not a contingency plan; instead it’s something that you make a regular part of your workflow.
Been there done that. Good advice.

In my case I could recreate the original file but it took 90 minutes to scrape the remote API to do it again...

Right. You should always test the command first. If the data is critical, use a temporary file instead. I usually use this in scripts so I don’t have to deal with cleanup.
> If the data is critical, use a temporary file instead

Always use a temporary file. The sponge process may be interrupted, leaving you with a half-complete /etc/passwd.

Couldn't `mv` or `cp` from the temp file to `/etc/passwd` be interrupted as well? I think the only way to do it atomically is a temporary file on the same filesystem as `/etc`, followed by a rename. On most systems `/tmp` will be a different filesystem from `/etc`.
mv can't, or, more correctly, the rename system call cannot.

rename is an atomic operation from any modern filesystem's perspective, you're not writing new data, you're simply changing the name of the existing file, it either succeeds or fails.

Keep in mind that if you're doing this, mv (the command line tool) as opposed to the `rename` system call, falls back to copying if the source and destination files are on different filesystems since you can not really mv a file across filesystems!

In order to have truly atomic writes you need to:

1. open a new file on the same filesystem as your destination file
2. write contents
3. call fsync
4. call rename
5. call sync (if you care about the file rename itself never being reverted).

This is some very naive golang code (from when I barely knew golang) for doing this which has been running in production since I wrote it without a single issue: https://github.com/AdamJacobMuller/atomicxt/blob/master/file...
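A rough shell rendition of the same sequence (just a sketch: it assumes GNU coreutils, whose sync accepts file operands since 8.24, and new-config is a stand-in for whatever actually produces the data):
    target=/etc/example.conf                                      # hypothetical destination
    tmp=$(mktemp --tmpdir="$(dirname "$target")" .conf.XXXXXX) || exit 1
    new-config > "$tmp"            # open + write on the same filesystem as the destination
    sync "$tmp"                    # flush the file data before the rename
    mv -f "$tmp" "$target"         # rename(2) over the destination: atomic
    sync "$(dirname "$target")"    # optional: persist the rename itself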

Not clear on the need for fsync and sync.

Are those for networked filesystems like NFS, or just insurance against crashes?

Logically on a single system there would be no effect assuming error free filesystem operation. Unless I'm missing something.

Without the fsync() before rename(), on system crash, you can end up with the rename having been executed but the data of the new file not yet written to stable storage, losing the data.

ext4 on Linux (since 2009) special-cases rename() when overwriting an existing file so that it works safely even without fsync() (https://lwn.net/Articles/322823/), but that is not guaranteed by all other implementations and filesystems.

The sync() at the end is indeed not needed for the atomicity, it just allows you to know that after its completion the rename will not "roll back" anymore on a crash. IIRC you can also use fsync() on the parent directory to achieve this, avoiding sync() / syncfs().

> ext4 on Linux (since 2009) special-cases rename

This is interesting.

The linked git entry (https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.g...) from the LWN article says "Notice: this object is not reachable from any branch."

Did this never get merged? Because I definitely saw this issue in production well after 2009.

I guess it either got changed, or, a different patch applied but perhaps this https://github.com/torvalds/linux/blob/master/fs/ext4/namei.... does it?

The patch just got rebased, here's the one that was actually applied in master for v2.6.30: https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.g...

And yes, the code you highlighted is exactly this special-case in its current form. The mount option "noauto_da_alloc" can be used to disable these software-not-calling-fsync safety features.

I'd like to know why as well. The inclusion of the fsync before the rename implies to me that the filesystem isn't expected to preserve order between write and rename. It could commit a rename before committing _past_ writes, which could leave your /etc/passwd broken after an outage at a certain time. I can't tell whether that's the case or not from cursory googling (everybody just talks about read-after-write consistency). Maybe it varies by filesystem?

The final sync is just there for durability, not atomicity, like you say.

> the filesystem isn't expected to preserve order between write and rename

Correct.

The rename can succeed while the write of the file you just renamed gets rolled back.

You can use `/etc/passwd.new` as a temporary file to avoid the problems you mentioned. In the worst case, you'll have an orphaned passwd.new file, but /etc/passwd is guaranteed to remain intact.
Probably not. If it's implemented responsibly, it will internally:

1. Write to a temporary file
2. Do the equivalent of mv tmpfile originalfile

so it will either succeed or do nothing

"Responsibly" is subjective here. I could argue that responsible thing to do is to use as little resources as possible, and in that case, directly overwriting the file would be the "responsible" thing to do.
  cmd < somefile > somefile.tmp && mv somefile.tmp somefile
Will read from somefile, and only replace the source if (and when) the pipeline exits successfully.

Mind that this may still bite in interesting ways. But less frequently.

You can also invoke tempfile(1) which is strongly recommended in scripts.

https://www.unix.com/man-page/Linux/1/tempfile/

I was wondering what the difference between tempfile and mktemp was. At the bottom of the tempfile man page, it says:

> tempfile is deprecated; you should use mktemp(1) instead.

I ... need to revisit that. Though I suspect you're right.
This is why I usually just use a temporary directory and do a quick
    git init .
    git add .
    git commit -m "wip"
... and proceed from there. So many ways to screw up ad hoc data processing using shell and the above can be a life saver. (Along with committing along the way, ofc.)

EDIT: Doesn't work if you have huuuuge files, obviously... but you should perhaps be using different tools for that anyway.

I guess if you want something single-file that resembles git (though on second thought, that may not be a requirement at all), you can also try Fossil ( https://www2.fossil-scm.org ).
This is why I read HN. You never know when a brilliant idea will appear. Thank you. I never thought of doing this for temporary work
Agree.

Sure, there could be some situations where it could be handy, like in some automated scenarios, but most of the time it is not a big deal to write

    foo data1 data2
    bar data2
It would also be useful in cases when the file must be written with root privileges (but the shell is run as a normal user)

Right now you have to use `tee` for that, which is fine, but if you don't want to echo the file back to the terminal you have to do

    command | sudo tee file > /dev/null
with this you can simply do
    command | sudo sponge file
You cannot use `sudo cat` to open a file with root privileges, because `sudo cat > foo` means "open file foo with your current privileges, then run `sudo cat` passing the file to it", and the whole root thing only happens after you already tried and failed to open the file.

I seem to get this wrong roughly once a week.

I knew a person who would give something like this as a sysadmin / devops interview question. It was framed as "'sudo cat >/etc/foo' is giving a permission denied error! what's wrong?" Usually the interview candidate would go off on a tangent...
Will shells or the POSIX standard ever evolve, instead of remaining the obvious gotchas used in interviews?
If Windows can evolve (PowerShell...), so can Linux/Unix.
Cat won’t write the file. If you mean `command | sudo cat > file.txt`, that won’t work because the redirection is still happening in the non-root shell. You could do `command | sudo sh -c "cat > file.txt"` but that’s rather verbose.
I thought the same thing. What's wrong with this approach?
  cat myfile.txt | awk '{do_stuff()}' | sort -u | column --table > new-myfile.txt
cat is doing nothing useful here. Your command is equivalent to
    awk '{do_stuff()}' myfile.txt | sort -u | column --table > new-myfile.txt
The only thing you’ve changed is that you’re sending the output to a new file. That’s fine, but it’s what sponge is avoiding.
Are there any legitimate reasons to have a particular file both as an input to a pipe and as an output? I wonder whether a shell could automatically “sponge” the pipe’s output if it detected that happening.
The shell can't detect that unless the file is also used as input via `<`. So it couldn't do that in HellsMaddy's example, since the filename is given as an arg to awk instead of being connected to its stdin.
And detecting that would actually implement move semantics in the shell.

Isn’t there already a project to bring Rust to the shell?

sed --in-place has one file as both input and output. It's not really different from any pipe of commands where the input and output files are the same. But sed also makes a copy of the file before overwriting it, by default.
Yeah, it seems like the kind of command that you only need because of a quirk in how the underlying system happens to work. Not something that should pollute the logic of the command, imo. I would expect a copy-on-write filesystem to be able to do this automatically for free.
> I would expect a copy-on-write filesystem to be able to do this automatically for free.

this is an artifact of how handles work (in relation to concurrency), not the filesystem.

copy-on-write still guarantees a consistent view of the data, so if you write on one handle you're going to clobber the data on the other, because that's what's in the file.

what you really want is an operator which says "I want this handle to point to the original snapshot of this data even if it's changed in the meantime", which a CoW filesystem could do, but you'd need some additional semantics here (different access-mode flag?) which isn't trivially granted just by using a CoW filesystem underneath.

Do people really use copy-on-write filesystems though? I mean it'd be great if that were a default, but I rarely encounter them, and when I do, it's only because someone intentionally set it up that way. In 30+ years of using Unix systems, I can't even definitively recall one of them having a copy-on-write filesystem in place. Which is insane considering I used VAX/VMS systems before that and it was standard there.
FreeBSD uses ZFS by default, which is copy on write post-snapshot.
I thought UFS was still the default with ZFS being a supported alternative option for root filesystems?
That's what I get for living in Linuxland for so long. BSDs tend to be so much smarter, but I'm stuck with Linux inertia.
Huh? Btrfs is copy on write and it's definitely being used.
>I find sponge really useful.

When I found `sponge`, I couldn't help but wonder where it had been all of my life. It's nice to be able to modify an in-place file with something other than `sed`.

I usually used tac | tac for a stupid way of pausing in a pipeline. Though it doesn’t work in this case. A typical use is if you want to watch the output of something slow-to-run changing but watch doesn’t work for some reason, eg:
  while : ; do
    ( tput reset
      run-slow-command ) | tac | tac
  done
`cat` will begin output immediately. `tac` buffers the entire input in order to reverse the line order before printing it. Piping through tac ensures EOF is reached, while piping through tac a second time puts the lines back in order
The easy way to do this is with 'tee /dev/tty' in the middle.
I don’t understand why that would work? The goal is to buffer the output of the commands and then send it all at once.
protip: don't run this
    awk '{do_stuff()}' myfile.txt | sort -u | column --table | sponge > myfile.txt
'sort -o file' does the same thing but I like the generic 'sponge'. Not sure why it's a binary, since it's basically this shell fragment (if you don't bother checking for errors etc):

(cat > $OUT.tmp; mv -f $OUT.tmp $OUT)

Hmmm ... "When possible, sponge creates or updates the output file atomically by renaming a temp file into place. (This cannot be done if TMPDIR is not in the same filesystem.)"

My shell fragment already beats sponge on this feature!

It would be nice to update to use anonymous files where supported (Linux does). This allows you to open an unnamed file in any directory so that you can do exactly this, write to it then "rename" it over another file atomically.
If you're using awk anyway:
  awk 'BEGIN{system("rm myfile.txt")}{do_stuff()}' <myfile.txt | sort -u | column --table > myfile.txt
In general though:
  <foo { rm -f foo && wc >foo }
> Have you ever wanted to process a file through a pipeline and write the output back to the file?

I don't think overwriting the input data is a good idea due to risk of data loss.

Never heard of sponge. Isn’t it just the same as tee ?
I urge you to actually read the link before commenting. sponge is the front and center example of the link you are commenting on.
The point is not to truncate the file immediately when it is open for output, and before the input side has had time to slurp it.
It’s part of the moreutils collection and is better considered a buffered redirect than anything similar to `tee`.
Ah yes I glanced too quickly over the surface here. It does look more like redirection. Will have to look at it more. Appreciate your helpful response vs the downvoters and the one unhelpful/snarky response.
The downvoting here is the equivalent of getting shamed for “asking a stupid question at work.”

Yes I should have done a bit more homework but shaming for asking a clarifying question is unreasonable. Those of you who have the downvote trigger-finger can and should do better.

I wouldn't overinterpret the downvotes - it's impossible to know what people were thinking, and the mind tends to arrive at the most irritating, annoying, or hurtful explanation.

The same principle works the other way too - when a comment doesn't contain much information, readers tend to interpret it according to whatever they personally find the most irritating, annoying, or hurtful, and then react to that. Our minds are not our best friends this way.

The (partial) solution to this is to include enough disambiguating information in your comment. For example if your comment had contained enough information to make clear that your question was genuinely curious rather than snarkily dismissive, I doubt it would have gotten downvoted.

It's hard to do that because generally our intention is so clear and transparent to ourselves that it doesn't occur to us to include it in the message. Unfortunately for all of us on the internet, however, intent doesn't communicate itself.

This is very helpful feedback and does make a lot of sense. Appreciate it.
I agree with you, the downvotes are unnecessary, it was actually a good question.

tee actually does sorta work for this sometimes, but it’s not guaranteed to wait until EOF. For example I tested with a 10 line file where I ran `sort -u file.txt | tee file.txt` and it worked fine. But I then tried a large json file `jq . large.json | tee large.json` and the file was truncated before jq finished reading it.

This was such a footgun! It may be fairly intuitive if you know how shell redirection is implemented, but it's hard to think of that at the moment you're writing the command.

The one I use the most is `vipe`.

> vipe: insert a text editor into a pipe

It's useful when you want to edit some input text before passing it to a different function. For example, if I want to delete many of my git branches (but not all my git branches):

  $ git branch | vipe | xargs git branch -D
`vipe` will let me remove some of the branch names from the list, before they get passed to the delete command.
I mostly rely on fzf for stuff like this nowadays. You can replace vipe with `fzf --multi`, for example, and get non-inverted behavior with fuzzy search.

More than that, I don't use it in a pipe (because of poor ^C behavior), but via a hotkey in zsh that brings up `git branch | fzf`, lets me select any number of branches I need, and puts them on the command line; this is extremely composable.

I vaguely recall in ~2010 coming across a Plan 9 manpage(?) that seemed to imply that Emacs could work that way in a pipe (in reference to complex/bloated tools on non-Plan 9 systems), but that wasn't true of any version of Emacs I'd ever used.
Yep, set EDITOR to emacsclient and vipe will use emacs.
I mean that it seemed to suggest that you could pipe directly to Emacs, without using vipe.
And what if you decide mid-vipe "oh crap I don't want to do this anymore"? In the case of branches to delete you could just delete every line. In other cases maybe not?
Exiting vim with :cq is also handy for backing out of git commits.
I use the mnemonic "cancel-quit" for exactly that reason.
Pressing ZQ in insert mode also provides the same effect!
In normal mode, not insert mode. ZQ in insert mode inserts ZQ.
Oh my bad, i had a bit of a stroke writing my comment lol.
Actually, it doesn’t. ZQ is the same as :q! which quits without saving with a 0 exit code. So all of your git branches get deleted in this example, since you left the file as it was. You definitely want :cq here.
To be fair, that scenario would be just as bad or worse without vipe.

Also, you can construct your pipeline so that a blank file (or some sentinel value) returned from vipe means “abort”. A good example of this is when git opens your editor for interactive merging — deleting all lines cancels the whole thing.

Yeah you could just as well say "oh crap I didn't mean to do that" after finishing a non-interactive command. However, at least knowing my own lazy tendencies, I could imagine feeling comfortable hitting <enter> on this command without a final careful review, because part of me thinks that I can still back out, since the interactive part isn't finished yet.

But maybe not. I haven't tried it yet (and it does seem really useful).

It will depend on the commands in question. The entire unix pipeline is instantiated in parallel, so the commands following vipe will already be running and waiting on stdin.

You could kill them before exiting the editor, if that's what you want. Or you could do something else.

The other commands in the pipeline are run by the parent shell, not vipe, so handling this would not be vipe specific.

And there's always the power button on your computer :-)
Kind of ugly, but yeah, that's what I'd imagine doing.
This is useful - I usually either use an intermediate file or a bunch of grep -v
the editor "micro" can be used as a pipe editor. it's a feature. And without using temp file tricks as vipe does. echo 'test' | micro | cat
My favorite is `ts`! It adds a timestamp at the beginning of each input line.

I often use it with tee to save the log output of any command.

  $ ping google.com | ts '[%Y%m%d-%H:%M:%.S]' | tee /tmp/ping.log
  [20220416-21:57:20.837983] PING google.com (172.217.175.78): 56 data bytes
  [20220416-21:57:20.838391] 64 bytes from 172.217.175.78: icmp_seq=0 ttl=53 time=6.028 ms
  [20220416-21:57:21.817189] 64 bytes from 172.217.175.78: icmp_seq=1 ttl=53 time=9.621 ms
  [20220416-21:57:22.818339] 64 bytes from 172.217.175.78: icmp_seq=2 ttl=53 time=9.443 ms
  [20220416-21:57:23.823126] 64 bytes from 172.217.175.78: icmp_seq=3 ttl=53 time=8.921 ms
> errno: look up errno names and descriptions

I like this. Reminds me of a couple very useful things I've done:

1. Add a -man switch to command line programs. This causes a browser to be opened on the web page for the program. For example:

    dmd -man
opens https://dlang.org/dmd-windows.html in your default browser.

2. Fix my text editor to recognize URLs, and when clicking on the URL, open a browser on it. This silly little thing is amazingly useful. I used to keep bookmarks in an html file which I would bring up in a browser and then click on the bookmarks. It's so much easier to just put them in a plain text file as plain text. I also use it for source code, for example the header for code files starts with:

    /*
     * Takes a token stream from the lexer, and parses it into an abstract syntax tree.
     *
     * Specification: $(LINK2 https://dlang.org/spec/grammar.html, D Grammar)
     *
     * Copyright:   Copyright (C) 1999-2020 by The D Language Foundation, All Rights Reserved
     * Authors:     $(LINK2 http://www.digitalmars.com, Walter Bright)
     * License:     $(LINK2 http://www.boost.org/LICENSE_1_0.txt, Boost License 1.0)
     * Source:      $(LINK2 https://github.com/dlang/dmd/blob/master/src/dmd/parse.d, _parse.d)
     * Documentation:  https://dlang.org/phobos/dmd_parse.html
     * Coverage:    https://codecov.io/gh/dlang/dmd/src/master/src/dmd/parse.d
     */
and I'll also use URLs in the source code to reference the spec on what the code is implementing, and to refer to closed bug reports that the code fixes.

Very, very handy!

P.S. I mentioned having links in the source code to the part of the spec. The only problem with this is when the spec (i.e. the C11 Standard) is not in html form. I can only add the paragraph number in the code. What an annoying waste of time every time I want to check that the implementation is exactly right.

For contrast, there's this site:

https://www.felixcloutier.com/x86/index.html

which is a godsend to me. Now, in the dmd code generator, I put in links to the detail page for an instruction when the code generator is generating that instruction. Oh, how marvelous that is! And there is joy in Mudville.

Intel actually lags behind the industry in that they don't really have a formal specification. Arm have a machine readable specification that can be verified by a computer whereas Intel have this weird pseudocode.

Also uops.info is a good reference for how fast the instructions are

I use a few of these regularly:

ts timestamps each line of the input, which I've found convenient for ad-hoc first-pass profiling in combination with verbose print statements in code: the timestamps make it easy to see where long delays occur.

errno is a great reference tool, to look up error numbers by name or error names by number. I use this for two purposes. First, for debugging again, when you get an errno numerically and want to know which one it was. And second, to search for the right errno code to return, in the list shown by errno -l.
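For instance (a few illustrative invocations; the exact output formatting may differ between moreutils versions):
    errno ENOENT     # name -> number and description
    errno 13         # number -> name (EACCES) and description
    errno -l         # list every errno known on this system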

And finally, vipe is convenient for quick one-off pipelines, where you know you need to tweak the input at one point in the pipeline but you know the nature of the tweak would take less time to do in an editor than to write the appropriate tool invocation to do it.

It's a shame errno is even needed; it's one of my pet peeves about Linux.

On linux, errno.h is fragmented across several places because errnos are different on different architectures. I think this started because when Linus did the initial alpha port, he bootstrapped Linux using a DEC OSF/1 userland, which meant that he had to use the DEC OSF/1 BSD-derived values of errno rather than the native linux ones so that they would run properly. I'm not sure why this wasn't cleaned up before it made it into the linux API on alpha.

At least on FreeBSD, determining which errno means what is just a grep in /usr/include/sys/errno.h. And it uses different errnos for different ABIs (e.g., Linux binaries get their normal errnos, not the FreeBSD ones).

I’ve used this for years:
  errno () {
        grep \\\<$*\\\> /usr/include/sys/errno.h
  }
There will be a false positive now & then but this is good enough.
It does, but it's easier to grep for the number in errno.h
I actually installed moreutils just for errno, but I got disappointed. Here is the whole `errno -l`: http://ix.io/3VeV

Like, none of the everyday tools I use produce exit codes which correspond to these explanations.

For my scripts, I just return 1 for generic errors and 2 for bad usage. I wish I could be more specific than that and adhere to some standard.

errno codes aren't used in the exit codes of command-line tools; they're used in programmatic APIs like syscalls and libc functions.

There's no standard for process exit codes, other than "zero for success, non-zero for error".

That is not entirely true. There is sysexits.h, which tried to standardize some exit codes (originally meant for mail filters). It is used by, for instance, argp_parse().

https://sourceware.org/git/?p=glibc.git;a=blob;f=misc/sysexi...

Some programs do follow a specification (sysexits.h) where numbers 64 and higher are used for common uses, such as usage error, software error, file format error, etc.
Also, when the OOM killer kills another process, it causes it to exit with a well-known exit code: 137
That’s not an exit code; that is a fake exit code which your shell made up, since your shell has no other way to tell you that a process did not exit at all, but was killed by a signal (9, i.e. SIGKILL). This is, again, not an actual exit code, since the process did not actually exit.

See here for why you can’t use an exit status of 128+signal to fake a “killed by signal” state, either:

https://www.cons.org/cracauer/sigint.html

Today the tagline is that moreutils is a "collection of the unix tools that nobody thought to write long ago when unix was young", but the original[1] tagline was that it was a "collection of the unix tools that nobody thought to write thirty years ago".

Well, Joey started moreutils in 2006, so it's more than half way to that original "30 years ago" threshold!

[1]: http://source.joeyh.branchable.com/?p=source.git;a=blob;f=mo...

Using `vipe` you can do things like:
   $ pbpaste | vipe | pbcopy
Which will open your editor so you can edit whatever is in your clipboard.
Just realised what the pb stands for. Did they really not think of clipboard? Who came up with the "clipboard" name?
The pasteboard, like many things in OS X, is from NextStep. As for why they called it a pasteboard and not a clipboard, I have no idea, presumably someone thought it would be more descriptive.
Wikipedia says "clipboard" was coined by Larry Tesler (in the 70's?)
I was prepared to mock this before I even clicked, but I have to say this looks like a nice set of tools that follow the ancient Unix philosophy of "do one thing, play nice in a pipeline, stfu if you have nothing useful to say". Bookmarking this to peer at until I internalize the apps. There's even an Ubuntu package for them.

I don't think it's a good idea to rely on any of these being present. If you write a shell script to share and expect them to be there, you aren't being friendly to others, but for interactive command line use, I'm happy to adopt new tools.

> Bookmarking this to peer at until I internalize the apps. There's even an Ubuntu package for them.

Ditto. But I will probably forget they exist and go do the same old silly kludges with subshells and redirections. May I ask if anyone has a technique for introducing new CL tools into their everyday workflow?

It helps if things have good man, apropos and help responses, but the problem is not how new tools function, rather remembering that they exist at all.

Sometimes I think I want a kind of terminal "clippy" that says:

"Looks like you're trying to match using regular expressions, would you like me to fzf that for you?"

Then again, maybe not.

My two cents on this is that if you do something enough that one of these tools is a good tool for it, it’ll quickly become a habit to use the tool. And if there’s a rare case where one of these tools would have been useful but you forgot it existed, you’re probably not wasting too much time using a hackier solution.

That being said, I’ve been meaning to add a Linux and Mac install.sh script to my dotfiles repo for installing all my CLI tools, and that could probably serve as a good reminder of all the tools you’ve come across over the years that might provide some value.

> May I ask if anyone has a technique for introducing new CL tools into their everyday workflow?

Pick one tool a year. Write it on a sticky note on your monitor. Every time you're doing something on the command line, ask yourself "would $TOOL be useful here?".

You're not going to have high throughput on learning new tools this way, but it'll be pretty effective for the tools you do learn.

When you learn about a tool, actually think about a specific situation/use case where you would actually use it. Try it out as soon as you learn about it. I find I can only do one tool at a time. You remember it better if it’s something you would use frequently at the moment. If I get a big old list of useful stuff, I’d be lucky to actually incorporate more than 2 of them. At a time anyway. If you don’t remember the tool, it’s probably because you wouldn’t use it enough to retain knowledge of it..
I like this one. Thanks. Mentally binding a new tool name specifically to one regular task sounds like a good entry point for a repeated learning framework.
Flashcard software like Mnemosyne. Seriously. I use it to keep reminding me of keyboard shortcuts until I've ingrained them in my workflow.
>I don't think it's a good idea to rely on any of these being present. If you write a shell script to share and expect them to be there, you aren't being friendly to others, but for interactive command line use, I'm happy to adopt new tools.

Isn't that a shame though? Where does it say in the UNIX philosophy that the canon should be closed?

It's not that different than refraining from using non-POSIX syntax in shell scripts that are meant to be independent of a specific flavour of unix, or sticking with standard C rather than making assumptions that are only valid in one compiler.

There are shades of grey, of course. Bash is probably ubiquitous enough that it may not be a big issue if a script that's meant to be universal depends on it, as long as the script explicitly specifies bash in the shebang. Sometimes some particular functionality is not technically part of a standard but is widely enough supported in practice. Sometimes the standards (either formal or de facto) are expanded to include new functionality, and that's of course totally fine, but it's not likely to be a very quick process because there are almost certainly going to be differing opinions on what should be part of the core and what shouldn't.

Either way, sometimes you want to write for the lowest common denominator, and moreutils certainly aren't common enough that they could be considered part of that.

I think it's fine if you're on a dev team that decides to include these tools in its shared toolkit, but none of these rise to the level that I think warrants them being a dependency for a broadly-distributed shell script. There are slightly-less-terse alternatives for most of the functionality that only rely on core utilities. I don't think it's being a good citizen to say "go install moreutils and its dozen components because I wanted to use sponge instead of >output.txt".

Every time I stumble across moreutils, I can't understand what any of its tools do. Take all these, for instance:

pee: tee standard input to pipes

sponge: soak up standard input and write to a file

ts: timestamp standard input

vidir: edit a directory in your text editor

vipe: insert a text editor into a pipe

zrun: automatically uncompress arguments to command

Wth do they do? What does any of this mean? I know tee but have no idea what « tee stdin to pipes » would do

`ts` is like `cat`, but each line gets prefixed with a timestamp of when that line was written. Useful when writing to log files, or finding which step of a program is taking the most time.

`sponge` allows a command to overwrite a file in-place; e.g. if we want to replace the contents of myfile.txt, to only keep lines containing 'hello', we might try this:

    grep 'hello' < myfile.txt > myfile.txt
However, this won't work: as soon as grep finds the first matching line, the entire contents of the file will be overwritten, preventing any more from being found. The `sponge` command waits until its stdin gets closed, before writing any output:
    grep 'hello' < myfile.txt | sponge myfile.txt
You will want to read the man pages. 'pee' is straightforward to understand if you already understand 'tee'. 'pee' is used to pipe output from one command to two downstream commands.

https://en.wikipedia.org/wiki/Tee_(command)

I know tee but have no idea what pee would do considering this description.

From your description I guess something like « pee a b » would be the same as « a | b »? If so that’s cool, but the one line descriptions definitely need a rework.

It loosely follows the same metaphor as the T-shaped pipe connector that 'tee' is named for.
    $ seq 3 | tee somefile
    1
    2
    3

            ╔═somefile
    seq═tee═╣    
            ╚═stdout


    $ seq 10 | pee 'head -n 3' 'tail -n 3'
    1
    2
    3
    8 
    9
    10


            ╔═head═╗
    seq═pee═╣      ╠═stdout
            ╚═tail═╝
I believe it's more like `a | pee b x y z | c`. This would run `b x y z` as well as `c` using the output from `a`.

I think Zsh supports this natively with its "multios" option, `a > >(b c y z) > >(c)`. But then you have to write the rest of your pipe inside the ().

From that description, I can only guess that it is the cat command.
The unix shell was designed to optimize for simple pipes. To make pipe networks you usually have to use named pipes, aka fifos.

I once had to use an IBM mainframe shell (CMS, if I remember correctly) which had pipes. However, IBM in their infinite wisdom decided to make the pipe network the primary interface; while this made complex pipes a bit less awkward, simple (single-output) pipes were wordy compared to the unix equivalent.

The "What's included" section direly needs either clearer/longer descriptions, or at least links to the tools' own pages (if they have them) where their use case and usage is explained. I've understood a lot more about (some of) the tools from the comments here than from the page - and I'd likely have skipped over these very useful tools if not for these comments!
Ok, longer descriptions from the tools' man pages:

chronic runs a command, and arranges for its standard out and standard error to only be displayed if the command fails (exits nonzero or crashes). If the command succeeds, any extraneous output will be hidden.

A common use for chronic is for running a cron job. Rather than trying to keep the command quiet, and having to deal with mails containing accidental output when it succeeds, and not verbose enough output when it fails, you can just run it verbosely always, and use chronic to hide the successful output.
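For instance, a crontab entry might look like this (nightly-backup is just a placeholder script):
    # mail is only generated when the job exits nonzero; on success all output is swallowed
    30 3 * * * chronic /usr/local/bin/nightly-backup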

combine combines the lines in two files. Depending on the boolean operation specified, the contents will be combined in different ways:

and Outputs lines that are in file1 if they are also present in file2.

not Outputs lines that are in file1 but not in file2.

or Outputs lines that are in file1 or file2.

xor Outputs lines that are in either file1 or file2, but not in both files.

The input files need not be sorted
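Example usage (the boolean operator goes between the two file names):
    combine file1 and file2    # lines present in both files
    combine file1 not file2    # lines in file1 that are missing from file2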

ifdata can be used to check for the existence of a network interface, or to get information about the interface, such as its IP address. Unlike ifconfig or ip, ifdata has simple to parse output that is designed to be easily used by a shell script.

lckdo: Now that util-linux contains a similar command named flock, lckdo is deprecated, and will be removed from some future version of moreutils.

mispipe: mispipe pipes two commands together like the shell does, but unlike piping in the shell, which returns the exit status of the last command; when using mispipe, the exit status of the first command is returned.

Note that some shells, notably bash, do offer a pipefail option, however, that option does not behave the same since it makes a failure of any command in the pipeline be returned, not just the exit status of the first.

pee: [my own description: `pee cmd1 cmd2 cmd3` takes the data from the standard input, sends copies of it to the commands cmd1, cmd2, and cmd3 (as their stdin), aggregates their outputs and provides that at the standard output.]

sponge, ts and vipe have been described in other comments in this thread. (And I've also skipped some easier-to-understand ones like errno and isutf8 for the sake of length.)

zrun: Prefixing a shell command with "zrun" causes any compressed files that are arguments of the command to be transparently uncompressed to temp files (not pipes) and the uncompressed files fed to the command.

The following compression types are supported: gz bz2 Z xz lzma lzo

[One super cool thing the man page mentions is that if you create a link named z<programname> eg. zsed, with zrun as the link target, then when you run `zsed XYZ`, zrun will read its own program name, and execute 'zrun sed XYZ' automatically.]
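For example (a sketch; zsed and access.log.gz are made-up names, and ~/bin is assumed to be on your PATH):
    ln -s "$(command -v zrun)" ~/bin/zsed    # any z<name> link dispatches through zrun
    zsed -n '1p' access.log.gz               # behaves like: zrun sed -n '1p' access.log.gz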

> [One super cool thing the man page mentions is that if you create a link named z<programname> eg. zsed, with zrun as the link target, then when you run `zsed XYZ`, zrun will read its own program name, and execute 'zrun sed XYZ' automatically.]

That's a brilliant way to improve the ergonomics.

> [One super cool thing the man page mentions is that if you create a link named z<programname> eg. zsed, with zrun as the link target, then when you run `zsed XYZ`, zrun will read its own program name, and execute 'zrun sed XYZ' automatically.]

Or just use an alias

They come with good manpages if you install them. I don't disagree that a little more detail on the linked web page might help people decide whether or not to install, but in keeping with how these are well-behaved oldschool-style Unix commands, they also come with man pages, which is more than I can say for most command line tools people make today (the ones that assume you have Unicode glyphs and color terminals [and don't check if they're being piped or offer --no-color] and don't accept -h/--help and countless other things that kids these days have eschewed).
These are necessarily bash functions, not executables, but here are two tools I'm proud of, which seem similar in spirit to vidir & vipe:
    # Launch $EDITOR to let you edit your env vars.

    function viset() {
      if [ -z "$1" ]; then
        echo "USAGE: viset THE_ENV_VAR"
        return 1  # return, not exit: exit would close the interactive shell
      else
        declare -n ref=$1
        f=$(mktemp)
        echo ${!1} > $f
        $EDITOR $f
        ref=`cat $f`
        export $1
      fi
    }

    # Like viset, but breaks up the var on : first,
    # then puts it back together after you're done editing.
    # Defaults to editing PATH.
    #
    # TODO: Accept a -d/--delimiter option to use something besides :.
    
    function vipath() {
      varname="${1:-PATH}"
   
      declare -n ref=$varname
      f=$(mktemp)
      echo ${!varname} | tr : "\n" > $f
      $EDITOR $f
      ref=`tr "\n" : < $f`
      export $varname
    }
Mostly I use vipath because I'm too lazy to figure out why tmux makes rvm so angry. . . .

I guess a cool addition to viset would be to accept more than one envvar, and show them on multiple lines. Maybe even let you edit your entire env if you give it zero args. Having autocomplete-on-tab for viset would be cool too. Maybe even let it interpret globs so you can say `viset AWS*`.

Btw I notice I'm not checking for an empty $EDITOR. That seems like it could be a problem somewhere.

Anywait is the tool I've always wanted, and implemented in pretty much the same way I would do so. However, besides waiting on a pid, I occasionally wait on `pidof` instead, to wait on multiple instances of the same process, running under different shells (wait until all build jobs are done, not just the current job).

ched also looks quite useful; automatically cleaning the old data after some time is great, as I commonly leave it lying around.

age also looks great for processing recent incoming files in a large directory (my downloads, for example)

p looks great to me, I rarely need the more advanced features of parallel, and will happily trade them for color coded outputs.

I looked to see what nup is because I don't understand the description...only to find out it doesn't actually exist. I'm assuming it's intended to send a signal to a process? But if so, why not just use `kill -s sigstop`?

pad also doesn't exist, but seems like printf or column could replace it, as these are what I usually use. I think there's also a way to pad variables in bash/zsh/etc.

whl is literally just `while do_stuff; do; done` and repeat is just `while true; do do_stuff; done`. It never even occurred to me to look for a tool to do untl; I usually just use something along the lines of `while ! do_stuff; do c=$((c + 1)); echo $c; done`. While the interval and return codes make it almost worthwhile, they themselves are still very little complexity; parsing the parameters adds more complexity than their implementation does.

spongif seems useful, but is really just a variation of the command above.

Most of them are nice to have, but they still ship an incompatible suboptimal parallel, which you explicitly have to check against in your configure, if you expect GNU parallel.

Name it at least c-parallel

Oh so that’s it! I use GNU parallel a lot, and installed moreutils yesterday, and parallel seemed to behave a bit… different. Couldn’t quite understand why, as I didn’t expect moreutils to replace parallel when you install them.

I’m using Arch, btw.

The need for ifdata(1) has become even more acute with the essential replacement of ifconfig(1) by ip(1), an even more inscrutable memory challenge. However, it would be even nicer if its default action when not given an interface name was to do <whatever> for all discovered interfaces.
Do not “ip -brief address show” and “ip -brief link show” serve as suitable replacements for most common uses of ifdata(1)? The ip(8) command even supports JSON output using “-json” instead of “-brief”.
Sure, they do.

In fact, I take it back. ifdata(1) is not in any way a replacement for ifconfig(1) for most things. The problem is that just running ifconfig with no arguments showed you everything, which was generally perfect for interactive use. Now to get any information from ip(1) you have to remember an argument name. If you do this a lot, it's almost certainly fine. If you do it occasionally, it's horrible.

Those are pretty useful!

Although I do wish MacOS had a fully-compatible ip command instead of ifconfig et al.

TIL. Thanks.

One does have to wonder though, why isn't -brief the default and the current default set to -verbose or -long. I look at -brief on either command and it has all the information I am ever looking for.

Yes, I suppose, but I can never remember those flags.
Amazing how many new things come about because people don't know or don't remember how to use the existing things.
Well, they're strictly worse for interactive use than ifconfig, from what I can tell from your comment.
We were discussing ifdata, not ifconfig. From the documentation, ifdata is explicitly meant for use in scripts. And in scripts, using “ip -brief” or “ip -json … | jq …” may well be suitable replacements for ifdata.
I find "ip a" (which sounds like IPA) to be a good enough replacement to ifconfig without argument to get an overview of the network state.
If you just want the IP on interfaces, "ip r" is sufficient and easy to remember.
The absolute number one Unix tool that should have - and, more to the point, COULD have - been written long ago is mlr.

https://github.com/johnkerl/miller

"Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON."

This should have been part of the standard unix toolkit for the last 40 years.

So just today I was wondering if there was a cli tool (or maybe a clever use of existing tools...) that could watch the output of one command for a certain string, parse bits of that out, and then execute another command with that parsed bit as input. For example, I have a command I run that spits out a log line with a url on it, I need to usually manually copy out that url and then paste it as an arg to my other command. There are other times when I simply want to wait for something to start up (you'll usually get a line like "Dev server started on port 8080") and then execute another command.

I know that I could obviously grep the output of the first command, and then use sed or awk to manipulate the line I want to get just the url, but I'm not sure about the best way to go about the rest. In addition, I usually want to see all the output of the first command (in this case, it's not done executing, it continues to run after printing out the url), so maybe there's a way to do that with tee? But I usually ALSO don't want to intermix 2 commands in the same shell, i.e. I don't want to just have a big series of pipes, Ideally I could run the 2 commands separately in their own terminals but the 2nd command that needs the url would effectively block until it received the url output from the first command. I have a feeling maybe you could do this with named pipes or something but that's pretty far out of my league...would love to hear if this is something other folks have done or have a need for.

In one terminal, run:
  $ mkfifo myfifo
  $ while true; do sed -rune 's/^Dev server started on port (.*)/\1/p' myfifo | xargs -n1 -I{} echo "Execute other command here with argument {}"; done

In the other terminal, run your server and tee the output to the fifo you just created:
  $ start_server | tee myfifo
A named pipe sounds like a good way to fulfill your requirement of having the commands run in separate shells. In the first terminal, shove the output of command A into the named pipe. In the second terminal, have a loop that reads from the named pipe line by line and invokes command B with the appropriate arguments.

You can create a named pipe using "mkfifo", which creates a pipe "file" with the specified name. Then, you can tell your programs to read and write to the pipe the same way you'd tell them to read and write from a normal file. You can use "<" and ">" to redirect stdout/stderr, or you can pass the file name if it's a program that expects a file name.

And for completeness, miniexpect which is a small C library implementing the same sort of thing: https://rwmj.wordpress.com/2014/04/25/miniexpect-a-small-exp...
  $ wc -l miniexpect.[ch]
  489 miniexpect.c
  110 miniexpect.h
I’d solve your exact problem like this:

1. Run one command with output to a file, possibly in the background. Since you want to watch the output, run “tail --follow=name filename.log”.

2. In a second terminal, run a second tail --follow on the same log file but pipe the output to a command sequence to find and extract the URL, and then pipe that into a shell while loop; something like “while read -r url; do do-thing-with "$url"; done”.
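A sketch of that setup (my-command, out.log, and do-thing-with are placeholders; the grep pattern is whatever matches your URL):
    # terminal 1: run the command and log its output
    my-command > out.log 2>&1 &
    tail --follow=name out.log

    # terminal 2: pull the URL out of the same log and act on it
    tail -n +1 --follow=name out.log \
      | grep --line-buffered -o 'https*://[^ ]*' \
      | while read -r url; do do-thing-with "$url"; done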

I used Python subprocess module for this.

… good luck, is my best advice. It’s not straightforward to handle edge cases.

I wrote a utility like ts before, called it teetime, and was thrilled with my pun. It was quite useful when piping stdout from a compute-heavy tool (a multi-hour EDA tool run), as you could see from the time deltas between log lines which parts were the most time-consuming.

I have searched for ts for a long time. I used it many years ago but forgot the exact name of the tool and package. No search engine could find it, or I just entered the wrong search terms.

A lot of these things -- and a lot of shell tools in general -- strike me as half-baked attempts to build monads for the Unix command line. No disrespect intended; nobody understood monads when Unix was invented. But it makes me wonder what a compositional pipe-ish set of command line tools would look like if it were architected with modern monad theory in mind.
I only install moreutils for vidir which is simply brilliant IMO
I prefer Emacs dired, where pressing C-x C-q starts editing the opened directory (including any inserted subdirectories).
One alias I always do is "vidir" -> "vidir --verbose" so that it would tell me what it's doing
Working with *nix for over 25 years and I've only just heard of sponge.

I have used vidir from this collection quite a bit. If you're a vi person, it makes it quite convenient to use vi/vim for renaming whole directories full of files.
I prefer vifm for this, but I'll give vidir a try as well, now that I know about it.
Ha! I use vifm now too, since this functionality is pretty much built in, but I still use vidir for one-offs and I thought it might appeal to some of the true minimalists.
moreutils indeed has some great utils, but a minor annoyance it causes is still shipping a `parallel` tool which is relatively useless, but causes confusion for new users or conflict (for package managers) with the way way way more indispensable GNU parallel.
I was going to comment on this. I still think Joey is in the wrong here.

The Debian BTS thread:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=749355

(I'm still trying to sort out what the resolution was, though it appears divert was used.)

When I installed moreutils v0.67 with macports just now it said:
  moreutils has the following notes:
    The binary parallel is no longer in this port; please install the port parallel instead.
i.e. GNU parallel
s.gif
Yup, Homebrew and other package managers do something similar; it's what I meant by

> conflict (for package managers)

s.gif
On the bright side, the moreutils version doesn't have a nag screen.
s.gif
`parallel` seems redundant because it appears that `xargs -P` can accomplish the same effect, except for the "-l maxload" option.
s.gif
If you mean GNU parallel, it has way way more features than xargs -P, (and a few saner defaults).

See e.g. https://www.gnu.org/software/parallel/parallel_tutorial.html

If you mean moreutils parallel, yeah I agree it's not useful.
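
A small side-by-side of the overlap, with gzip and *.log standing in for whatever you actually run:
    # GNU xargs: up to 4 gzip processes at a time (-d '\n' handles spaces in names)
    printf '%s\n' *.log | xargs -d '\n' -P 4 -n 1 gzip

    # GNU parallel: same idea, plus replacement strings, --load, job logs, etc.
    parallel -j 4 gzip {} ::: *.log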

s.gif
But that’s the point of parallel. The use case is for when you have N processors and a 10Gb NIC. Each job is CPU bound or concurrent license bound, or some jobs may take longer than others. Parallel allows you to run X jobs simultaneously to keep the CPU or licenses busy.
This comment is in praise of this idea I promise.

I think I could pick apart about half of these and show how they aren't needed,

and the showpiece example is particularly weak, since sed can do that entire pipeline itself in one shot, especially any version with -i.

You don't need either grep or sponge. Maybe sponge is still useful over simple shell redirection, but this example doesn't show it.

One of the other comments here suggests that the real point of sponge vs '>' is that it doesn't clobber the output until the input is all read.

In that case maybe the problem is just that the description doesn't say anything about that. But even then there is still a problem, in that it should also stress that you must not unthinkingly do "sponge > file", because the > is done by the shell, not controlled by the executable, and the shell may zero out the file immediately on parsing the line, before any of the commands get to read it.
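
Concretely, the distinction being described is between these two invocations (file is a placeholder):
    # fine: sponge itself opens file, and only after stdin hits EOF
    grep -v foo file | sponge file

    # footgun: the shell truncates file before grep ever gets to read it
    grep -v foo file | sponge > file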

This makes sponge prone to unpleasant surprise because it leads the user to think it prevents something it actually has no power to prevent. The user still has to operate their shell correctly to get the result they want, just like without sponge.

So it's a footgun generator.

Maybe it's still actually useful enough to be worth writing and existing, but just needs some better example to show what the point is.

To me though, from what is shown, it just looks like an even worse example of the "useless use of cat" award, where you not only use cat for no reason, you also write a new cat for no reason and then use it for no reason.

But there is still something here. Some of these sound either good or at least on the right track to being good.

I like it.

s.gif
Just a small comment about sponge: looking at the example, it's doing "sponge file", not "sponge > file". Given that, it's totally up to sponge to decide when it's going to open the output file.
Some of this looks useful, but here's a non-exhaustive critique - from a non-expert - of some of the rest:

> chronic: runs a command quietly unless it fails

Isn't that just `command >/dev/null`?

> ifdata: get network interface info without parsing ifconfig output

`ip link show <if>`?

> isutf8: check if a file or standard input is utf-8

`file` for files. For stdin, when is it not utf8 - unless you've got some weird system configuration?

> lckdo: execute a program with a lock held

`flock`?

s.gif
> > chronic: runs a command quietly unless it fails

> Isn't that just `command >/dev/null`?

Often times you want to run a command silently (like in a build script), but if it fails with a nonzero exit status, you want to then display not only the stderr but the stdout as well. I’ve written makefile hacks in the past that do this to silence overly-chatty compilers where we don’t really care what it’s outputting unless it fails, in which case we want all the output. It would’ve been nice to have this tool at the time to avoid reinventing it.
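
For anyone reinventing it today, a minimal chronic-like wrapper might look like this; it's only a sketch, not the moreutils implementation, and it merges stdout/stderr where the real tool keeps them separate:
    #!/bin/sh
    # Run "$@" silently; on failure, replay the captured output and pass on the status.
    out=$(mktemp) || exit 1
    "$@" > "$out" 2>&1
    status=$?
    if [ "$status" -ne 0 ]; then
        cat "$out"
    fi
    rm -f "$out"
    exit "$status"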

s.gif
> > chronic: runs a command quietly unless it fails

> Isn't that just `command >/dev/null`?

No. That just shows your stderr but not your stdout when the command fails, and you still get stderr even when it doesn't fail.

s.gif
so use '2>&1' to redirect stderr to stdout.

new ideas are great. but this isn't a new idea. "run a command silently unless it fails" is basic; it's the sort of thing that should make one think "i should search the shell man pages" rather than "i should roll my own new utility."

s.gif
Read his comment again. '2>&1' just redirects stderr.
s.gif
I guess that chronic doesn't just show stderr, but also stdout if the command fails. If I am not mistaken, your example would hide the stdout output, even when the command fails.
It's been a very long time since this happened, but in my early days of using Linux, I experienced naming collisions with both sponge and parallel, and at the time I didn't know how to resolve them. I don't remember which other sponge there was, but I imagine most Linux users are familiar with GNU parallel at this point.
How might one use sponge in a way where plain shell redirection wouldn’t be just as capable? The best I can currently think of is that it’s less cumbersome to wrap (for things like sudo).
s.gif
Sponge exists for cases where shell redirection wouldn't work, namely where you want the source and sink to be the same file. If you write:
    somecmd < somefile | othercmd | anothercmd > somefile
the output redirection will truncate the file before it can get read as input.

Sponge "soaks up" all the output before writing any of it, so you can write the pipeline like this:

    somecmd < somefile | othercmd | anothercmd | sponge somefile
s.gif
This can be done with regular shell redirection, even though I wouldn't recommend it: it's easy to get wrong and fairly opaque. It works because the shell opens foo for reading before running the subshell, so the rm only unlinks the name while the already-open descriptor keeps the old contents readable:
    $ cat foo
    foo
    bar
    baz
    $ ( rm foo && grep ba > foo ) < foo
    $ cat foo
    bar
    baz
    $
s.gif
Thank you, I missed this bit of nuance. That indeed would be useful, and now the example makes a lot more sense.
I was introduced to vidir through ranger's :bulkrename feature. Extremely handy. I don't think I've used the other stuff, but from reading the thread, vipe sounds great.
My favorite missing tool of all time is `ack` [0]. It's grep if grep were made now. I use it all the time, and it's the first thing I install on a new system.

It has a basic understanding of common text file structures and also directories, which makes it super powerful.

[0] https://beyondgrep.com
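
A couple of typical invocations to give a flavour (the type names come from ack's built-in filetype list, if I recall them correctly):
    ack 'TODO'               # recursive by default, skips VCS directories
    ack --python 'def main'  # restrict the search to Python files
    ack -w 'config' lib/     # whole-word match, limited to one directory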

s.gif
Had never heard of it before, but I'll definitely check it out now. Thanks!
s.gif
ripgrep is a modern written-in-Rust[0] equivalent that I really like.

0: I like this because it's much easier to edit IMHO

>pee: tee standard input to pipes

Nice tool, great name.

s.gif
pee is not really needed these days, as with bash you can do:
  tee -p >(command | line)
Note the -p option, now available in GNU tee, which keeps tee from exiting if a pipe closes early.
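
For comparison, roughly equivalent invocations (md5sum and wc -l are just example consumers):
    # moreutils pee: each argument is a shell command fed a copy of stdin
    seq 1000 | pee 'md5sum' 'wc -l'

    # roughly the same with bash process substitution and GNU tee
    seq 1000 | tee -p >(md5sum) >(wc -l) > /dev/null
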
heh. I use `chronic` all the time, `ifdata` in some scripts that predate Linux switching to `ip`. I occasionally use `sponge` for things but it's almost always an alternative to doing something correctly :-)

Looking at the other comments, I suspect one of the difficulties in finding a new maintainer will be that lots of people use 2 or 3 commands from it, but nobody uses the same 2 or 3, and actually caring about all of them is a big stretch...

Is this `parallel` a new tool? The well-known "GNU Parallel" can cover all your parallelizing needs, I would bet.
I don't know how to specify it, but it would surely be popular:
     oops
s.gif
I outright laughed that it's juxtaposed with sponge.
Here's a little wrapper I made around "find", which I always have to install on every new box I manage ...

https://github.com/figital/fstring

(just shows you more useful details about what is found)

s.gif
Coincidentally I have a very similar (almost identical) function in my shell profile.
I have this installed and have used some of these sometimes; it is good. (I do not use all of them, though)
Have loved this collection for a long time. I use errno almost daily.
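
For those who haven't met it: errno looks up error names, numbers, and descriptions. Usage from memory, so check the man page:
    errno ENOENT     # -> ENOENT 2 No such file or directory
    errno 13         # -> EACCES 13 Permission denied
    errno -s denied  # search error descriptions for "denied"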
s.gif
tee(1) duplexes its input, streaming to both the specified file and stdout.

sponge(1) soaks up input and writes it to the specified file at end of data.

tee doesn't sponge. sponge doesn't tee.

Are they written in Rust? If not, it's just more programmes someone is going to have to rewrite at some point, sigh.
