File Replacement on UNIX
source link: https://www.pixelbeat.org/docs/unix_file_replacement.html
Replacing the contents of a file on the UNIX command line using the standard commands is surprisingly tricky. In the examples below, $filter is any command that reads from stdin and writes to stdout.
Method 1:

    $filter < file > file

- Newbie fail. Immediately lose all your data, as the shell truncates "file" before the filter reads the data.
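A minimal demonstration of the failure, using tr as the filter:

```shell
# The shell opens "file" for writing (truncating it to zero bytes)
# before tr ever runs, so tr reads nothing and writes nothing.
cd "$(mktemp -d)"
printf 'hello\n' > file
tr a-z A-Z < file > file    # the "newbie fail" above
wc -c < file                # prints 0: every byte is gone
```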
Method 2:

    cat file | $filter > file

- Novice fail. `cat` may run quickly enough to read the data before the shell is scheduled to truncate the file, but depending on the kernel, the size of the file, and the load on the system, you will lose data.
Method 3:

    cp file file.tmp
    $filter < file.tmp > file
    rm file.tmp

- The current directory might not be writeable
- Lose your data if $filter fails or is interrupted
- Data inconsistent for a while
- Slower as file written twice
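For concreteness, here is the same sequence with tr standing in for $filter:

```shell
# Method 3 with a concrete filter: upper-case the file via a temp copy.
cd "$(mktemp -d)"
printf 'hello\n' > file
cp file file.tmp               # snapshot the data first
tr a-z A-Z < file.tmp > file   # safe: we read the copy, not the file
rm file.tmp
cat file                       # HELLO
```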
Method 4:

    { rm file && $filter > file; } < file

- The current directory might not be writeable
- Lose your data if $filter fails or is interrupted
- Data inconsistent for a while
- Lose all attributes of original file
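This works at all because the redirection opens "file" before rm unlinks the name, and the open file descriptor keeps the data alive until the filter has read it. A quick demonstration with tr as the filter:

```shell
# The compound command's stdin is opened from "file" first; rm then
# removes the *name*, but the data stays readable through the open fd
# while tr writes a brand-new "file".
cd "$(mktemp -d)"
printf 'hello\n' > file
{ rm file && tr a-z A-Z > file; } < file
cat file                       # HELLO
```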
Method 5:

If we don't care about atomicity, then we can use another writeable dir:

    $filter < file > /tmp/file.tmp && mv /tmp/file.tmp file

- Lose all attributes of original file
- Slower as file written twice
- mv still needs a writeable current dir, as it recreates the file
- File missing for a while
- Data inconsistent for a while
- Limited to attributes supported by the other file system
Method 6:

We can get around most of the previous issues by using cp and rm rather than mv, as cp does a truncate(); write(); on the original file, so all attributes are maintained. Note this method is functionally equivalent to that used by the sponge utility.

    $filter < file > /tmp/file.tmp &&
    cp /tmp/file.tmp file
    rm /tmp/file.tmp

- Slower as file written twice
- Data inconsistent for a while
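sponge itself comes from moreutils and may not be installed; a rough shell sketch of the same soak-up-then-cp behaviour (sponge_like is a made-up name for illustration, not a real utility) could look like:

```shell
# A sketch of a sponge-like helper: buffer all of stdin into a temp
# file, then write it to the target with cp so the target's inode and
# attributes survive (as in method 6 above).
sponge_like() {
    tmp=$(mktemp) || return
    cat > "$tmp" &&        # soak up stdin completely first
    cp "$tmp" "$1"         # then truncate+write the target in place
    rm -f "$tmp"
}

cd "$(mktemp -d)"
printf 'hello\n' > file
tr a-z A-Z < file | sponge_like file
cat file                   # HELLO
```

Because cat only sees EOF after tr exits, the target is never written while the filter is still reading it.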
Method 7:

If we do need atomicity, then we need a dir on the same filesystem that's writeable, so that we can do a rename(old,new), which is atomic. Usually that's the current dir, so assuming that:

    $filter < file > file.tmp && mv file.tmp file

- Lose all attributes of original file
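One way to see why the attributes are lost: after the rename, the name points at a brand-new inode. A small demonstration, with tr as the filter:

```shell
# After method 7 the name "file" refers to the inode that file.tmp
# was created with, which is why attributes set on the original are
# not carried over.
cd "$(mktemp -d)"
printf 'hello\n' > file
old_inode=$(ls -i file | awk '{print $1}')
tr a-z A-Z < file > file.tmp && mv file.tmp file
new_inode=$(ls -i file | awk '{print $1}')
cat file                    # HELLO
```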
Method 8:

If we want to maintain all attributes, which is increasingly important with SELinux and capabilities etc., we'd have to:

    cp -a file file.tmp
    $filter < file > file.tmp && mv file.tmp file

- Slower as file written twice
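A quick check, with tr as the filter, that the mode bits survive this method (assumes GNU cp for -a):

```shell
# cp -a gives the temp file the original's attributes; the filter's
# redirection then truncates file.tmp in place (keeping its mode),
# and the rename carries everything over.
cd "$(mktemp -d)"
printf 'hello\n' > file
chmod 640 file
cp -a file file.tmp
tr a-z A-Z < file > file.tmp && mv file.tmp file
ls -l file | cut -c1-10     # -rw-r----- : the 640 mode survived
cat file                    # HELLO
```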
Method 9:

If we want to be more efficient, then we would need cp to support copying only the attributes. I've proposed a patch to do just that:

    cp --attributes-only file file.tmp
    $filter < file > file.tmp && mv file.tmp file

- Note certain attributes are only maintainable by root. For example, a non-root user updating another user's file by first creating a new temporary file will silently change the ownership of the original file when it's replaced.
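Assuming a GNU cp new enough to support --attributes-only, the method can be exercised like this:

```shell
# --attributes-only copies mode/timestamps/etc. but no data, so the
# temp file starts out empty yet already carries the original's mode.
cd "$(mktemp -d)"
printf 'hello\n' > file
chmod 640 file
cp --attributes-only file file.tmp
wc -c < file.tmp            # 0: no data was copied
tr a-z A-Z < file > file.tmp && mv file.tmp file
ls -l file | cut -c1-10     # -rw-r-----
cat file                    # HELLO
```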
Method 10:

What if you want a backup though?

    cp --attributes-only file file.tmp
    $filter < file > file.tmp && mv -b file.tmp file

- This is no longer atomic, as the file is not present for a short while: mv implements the backup as rename(old,bak); rename(tmp,old);
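Focusing on just the mv -b step (a GNU extension; the default backup suffix is ~):

```shell
# mv -b first renames the old file to file~ and only then moves the
# new one into place, so for an instant neither name holds the data.
cd "$(mktemp -d)"
printf 'hello\n' > file
tr a-z A-Z < file > file.tmp && mv -b file.tmp file
cat file                    # HELLO
cat file~                   # hello : the backup holds the old contents
```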
Method 11:

Therefore, if you want to support atomic replacement with backups using cp/mv, you need to:

    cp --attributes-only file file.tmp
    cp -a -b -f file file
    $filter < file > file.tmp && mv file.tmp file

- Slower as file written twice
Method 12:

We can make the extra backup step in the previous method more efficient by using hardlinks (thanks reddit!):

    cp --attributes-only file file.tmp
    cp -l -b -f file file
    $filter < file > file.tmp && mv file.tmp file
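The hardlink idea can also be spelled out with plain ln: the backup is instantaneous and copies no data, since both names share one inode until the rename swaps the original out:

```shell
# A hardlink backup: file~ shares the original's inode, so it keeps
# the old contents after the atomic rename replaces "file".
cd "$(mktemp -d)"
printf 'hello\n' > file
ln -f file file~            # hardlink backup of the current contents
tr a-z A-Z < file > file.tmp && mv file.tmp file
cat file                    # HELLO
cat file~                   # hello
```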
So if one were to implement a general replacement script (as I'm currently considering for GNU coreutils), one could apply the following logic, where the numbers refer to the methods above:

    if --atomic && --backup; then
        method 12
    elif --atomic; then
        method 9 || method 8
    else
        method 9 || method 8 || method 6
    fi
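As a rough illustration only (replace_atomic is a hypothetical helper, not the proposed coreutils interface), the atomic non-backup branch might be sketched as:

```shell
# Hypothetical replace_atomic helper: run "$@" as the filter over $1,
# trying method 9 first and falling back to method 8 if this cp lacks
# --attributes-only. The original file is kept if the filter fails.
replace_atomic() {
    f=$1; shift
    cp --attributes-only "$f" "$f.tmp" 2>/dev/null ||
    cp -a "$f" "$f.tmp" || return
    if "$@" < "$f" > "$f.tmp"; then
        mv "$f.tmp" "$f"
    else
        rm -f "$f.tmp"; return 1   # filter failed: clean up, keep original
    fi
}

cd "$(mktemp -d)"
printf 'hello\n' > file
replace_atomic file tr a-z A-Z
cat file                    # HELLO
```

If the filter fails (e.g. `replace_atomic file false`), the original contents are untouched and no temp file is left behind.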
Note there are many other edge cases to consider, which are mainly handled within cp/mv; one can see this by looking at the complexity of copy.c.
There is also the general caveat of how to deal with interruptions at various points in the above, i.e. if a script implementing this is killed, are there tmp files left on disk?
There is also the more general caveat of the varying consistency guarantees provided by different file systems.