
Syncing and backing up files locally with rsync

 5 years ago
source link: https://www.tlanyan.me/rsync-backup-files-to-localhost/?amp%3Butm_medium=referral

When reposting, please credit the original article: https://tlanyan.me/use-rsync-backup-files/

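rsync is an excellent tool for syncing and backing up files between hosts. Compared with tools such as ftp and scp, rsync is more capable and transfers data more efficiently, which makes it a must-have on any server.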

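While using rsync recently I ran into a problem: when syncing between a PC and a portable hard drive, a large binary file that has been modified is retransmitted in full, which is very inefficient. Wasn't rsync supposed to transfer only the differences? Or is this a problem specific to binary files? Yet rsync's man page clearly says: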

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

With this question in mind I searched online and found someone with the same confusion: Smarter filetransfers than rsync?

Fortunately, someone answered the question perfectly:

Rsync will not use deltas but will transmit the full file in its entirety if it – as a single process – is responsible for the source and destination files. It can transmit deltas when there is a separate client and server process running on the source and destination machines.
The reason that rsync will not send deltas when it is the only process is that in order to determine whether it needs to send a delta it needs to read the source and destination files. By the time it’s done that it might as well have just copied the file directly.

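In other words: when files are synced between hosts over a network, each host runs its own rsync process that computes hashes of its local files, and only the differing parts are sent over the network. For a sync within a single host there is only one process, and rsync reasons that comparing the files first and then copying the differences would be no faster than copying directly, so it transfers the whole file.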
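The idea of "hash each side, then send only what differs" can be illustrated with a small sketch. This is not rsync's actual algorithm (rsync uses rolling checksums and can match blocks at shifted offsets); it only compares fixed-size blocks at the same offsets with cksum and counts how many differ. The function name count_changed_blocks and the default block size are hypothetical choices for illustration.

```shell
# Simplified illustration only: compare two files block by block at the
# same offsets and report how many blocks differ. rsync's real algorithm
# uses rolling checksums and is considerably smarter than this.

# count_changed_blocks FILE_A FILE_B [BLOCK_SIZE_BYTES]  (hypothetical helper)
# The body runs in a subshell, so its variables do not leak to the caller.
count_changed_blocks() (
    a=$1; b=$2; bs=${3:-1048576}
    # Arithmetic expansion strips any padding that wc may print.
    size_a=$(( $(wc -c < "$a") ))
    size_b=$(( $(wc -c < "$b") ))
    if [ "$size_a" -gt "$size_b" ]; then max=$size_a; else max=$size_b; fi
    blocks=$(( (max + bs - 1) / bs ))
    i=0; changed=0
    while [ "$i" -lt "$blocks" ]; do
        # Checksum block i of each file; a block past the end of a file
        # reads as empty and therefore differs from any non-empty block.
        sum_a=$(dd if="$a" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        sum_b=$(dd if="$b" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        if [ "$sum_a" != "$sum_b" ]; then changed=$(( changed + 1 )); fi
        i=$(( i + 1 ))
    done
    echo "$changed of $blocks blocks differ"
)
```

Note that this sketch has to read both files completely before it knows which blocks changed. That is the cost the quoted answer refers to: on a single machine, by the time both files have been read, the copy could already have been done.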

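On reflection, rsync's behavior is reasonable. Between hosts the bottleneck is network bandwidth, so computing the differences first and then transferring them is more efficient. Within a single host the operation is a disk-to-disk copy, roughly ten times faster than the network, and copying directly is generally faster than comparing first and then transferring, so copying the whole file is a sound choice.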

I wrote a script to test rsync's behavior:

#!/bin/bash
echo "make test file"
dd if=/dev/zero of=testfile bs=1024k count=512
echo "cp test file"
cp testfile syncfile
echo "make changes to test file"
echo '1234567890' >> testfile
echo "rsync file in local..."
rsync -avh -P testfile syncfile

echo ""
echo "restore sync file"
dd if=/dev/zero of=syncfile bs=1024k count=512
echo "rsync file via network"
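# Note: ~/syncfile is the file restored above only when the script is run
# from the home directory; an SSH server on localhost is also required.
rsync -avh -P testfile localhost:~/syncfile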

The test script produces the following output:

[Image: screenshot of the test script's output]

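The result matches expectations: for a sync within the same machine, rsync copies the whole file; over SSH, it sends only the differences, which significantly improves efficiency.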
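One way to check this kind of result numerically is rsync's --stats option, whose summary includes "Literal data" (bytes actually sent) and "Matched data" (bytes reused from the existing destination file). The sketch below is an assumption-laden helper rather than part of the original test: the name stats_field is hypothetical, and the 8 MiB file size is chosen only to keep the demo quick.

```shell
# stats_field NAME: read `rsync --stats` output on stdin and print the byte
# count from the line starting with NAME, with thousands separators removed.
# (Hypothetical helper; it assumes the line looks like "NAME: 1,234 bytes".)
stats_field() {
    sed -n "s/^$1: \([0-9,]*\) bytes.*/\1/p" | tr -d ','
}

# Demo of the local case; it is skipped when rsync is not installed.
if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    dd if=/dev/zero of="$work/testfile" bs=1024k count=8 2>/dev/null
    cp "$work/testfile" "$work/syncfile"
    echo '1234567890' >> "$work/testfile"
    # LC_ALL=C keeps the number formatting predictable for the parser.
    out=$(LC_ALL=C rsync -a --stats "$work/testfile" "$work/syncfile")
    echo "literal: $(echo "$out" | stats_field 'Literal data') bytes"
    echo "matched: $(echo "$out" | stats_field 'Matched data') bytes"
    rm -rf "$work"
fi
```

For a local copy, essentially all bytes would be expected to show up as literal data. Repeating the rsync call with a localhost: destination (which requires a running SSH server) should shift most of them to matched data.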

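There is nothing wrong with rsync's approach, but a full copy within the same host still hurts when a large file has only minor changes. The workaround is to simulate a network transfer, as the test script does. Most Linux hosts ship with sshd built in, so it is enough to add localhost and the colon that marks a network destination when writing the command. On Windows 10 version 1809, OpenSSH has become a built-in system component, so installing and using it is painless. Cygwin, Bitvise SSH Server, and other options are also available; once one of them is installed, syncing large files is no longer a problem.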
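The workaround can be wrapped in a small helper so the localhost: prefix is not forgotten. This is a minimal sketch: the function names localhost_dest and sync_via_localhost are hypothetical, and it assumes an SSH server is running locally, that your account can log in to it, and that paths contain no spaces.

```shell
# localhost_dest PATH: turn a local path into a "localhost:/abs/path" spec so
# that rsync treats the destination as remote (hypothetical helper name).
localhost_dest() {
    case $1 in
        /*) printf 'localhost:%s\n' "$1" ;;            # already absolute
        *)  printf 'localhost:%s/%s\n' "$PWD" "$1" ;;  # resolve against cwd
    esac
}

# sync_via_localhost SRC DEST: same rsync options as the article's test
# script, but the destination is routed through SSH on localhost.
# Assumes sshd is running locally and accepts your login.
sync_via_localhost() {
    rsync -avh -P "$1" "$(localhost_dest "$2")"
}
```

For example, `sync_via_localhost testfile syncfile` run in the directory containing both files would target the local syncfile through SSH. Making the path absolute avoids the pitfall that a relative remote path is resolved against the home directory rather than the current directory. The rsync man page also documents --no-whole-file (--no-W), which requests the delta algorithm for local paths without SSH; the original article does not test it, so treat it as an untested alternative.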

References

  1. Smarter filetransfers than rsync?
  2. OpenSSH in Windows
  3. Installing CYGWIN + SSHD for remote access through SSH on windows
  4. Installing SFTP/SSH Server on Windows using OpenSSH
  5. Bitvise SSH Server
