
The 8TB Backup And The Search For Duplicates

Source: https://blog.wirelessmoves.com/2021/01/the-8tb-backup-and-the-search-for-duplicates.html

The 8 TB drive I use for all the stuff that may or may not be useful again in the future is filling up faster than prices are dropping for bigger drives. In the past, I've always 'upgraded' once drive capacities had doubled. But currently, the price for a 16 TB external drive still hovers around €300, and since I always keep several backups, that price would multiply. So that's not quite in the cards for the time being. Instead, I did the next best thing and had a look for duplicates. How hard can that be, I thought.

The easiest way to do this is to simply look for duplicate filenames. In my case, that would have produced quite a number of false positives: when creating backups of optical media, for example, the audio and video files often end up with identical names. A name-based search would also miss duplicates whose filename was changed at some point.
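For illustration, a name-only search is a one-liner in the shell (the path is a placeholder, and -printf requires GNU find):

# List basenames that occur more than once anywhere under the backup.
# This flags false positives (different files sharing a name) and
# misses duplicates that were renamed at some point.
find /path/to/backup -type f -printf '%f\n' | sort | uniq -d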

Fortunately, someone had this problem before and was kind enough to write 'rdfind' (redundant data find). Available in the Debian repository for easy installation with 'apt install rdfind', the tool goes well beyond comparing filenames: it groups files by identical size, compares first and last byte sequences, and calculates a SHA-1 checksum over the remaining potential matches. Once it got to work on my 8 TB drive, it took a couple of hours to return with a result. I was positively surprised by all the things it had found that I would have missed with a manual search or by just looking for identical filenames. In the end, I could remove over 700 GB of duplicated files. One additional nice feature: one can give rdfind a minimum file size (in bytes), which makes it ignore all smaller files:

rdfind -minsize 50000000 ~/Documents
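
By default, rdfind deletes nothing and just writes the list of duplicates it found to a results.txt file in the working directory (there is also a -dryrun true option). Only after reviewing that list would one let it act, e.g. like this, with the path again being a placeholder:

# First pass: the default run only reports and writes results.txt.
rdfind -minsize 50000000 /path/to/backup

# After reviewing results.txt: actually remove the redundant copies.
rdfind -deleteduplicates true -minsize 50000000 /path/to/backup

For the curious, the size-then-checksum pruning idea can also be sketched with plain shell tools. This toy version is far slower than rdfind, assumes GNU findutils and coreutils as well as filenames without newlines, and illustrates the principle rather than how rdfind is actually implemented:

# 1) Print size and path for every file, 2) keep only files whose size
#    occurs more than once, 3) hash those candidates, 4) group identical
#    hashes (a SHA-1 hex digest is 40 characters wide, hence -w40).
find /path/to/backup -type f -printf '%s %p\n' \
  | awk '{ count[$1]++; size[NR] = $1; path[NR] = substr($0, index($0, " ") + 1) }
         END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }' \
  | xargs -r -d '\n' sha1sum \
  | sort \
  | uniq -w40 --all-repeated=separate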

Posted on January 2, 2021 · Author: Martin · Categories: Uncategorized

One thought on “The 8TB Backup And The Search For Duplicates”

  1. sebastian says:

    I used fslint for this task in the past.

It used to be in the Ubuntu repos, but right now I can't find it in the Mint 20 repos (which, as one should know, are based on Ubuntu).

    http://www.pixelbeat.org/fslint/
