Rdfind
Rdfind, short for redundant data find, is a free and open source utility that finds duplicate files across and/or within directories and sub-directories.
- It compares files based on their content, not on their file names.
- Rdfind uses a ranking algorithm to classify files as originals or duplicates.
- When two or more files are identical, Rdfind ranks them to decide which one to treat as the original and considers the rest duplicates.
- Once it has found the duplicates, it reports them to you.
- You can then decide to delete them or to replace them with hard links or symbolic (soft) links.
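To make the idea of content-based comparison concrete, here is a minimal shell sketch that groups files by checksum rather than by name. It is an illustration only and not how Rdfind itself is implemented; the function name list_dupes_by_content is hypothetical, and the sketch assumes GNU coreutils (sha256sum, and uniq with the -w and --all-repeated options).

```shell
# Illustration only: this is NOT Rdfind's algorithm.
# list_dupes_by_content is a hypothetical helper name.
# Assumes GNU coreutils (sha256sum, uniq -w / --all-repeated).
list_dupes_by_content() {
    local dir="$1"
    # Hash every regular file, sort so equal digests are adjacent,
    # then keep only lines whose first 64 characters (the SHA-256
    # digest) occur more than once. Groups are separated by a blank line.
    find "$dir" -type f -exec sha256sum {} + \
        | LC_ALL=C sort \
        | uniq -w64 --all-repeated=separate
}
```

Calling list_dupes_by_content ~/Downloads would print each group of identical files together, regardless of what the files are named. Unlike Rdfind, this sketch does not rank the files or decide which one is the original.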
- Installing Rdfind
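Rdfind is available in the AUR, so on Arch-based systems you can install it with any AUR helper program, such as Yay (yay -S rdfind).
- On Debian, Ubuntu, and their derivatives, it is available in the official repositories and can be installed as shown below.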
sudo apt-get install rdfind
- Usage
Once installed, simply run the rdfind command followed by the path of the directory you want to scan for duplicate files.
rdfind ~/Downloads
The rdfind command scans the ~/Downloads directory and saves the results in a file named results.txt in the current working directory.
- You can view the names of the possible duplicate files in the results.txt file.
cat results.txt
# Automatically generated
# duptype id depth size device inode priority name
DUPTYPE_FIRST_OCCURRENCE 1469 8 9 2050 15864884 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test5.regex
DUPTYPE_WITHIN_SAME_TREE -1469 8 9 2050 15864886 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test6.regex
[...]
DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger(1).pdf
DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
# end of file
By reviewing the results.txt file, you can easily find the duplicates.
- You can remove the duplicates manually if you like.
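For a long results.txt, it may help to print only the paths that Rdfind marked as duplicates. The sketch below assumes the column layout shown in the sample output above (seven fields followed by the file name) and treats every row other than DUPTYPE_FIRST_OCCURRENCE as a duplicate; list_duplicates is a hypothetical helper name, not part of Rdfind.

```shell
# list_duplicates is a hypothetical helper, not an Rdfind command.
# Assumes the layout from the sample results.txt:
#   duptype id depth size device inode priority name
list_duplicates() {
    local results="${1:-results.txt}"
    awk '
        # Skip comment lines and anything too short to be a data row.
        /^#/ || NF < 8 { next }
        # The first occurrence is the file Rdfind treats as the original.
        $1 == "DUPTYPE_FIRST_OCCURRENCE" { next }
        {
            # Strip the seven leading fields so that names
            # containing spaces are printed intact.
            name = $0
            for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "", name)
            print name
        }
    ' "$results"
}
```

Running list_duplicates results.txt prints one duplicate path per line, which you can review before removing anything by hand.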
You can also use the -dryrun option to find all duplicates in a given directory without changing anything; the summary is printed in your Terminal:
rdfind -dryrun true ~/Downloads
Once you have found the duplicates, you can replace them with either hardlinks or symlinks.
To replace all duplicates with hardlinks, run:
rdfind -makehardlinks true ~/Downloads
To replace all duplicates with symlinks/soft links, run:
rdfind -makesymlinks true ~/Downloads
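After running either command, you may want to confirm what a former duplicate has become. The following sketch uses only standard shell file tests; link_kind is a hypothetical helper name, and the two arguments are whichever original and duplicate paths you want to compare.

```shell
# link_kind is a hypothetical helper, not part of Rdfind.
# Usage: link_kind ORIGINAL CANDIDATE
link_kind() {
    local original="$1" candidate="$2"
    if [ -L "$candidate" ]; then
        # -L is true when the path itself is a symbolic link.
        echo symlink
    elif [ "$original" -ef "$candidate" ]; then
        # -ef is true when both paths share the same device and inode,
        # which is what a hard link looks like.
        echo hardlink
    else
        echo separate
    fi
}
```

For the pair from the sample results.txt, link_kind ~/Downloads/'Hyperledger(1).pdf' ~/Downloads/Hyperledger.pdf should report hardlink after -makehardlinks and symlink after -makesymlinks, assuming Rdfind kept the first file as the original.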
You may have some empty files in a directory and want Rdfind to ignore them.
- If so, use the -ignoreempty option as shown below.
rdfind -ignoreempty true ~/Downloads
If you don't need the duplicate files anymore, you can delete them instead of replacing them with hard or soft links.
To delete all duplicates, simply run:
rdfind -deleteduplicates true ~/Downloads
If you want empty files to be included in the scan, so that duplicate empty files are deleted along with all other duplicates, run:
rdfind -deleteduplicates true -ignoreempty false ~/Downloads
For more details, refer to the help section:
rdfind --help
And the manual page:
man rdfind