| == Find and Remove Duplicate Files ==
| | [[Kategorie:Linux/Datei]] |
| Whether you’re using Linux on your desktop or a server, there are good tools that will scan your system for duplicate files and help you remove them to [https://www.howtogeek.com/185173/4-ways-to-free-up-disk-space-on-linux/ free up space].
| |
| * Solid graphical and command-line interfaces are both available.
| |
| | |
| Duplicate files are an unnecessary waste of disk space.
| |
| * After all, if you really need the same file in two different locations you could always set up a symbolic link or hard link, storing the data in only one location on disk.
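As a minimal sketch of that idea, the snippet below replaces one copy with a hard link and another with a symbolic link. The file names are placeholders created in a temporary directory, not paths used by any of the tools described here.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
printf 'same content\n' > "$demo/original.txt"
cp "$demo/original.txt" "$demo/copy1.txt"
cp "$demo/original.txt" "$demo/copy2.txt"

# Only replace a copy when it is byte-for-byte identical to the original.
if cmp -s "$demo/original.txt" "$demo/copy1.txt"; then
    ln -f "$demo/original.txt" "$demo/copy1.txt"     # hard link: shares the inode
fi
if cmp -s "$demo/original.txt" "$demo/copy2.txt"; then
    ln -sf "$demo/original.txt" "$demo/copy2.txt"    # symbolic link: points at the path
fi

# The first column shows inode numbers; the hard link matches the original.
ls -li "$demo"
```

After this runs, the data is stored once, and both extra names still open the same content.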
| |
| | |
| === FSlint ===
| |
| FSlint is available in various Linux distributions’ software repositories, including Ubuntu, Debian, Fedora, and Red Hat.
| |
| * Just fire up your package manager and install the “fslint” package.
| |
| * This utility provides a convenient graphical interface by default, but it also includes command-line versions of its various functions.
| |
| * Like many Linux applications, the FSlint graphical interface is just a front-end that uses the FSlint commands underneath.
| |
| | |
| Don’t let that scare you away from using FSlint’s convenient graphical interface, though.
| |
| * By default, it opens with the Duplicates pane selected and your home directory as the default search path.
| |
| * All you have to do is click the Find button and FSlint will find a list of duplicate files in directories under your home folder.
| |
| * Use the buttons to delete any files you want to remove, and double-click them to preview them.
| |
| | |
| Note that the command-line utilities aren’t in your path by default, so you can’t run them like typical commands.
| |
| * On Ubuntu, you’ll find them under /usr/share/fslint/fslint.
| |
| * So, if you wanted to run the entire fslint scan on a single directory, here are the commands you’d run on Ubuntu:
| |
| | |
| <syntaxhighlight lang="bash" highlight="1" line>
| |
| cd /usr/share/fslint/fslint
| |
| ./fslint /path/to/directory
| |
| </syntaxhighlight>
| |
| | |
| This command won’t actually delete anything.
| |
| * It will just print a list of duplicate files — you’re on your own for the rest.
| |
| | |
| === fdupes ===
| |
| The fdupes command isn’t usually installed by default, but it’s available in many Linux distributions’ repositories.
| |
| * It’s a simple command-line tool.
| |
| * This is probably the most convenient, quickest tool you can use if you want to find duplicate files in an environment where you only have access to a Linux command line, not a graphical user interface.
| |
| | |
| Using it is simple.
| |
| * Just run the fdupes command followed by the path to a directory.
| |
| * So, '''fdupes /home/chris''' would list all duplicate files in the directory /home/chris, but not in its subdirectories. The '''fdupes -r /home/chris''' command would recursively search all subdirectories inside /home/chris for duplicate files and list them.
| |
| | |
| This tool won’t automatically remove anything; it will just show you a list of duplicate files.
| |
| * You can then delete the duplicate files by hand, if you like.
| |
| * You can also run the command with the -d switch to have it help you delete files.
| |
| * You’ll be prompted to choose the files you want to preserve.
| |
| | |
| === dupeGuru, [http://www.hardcoded.net/dupeguru_me/ dupeGuru Music Edition], and [http://www.hardcoded.net/dupeguru_pe/ dupeGuru Pictures Edition] ===
| |
| '''RELATED:''' [https://www.howtogeek.com/142414/how-to-install-software-from-outside-ubuntus-software-repositories/ How to Install Software From Outside Ubuntu's Software Repositories]
| |
| | |
| Yes, we’re going to recommend dupeGuru once again.
| |
| * It’s an open-source and cross-platform tool that’s so useful we’ve already recommended it for [https://www.howtogeek.com/200962/how-to-find-and-remove-duplicate-files-on-windows/ finding duplicate files on Windows] and [https://www.howtogeek.com/201007/how-to-find-and-remove-duplicate-files-on-mac-os-x/ cleaning up duplicate files on a Mac].
| |
| | |
| dupeGuru is a bit less convenient because it’s not available in most Linux distributions’ software repositories — although it is available in Arch Linux’s repositories.
| |
| * However, the dupeGuru website offers [https://www.howtogeek.com/142414/how-to-install-software-from-outside-ubuntus-software-repositories/ a PPA] that lets you easily install their software packages on Ubuntu and Ubuntu-based Linux distributions.
| |
| * Users of other Linux distributions could even [https://www.howtogeek.com/105413/how-to-compile-and-install-from-source-on-ubuntu/ compile it from source].
| |
| | |
| As on Windows and Mac, dupeGuru offers three different editions — a standard edition for basic duplicate-file-scanning, an edition designed for finding duplicate songs that may have been ripped or encoded differently, and an edition intended for finding similar photos that have been rotated, resized, or otherwise modified.
| |
| * You can get them all from the dupeGuru website, and all three are available in the Ubuntu PPA.
| |
| | |
| This application works just as it does on other platforms.
| |
| * Launch it, add one or more folders to scan, and click Scan.
| |
| * You’ll see a list of duplicate files, and you can check them off and remove them, or move them to another location.
| |
| * You can also easily open and examine the file with a double-click.
| |
| | |
| After installation, the Ubuntu package must be launched from a command line, for example with the '''dupeguru_se''' command for the standard edition.
| |
| * There appears to be no desktop shortcut installed by default.
| |
| * This lack of system integration is the only reason we can’t recommend this utility more highly, as it works well once you get it installed and launched.
| |
| | |
| [[Image:Bild4.png|top]]
| |
| | |
| As you might expect, this isn’t a complete list.
| |
| * You’ll find many other duplicate-file-finding utilities — mostly commands without a graphical interface — in your Linux distribution’s package manager.
| |
| * Unless you have specific needs, the above tools are our favorites and the ones we recommend.
| |
| | |
| == How To Find And Delete Duplicate Files In Linux ==
| |
| I always back up configuration files and other old files somewhere on my hard disk before editing or modifying them, so I can restore them from the backup if I accidentally do something wrong.
| |
| * The problem is that I forget to clean up those files, and after a while my hard disk fills up with duplicate files.
| |
| * I am either too lazy to clean out the old files or afraid that I may delete an important file.
| |
| * If you’re anything like me and overwhelmed by multiple copies of the same files in different backup directories, you can find and delete duplicate files using the tools given below on Unix-like operating systems.
| |
| | |
| ; A word of caution
| |
| Please be careful while deleting duplicate files.
| |
| * If you’re not careful, it can lead to [https://www.ostechnix.com/prevent-files-folders-accidental-deletion-modification-linux/ accidental data loss].
| |
| * I advise you to pay extra attention while using these tools.
| |
| | |
| === Find And Delete Duplicate Files In Linux ===
| |
| For the purpose of this guide, I am going to discuss three utilities, namely:
| |
| # Rdfind
| |
| # Fdupes
| |
| # FSlint
| |
| | |
| These three utilities are free and open source, and they work on most Unix-like operating systems.
| |
| | |
| ===== Rdfind =====
| |
| '''Rdfind''', which stands for '''r'''edundant '''d'''ata '''find''', is a free and open source utility that finds duplicate files across and/or within directories and sub-directories.
| |
| * It compares files based on their content, not on their file names.
| |
| * Rdfind uses a '''ranking''' algorithm to classify original and duplicate files.
| |
| * If you have two or more identical files, Rdfind works out which one is the original and treats the rest as duplicates.
| |
| * Once it has found the duplicates, it reports them to you.
| |
| * You can decide to either delete them or replace them with [https://www.ostechnix.com/explaining-soft-link-and-hard-link-in-linux-with-examples/ hard links or symbolic (soft) links].
| |
| | |
| ; Installing Rdfind
| |
| | |
| Rdfind is available in [https://aur.archlinux.org/packages/rdfind/ AUR].
| |
| * So, you can install it on Arch-based systems using any AUR helper program like [https://www.ostechnix.com/yay-found-yet-another-reliable-aur-helper/ Yay]. On Debian, Ubuntu, and Linux Mint, install it with apt-get instead.
| |
| <syntaxhighlight lang="bash" highlight="1" line>
| |
| # Arch-based systems, using the Yay AUR helper
| |
| yay -S rdfind
| |
| # Debian, Ubuntu, Linux Mint
| |
| sudo apt-get install rdfind
| |
| </syntaxhighlight>
| |
| | |
| ; Usage
| |
| Once installed, simply run the rdfind command followed by the path of the directory to scan for duplicate files.
| |
| | |
| <syntaxhighlight lang="bash" highlight="1" line>
| |
| $ rdfind ~/Downloads
| |
| </syntaxhighlight>
| |
| | |
| The rdfind command scans the ~/Downloads directory and saves the results in a file named '''results.txt''' in the current working directory.
| |
| * You can view the names of the possible duplicate files in the results.txt file.
| |
| | |
| <syntaxhighlight lang="bash" highlight="1" line>
| |
| cat results.txt
| |
| # Automatically generated
| |
| # duptype id depth size device inode priority name
| |
| DUPTYPE_FIRST_OCCURRENCE 1469 8 9 2050 15864884 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test5.regex
| |
| DUPTYPE_WITHIN_SAME_TREE -1469 8 9 2050 15864886 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test6.regex
| |
| [...]
| |
| DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger(1).pdf
| |
| DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
| |
| # end of file
| |
| </syntaxhighlight>
| |
| | |
| By reviewing the results.txt file, you can easily find the duplicates.
| |
| * You can remove the duplicates manually if you like.
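For a long report, a short filter may be easier than reading the file by eye. The sketch below is not part of Rdfind. It assumes the column layout shown above, with the file name starting at the eighth field, and uses a small sample copied from that listing. The function name list_rdfind_dupes is made up for illustration.

```shell
# Sample in the results.txt layout shown above (paths taken from that listing).
report=$(mktemp)
cat > "$report" <<'EOF'
# Automatically generated
# duptype id depth size device inode priority name
DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger(1).pdf
DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
# end of file
EOF

# list_rdfind_dupes FILE: print the names that were marked as duplicates,
# skipping comment lines and the first occurrence of each set.
list_rdfind_dupes() {
    awk '!/^#/ && NF >= 8 && $1 != "DUPTYPE_FIRST_OCCURRENCE" {
        name = $8
        for (i = 9; i <= NF; i++) name = name " " $i   # re-join names with spaces
        print name
    }' "$1"
}

list_rdfind_dupes "$report"
```

Note that re-joining fields collapses runs of spaces inside a file name, so treat the output as a reading aid rather than input for an automatic delete.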
| |
| | |
| You can also use the '''-dryrun''' option to find all duplicates in a given directory without changing anything and print a summary in your terminal:
| |
| | |
| <syntaxhighlight lang="bash" highlight="1" line>
| |
| rdfind -dryrun true ~/Downloads
| |
| </syntaxhighlight>
| |
| | |
| Once you have found the duplicates, you can replace them with either hardlinks or symlinks.
| |
| | |
| To replace all duplicates with hardlinks, run:
| |
| | |
| $ rdfind -makehardlinks true ~/Downloads
| |
| | |
| To replace all duplicates with symlinks/soft links, run:
| |
| | |
| $ rdfind -makesymlinks true ~/Downloads
| |
| | |
| You may have some empty files in a directory and want to ignore them.
| |
| * If so, use '''-ignoreempty''' option like below.
| |
| | |
| $ rdfind -ignoreempty true ~/Downloads
| |
| | |
| If you don’t want the old files anymore, just delete duplicate files instead of replacing them with hard or soft links.
| |
| | |
| To delete all duplicates, simply run:
| |
| | |
| $ rdfind -deleteduplicates true ~/Downloads
| |
| | |
| If you do not want to ignore empty files and delete them along with all duplicates, run:
| |
| | |
| $ rdfind -deleteduplicates true -ignoreempty false ~/Downloads
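Before choosing either setting, it can help to see which empty files a directory actually holds. A minimal sketch with find, using a temporary directory as a stand-in for ~/Downloads (the file names are made up, and -empty assumes GNU or BSD find):

```shell
demo=$(mktemp -d)
: > "$demo/empty1.log"                 # zero-length file
: > "$demo/empty2.log"                 # another zero-length file
printf 'data\n' > "$demo/notes.txt"    # non-empty file, not listed

# List regular files of size zero; these are the files the option is about.
find "$demo" -type f -empty | sort
```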
| |
| | |
| For more details, refer to the help section:
| |
| | |
| $ rdfind --help
| |
| | |
| And, the manual pages:
| |
| | |
| $ man rdfind
| |
| | |
| |
| | |
| ===== Fdupes =====
| |
| '''Fdupes''' is another command-line utility that identifies and removes duplicate files within specified directories and their sub-directories. It is a free, open source utility written in '''C'''.
| |
| * Fdupes identifies the duplicates by comparing file sizes, partial MD5 signatures, full MD5 signatures, and finally performing a byte-by-byte comparison for verification.
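The idea behind that staged comparison can be approximated with standard tools. The sketch below is not how Fdupes is implemented: it skips the size and partial-signature stages, groups files by full MD5 checksum, and leaves the byte-by-byte check to cmp. The function name list_dupes and the sample files are made up, and it assumes GNU coreutils (md5sum, uniq -w) and file names without newlines.

```shell
# list_dupes DIR: print groups of non-empty files under DIR whose MD5
# checksums match, with a blank line between groups.
list_dupes() {
    find "$1" -type f -size +0c -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate   # compare only the 32-char checksum
}

# Small demonstration tree: a.txt and b.txt are identical, c.txt is not.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
cp "$demo/a.txt" "$demo/b.txt"
printf 'other\n' > "$demo/c.txt"

list_dupes "$demo"

# Final byte-by-byte check before trusting a checksum match.
cmp -s "$demo/a.txt" "$demo/b.txt" && echo "a.txt and b.txt are identical"
```

Checking sizes first, as Fdupes does, avoids hashing files that cannot possibly match, which matters on large trees.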
| |
| | |
| Like Rdfind, Fdupes comes with a handful of options, for example to:
| |
| * Recursively search for duplicate files in directories and sub-directories
| |
| * Exclude empty files and hidden files from consideration
| |
| * Show the size of the duplicates
| |
| * Delete duplicates immediately as they are encountered
| |
| * Exclude files with a different owner/group or permission bits from being treated as duplicates
| |
| * And a lot more.
| |
| | |
| '''Installing Fdupes'''
| |
| | |
| Fdupes is available in the default repositories of most Linux distributions.
| |
| | |
| On Arch Linux and its variants like Antergos, Manjaro Linux, install it using Pacman like below.
| |
| | |
| $ sudo pacman -S fdupes
| |
| | |
| On Debian, Ubuntu, Linux Mint:
| |
| | |
| $ sudo apt-get install fdupes
| |
| | |
| On Fedora:
| |
| | |
| $ sudo dnf install fdupes
| |
| | |
| On RHEL, CentOS:
| |
| | |
| $ sudo yum install epel-release
| |
| $ sudo yum install fdupes
| |
| | |
| '''Usage'''
| |
| | |
| Fdupes usage is pretty simple.
| |
| * Just run the following command to find out the duplicate files in a directory, for example '''~/Downloads'''.
| |
| | |
| $ fdupes ~/Downloads
| |
| | |
| Sample output from my system:
| |
| | |
| /home/sk/Downloads/Hyperledger.pdf
| |
| /home/sk/Downloads/Hyperledger(1).pdf
| |
| | |
| As you can see, I have a duplicate file in '''/home/sk/Downloads/''' directory.
| |
| * It shows the duplicates from the parent directory only.
| |
| * How to view the duplicates from sub-directories? Just use '''-r''' option like below.
| |
| | |
| $ fdupes -r ~/Downloads
| |
| | |
| Now you will see the duplicates from '''/home/sk/Downloads/''' directory and its sub-directories as well.
| |
| | |
| Fdupes can also find duplicates in multiple directories at once.
| |
| | |
| $ fdupes ~/Downloads ~/Documents/ostechnix
| |
| | |
| You can even search multiple directories and recurse into only some of them. The '''-R''' option applies recursion only to the directories listed after it:
| |
| | |
| $ fdupes ~/Downloads -R ~/Documents/ostechnix
| |
| | |
| The above command searches for duplicates in the “~/Downloads” directory and in the “~/Documents/ostechnix” directory and its sub-directories.
| |
| | |
| Sometimes, you might want to know the size of the duplicates in a directory.
| |
| * If so, use '''-S''' option like below.
| |
| | |
| '''$ fdupes -S ~/Downloads'''
| |
| 403635 bytes each:
| |
| /home/sk/Downloads/Hyperledger.pdf
| |
| /home/sk/Downloads/Hyperledger(1).pdf
| |
| | |
| Similarly, to view the size of the duplicates in parent and child directories, use '''-Sr''' option.
| |
| | |
| We can exclude empty and hidden files from consideration using '''-n''' and '''-A''' respectively.
| |
| | |
| $ fdupes -n ~/Downloads
| |
| $ fdupes -A ~/Downloads
| |
| | |
| The first command will exclude zero-length files from consideration and the latter will exclude hidden files from consideration while searching for duplicates in the specified directory.
| |
| | |
| To summarize duplicate files information, use '''-m''' option.
| |
| | |
| $ fdupes -m ~/Downloads
| |
| 1 duplicate files (in 1 sets), occupying 403.6 kilobytes
| |
| | |
| To delete all duplicates, use '''-d''' option.
| |
| | |
| $ fdupes -d ~/Downloads
| |
| | |
| Sample output:
| |
| | |
| [1] /home/sk/Downloads/Hyperledger Fabric Installation.pdf
| |
| [2] /home/sk/Downloads/Hyperledger Fabric Installation(1).pdf
| |
| | |
| Set 1 of 1, preserve files [1 - 2, all]:
| |
| | |
| This command will prompt you for files to preserve and delete all other duplicates.
| |
| * Just enter any number to preserve the corresponding file and delete the remaining files.
| |
| * Pay close attention while using this option.
| |
| * You might delete original files if you’re not careful.
| |
| | |
| If you want to preserve the first file in each set of duplicates and delete the others without prompting each time, use '''-dN''' option (not recommended).
| |
| | |
| $ fdupes -dN ~/Downloads
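Because this form deletes without asking, it can be worth previewing the outcome first. The sketch below is only an illustration: it reads output in the plain Fdupes format shown earlier (one path per line, sets separated by blank lines) and prints every path except the first of each set. The function name would_delete and the sample paths are hypothetical.

```shell
# would_delete: read fdupes-style output on stdin and print every file
# except the first one in each blank-line-separated set.
would_delete() {
    awk 'BEGIN { first = 1 }
         /^$/  { first = 1; next }   # a blank line starts a new set
         first { first = 0; next }   # skip the first file of the set
               { print }             # the remaining files would be removed'
}

# Sample input; with fdupes installed, its output could be piped in instead.
sample='/tmp/demo/a.pdf
/tmp/demo/a(1).pdf

/tmp/demo/b.iso
/tmp/demo/b-copy.iso
/tmp/demo/b-old.iso'

printf '%s\n' "$sample" | would_delete
```

Nothing is deleted here; the function only prints names, so you can review the list before running the real command.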
| |
| | |
| To delete duplicates as they are encountered, use '''-I''' flag.
| |
| | |
| $ fdupes -I ~/Downloads
| |
| | |
| For more details about Fdupes, view the help section and man pages.
| |
| | |
| $ fdupes --help
| |
| $ man fdupes
| |
| | |
| |
| | |
| ===== FSlint =====
| |
| '''FSlint''' is yet another duplicate file finder utility that I use from time to time to get rid of the unnecessary duplicate files and free up the disk space in my Linux system.
| |
| * Unlike the other two utilities, FSlint has both GUI and CLI modes.
| |
| * So, it is a more user-friendly tool for newbies.
| |
| * FSlint finds not only duplicates, but also bad symlinks, bad names, temp files, bad IDs, empty directories, non-stripped binaries, and more.
| |
| | |
| '''Installing FSlint'''
| |
| | |
| FSlint is available in [https://aur.archlinux.org/packages/fslint/ AUR], so you can install it using any AUR helper.
| |
| | |
| $ yay -S fslint
| |
| | |
| On Debian, Ubuntu, Linux Mint:
| |
| | |
| $ sudo apt-get install fslint
| |
| | |
| On Fedora:
| |
| | |
| $ sudo dnf install fslint
| |
| | |
| On RHEL, CentOS:
| |
| | |
| $ sudo yum install epel-release
| |
| $ sudo yum install fslint
| |
| | |
| Once it is installed, launch it from the menu or application launcher.
| |
| | |
| This is what the FSlint GUI looks like.
| |
| | |
| [[Image:Bild6.png|top]]
| |
| | |
| FSlint interface
| |
| | |
| As you can see, the interface of FSlint is user-friendly and self-explanatory.
| |
| * In the '''Search path''' tab, add the path of the directory you want to scan and click the '''Find''' button in the lower left corner to find the duplicates.
| |
| * Check the recurse option to recursively search for duplicates in directories and sub-directories.
| |
| * FSlint will quickly scan the given directory and list the duplicates.
| |
| | |
| [[Image:Bild8.png|top]]fslint GUI
| |
| | |
| From the list, choose the duplicates you want to clean up and select one of the available actions, such as Save, Delete, Merge, or Symlink.
| |
| | |
| In the '''Advanced search parameters''' tab, you can specify the paths to exclude while searching for duplicates.
| |
| | |
| [[Image:Bild7.png|top]]
| |
| | |
| fslint advanced search
| |
| | |
| '''FSlint command line options'''
| |
| | |
| FSlint provides the following collection of CLI utilities for finding duplicates and other kinds of filesystem lint:
| |
| * '''findup''': find DUPlicate files
| |
| * '''findnl''': find Name Lint (problems with filenames)
| |
| * '''findu8''': find filenames with invalid UTF-8 encoding
| |
| * '''findbl''': find Bad Links (various problems with symlinks)
| |
| * '''findsn''': find Same Name (problems with clashing names)
| |
| * '''finded''': find Empty Directories
| |
| * '''findid''': find files with dead user IDs
| |
| * '''findns''': find Non Stripped executables
| |
| * '''findrs''': find Redundant Whitespace in files
| |
| * '''findtf''': find Temporary Files
| |
| * '''findul''': find possibly Unused Libraries
| |
| * '''zipdir''': reclaim wasted space in ext2 directory entries
| |
| | |
| All of these utilities are available under the '''/usr/share/fslint/fslint''' directory.
| |
| | |
| For example, to find duplicates in a given directory, do:
| |
| | |
| $ /usr/share/fslint/fslint/findup ~/Downloads/
| |
| | |
| Similarly, to find empty directories, the command would be:
| |
| | |
| $ /usr/share/fslint/fslint/finded ~/Downloads/
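If FSlint is not installed, a similar check can be sketched with find. This is a stand-in, not what finded does internally. The directory names are placeholders in a temporary tree, and -empty assumes GNU or BSD find.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/full" "$demo/hollow/inner"
printf 'x\n' > "$demo/full/file.txt"

# Print directories that contain no entries at all.
# "hollow" itself is not listed because it still contains "inner".
find "$demo" -type d -empty
```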
| |
| | |
| To get more details on each utility, for example '''findup''', run:
| |
| | |
| $ /usr/share/fslint/fslint/findup --help
| |
| | |
| For more details about FSlint, refer to the help section and man pages.
| |
| | |
| $ /usr/share/fslint/fslint/fslint --help
| |
| $ man fslint
| |
| | |
| ; Resources
| |
| * [https://rdfind.pauldreik.se/ Rdfind Website]
| |
| * [https://github.com/pauldreik/rdfind Rdfind GitHub Repository]
| |
| * [https://github.com/adrianlopezroche/fdupes Fdupes GitHub Repository]
| |
| * [http://www.pixelbeat.org/fslint/ FSlint Website]
| |
| * [https://github.com/pixelb/fslint FSlint GitHub Repository]
| |
| | |
| == How to Find Duplicate Files in Linux and Remove Them ==
| |
| ''Brief: FSlint is a great GUI tool to find duplicate files in Linux and remove them.
| |
| * FDUPES also finds files with the same content in Linux, but from the command line. ''
| |
| | |
| If you have this habit of downloading everything from the web like me, you will end up having multiple duplicate files.
| |
| * Most often, I can find the same songs or a bunch of images in different directories or end up backing up some files at two different places.
| |
| * It’s a pain locating these duplicate files manually and deleting them to recover the disk space.
| |
| | |
| If you want to save yourself from this pain, there are various Linux applications that will help you in locating these duplicate files and removing them.
| |
| * In this article, we will cover how you can find and remove these files in Ubuntu.
| |
| | |
| ''Note: You should know what you are doing.
| |
| * If you are using a new tool, it’s always better to try it in a test directory structure to figure out what it does before running it on your root or home folder.
| |
| * Also, '''it’s always better to [https://itsfoss.com/backup-restore-linux-timeshift/ backup your Linux system]!'''''
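One way to follow that advice is to build a small disposable tree with known duplicates and point the tool at it first. A sketch, with made-up directory and file names:

```shell
# Build a disposable tree containing two known duplicates.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/music" "$sandbox/backup/music"
printf 'track one\n' > "$sandbox/music/song.txt"
printf 'holiday\n'   > "$sandbox/photo.txt"
cp "$sandbox/music/song.txt" "$sandbox/backup/music/song.txt"   # duplicate 1
cp "$sandbox/photo.txt"      "$sandbox/backup/photo-copy.txt"   # duplicate 2

echo "Try your tool on: $sandbox"
# Remove the tree afterwards with: rm -r "$sandbox"
```

Since you know exactly which files are duplicates, you can compare the tool's report and its delete behaviour against what you expect.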
| |
| | |
| === FSlint: GUI tool to find and remove duplicate files ===
| |
| FSlint helps you search and remove duplicate files, empty directories or files with incorrect names.
| |
| * It has a command-line as well as GUI mode with a set of tools to perform a variety of tasks.
| |
| | |
| To install FSlint, type the below command in Terminal.
| |
| | |
| sudo apt install fslint
| |
| | |
| Open FSlint from the Dash search.
| |
| | |
| FSlint includes a number of options to choose from.
| |
| * There are options to find duplicate files, installed packages, bad names, name clashes, temp files, empty directories etc.
| |
| * Choose the Search Path and the task which you want to perform from the left panel and click on Find to locate the files.
| |
| * Once done, you can select the files you want to remove and delete them.
| |
| | |
| You can click on any file directory from the search result to open it if you are not sure and want to double check it before deleting it.
| |
| | |
| You can select '''Advanced search parameters''' where you can define rules to exclude certain file types or exclude directories which you don’t want to search.
| |
| | |
| === FDUPES: CLI tool to find and remove duplicate files ===
| |
| FDUPES is a command line utility to find and remove duplicate files in Linux.
| |
| * It can list out the duplicate files in a particular folder or recursively within a folder.
| |
| * It asks which file to preserve before deletion and the noprompt option lets you delete all the duplicate files keeping the first one without asking you.
| |
| | |
| ==== Installation on Debian / Ubuntu ====
| |
| sudo apt install fdupes
| |
| | |
| ==== Installation on Fedora ====
| |
| sudo dnf install fdupes
| |
| | |
| Once installed, you can search duplicate files using the below command:
| |
| | |
| fdupes /path/to/folder
| |
| | |
| For recursively searching within a folder, use -r option
| |
| | |
| fdupes -r /home
| |
| | |
| This will only list the duplicate files; it does not delete them by itself.
| |
| * You can manually delete the duplicate files or use ''-d'' option to delete them.
| |
| | |
| fdupes -d /path/to/folder
| |
| | |
| This won’t delete anything on its own but will display all the duplicate files and give you the option to delete files one by one or to select a range to delete.
| |
| * If you want to delete all duplicates without being asked, preserving the first file in each set, you can add the noprompt ''-N'' option.
| |
| | |
| With the ''-d'' option, fdupes shows all the duplicate files within the folder and asks you to select the file you want to preserve.
| |
| | |
| ==== Final Words ====
| |
| There are many other ways and tools to find and delete duplicate files in Linux.
| |
| * Personally, I prefer the FDUPES command line tool; it’s simple and uses few resources.
| |
| | |
| |
| | |
| === Sources ===
| |
| * [https://itsfoss.com/find-duplicate-files-linux/ https://itsfoss.com/find-duplicate-files-linux/]
| |
| * [https://www.howtogeek.com/201140/how-to-find-and-remove-duplicate-files-on-linux/ https://www.howtogeek.com/201140/how-to-find-and-remove-duplicate-files-on-linux/]
| |
| * [https://www.ostechnix.com/how-to-find-and-delete-duplicate-files-in-linux/ https://www.ostechnix.com/how-to-find-and-delete-duplicate-files-in-linux/]
| |