Kategorie:Linux/Datei/Duplikate

== How to Find and Remove Duplicate Files on Linux ==
[[Kategorie:Linux/Datei]]
Whether you’re using Linux on your desktop or a server, there are good tools that will scan your system for duplicate files and help you remove them to [https://www.howtogeek.com/185173/4-ways-to-free-up-disk-space-on-linux/ free up space].
* Solid graphical and command-line interfaces are both available.
 
Duplicate files are an unnecessary waste of disk space.
* After all, if you really need the same file in two different locations you could always set up a symbolic link or hard link, storing the data in only one location on disk.
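As a rough sketch of the hard-link idea above, the following shell function replaces one file with a hard link to another, but only after '''cmp''' confirms that the contents are identical. The function name '''link_if_identical''' and the example paths are hypothetical, and hard links only work when both paths are on the same filesystem.

```shell
# Hypothetical helper: replace DUP with a hard link to KEEP,
# but only if both files have byte-identical content.
link_if_identical() {
    keep=$1
    dup=$2
    # Nothing to do if both names already refer to the same file.
    if [ "$keep" -ef "$dup" ]; then
        return 0
    fi
    # cmp -s is silent and returns 0 only when the files are identical.
    if cmp -s -- "$keep" "$dup"; then
        # ln -f removes the existing DUP and creates the link in its place.
        ln -f -- "$keep" "$dup"
    else
        echo "not identical, leaving $dup untouched" >&2
        return 1
    fi
}

# Example (paths are placeholders):
#   link_if_identical ~/Documents/report.pdf ~/Backup/report.pdf
```

After a successful call both names point to the same data on disk, so editing one changes the other as well.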
 
=== FSlint ===
FSlint is available in various Linux distributions’ software repositories, including Ubuntu, Debian, Fedora, and Red Hat.
* Just fire up your package manager and install the “fslint” package.
* This utility provides a convenient graphical interface by default, but it also includes command-line versions of its various functions.
* Like many Linux applications, the FSlint graphical interface is just a front-end that uses the FSlint commands underneath.
 
Don’t let that scare you away from using FSlint’s convenient graphical interface, though.
* By default, it opens with the Duplicates pane selected and your home directory as the default search path.
* All you have to do is click the Find button and FSlint will find a list of duplicate files in directories under your home folder.
* Use the buttons to delete any files you want to remove, and double-click them to preview them.
 
[[Image:Bild1.png|top]]
 
Note that the command-line utilities aren’t in your path by default, so you can’t run them like typical commands.
* On Ubuntu, you’ll find them under /usr/share/fslint/fslint.
* So, if you wanted to run the entire fslint scan on a single directory, here are the commands you’d run on Ubuntu:
 
cd /usr/share/fslint/fslint
 
./fslint /path/to/directory
 
This command won’t actually delete anything.
* It will just print a list of duplicate files — you’re on your own for the rest.
 
[[Image:Bild2.png|top]]
 
=== fdupes ===
The fdupes command isn’t usually installed by default, but it’s available in many Linux distributions’ repositories.
* It’s a simple command-line tool.
* This is probably the most convenient, quickest tool you can use if you want to find duplicate files in an environment where you only have access to a Linux command line, not a graphical user interface.
 
Using it is simple.
* Just run the fdupes command followed by the path to a directory.
* So, '''fdupes /home/chris''' would list all duplicate files in the directory /home/chris, but not in its subdirectories. The '''fdupes -r /home/chris''' command would recursively search all subdirectories inside /home/chris for duplicate files and list them.
 
This tool won’t automatically remove anything; it just shows you a list of duplicate files.
* You can then delete the duplicate files by hand, if you like.
* You can also run the command with the -d switch to have it help you delete files.
* You’ll be prompted to choose the files you want to preserve.
 
[[Image:Bild3.png|top]]
 
=== dupeGuru, [http://www.hardcoded.net/dupeguru_me/ dupeGuru Music Edition], and [http://www.hardcoded.net/dupeguru_pe/ dupeGuru Pictures Edition] ===
 
Yes, we’re going to recommend dupeGuru once again.
* It’s an open-source and cross-platform tool that’s so useful we’ve already recommended it for [https://www.howtogeek.com/200962/how-to-find-and-remove-duplicate-files-on-windows/ finding duplicate files on Windows] and [https://www.howtogeek.com/201007/how-to-find-and-remove-duplicate-files-on-mac-os-x/ cleaning up duplicate files on a Mac].
 
dupeGuru is a bit less convenient because it’s not available in most Linux distributions’ software repositories — although it is available in Arch Linux’s repositories.
* However, the dupeGuru website offers [https://www.howtogeek.com/142414/how-to-install-software-from-outside-ubuntus-software-repositories/ a PPA] that lets you easily install their software packages on Ubuntu and Ubuntu-based Linux distributions.
* Users of other Linux distributions could even [https://www.howtogeek.com/105413/how-to-compile-and-install-from-source-on-ubuntu/ compile it from source].
 
As on Windows and Mac, dupeGuru offers three different editions — a standard edition for basic duplicate-file-scanning, an edition designed for finding duplicate songs that may have been ripped or encoded differently, and an edition intended for finding similar photos that have been rotated, resized, or otherwise modified.
* You can get them all from the dupeGuru website, and all three are available in the Ubuntu PPA.
 
This application works just as it does on other platforms.
* Launch it, add one or more folders to scan, and click Scan.
* You’ll see a list of duplicate files, and you can check them off and remove them, or move them to another location.
* You can also easily open and examine the file with a double-click.
 
After installation, the Ubuntu package must be launched from a command line, for example with the '''dupeguru_se''' command for the standard edition.
* There appears to be no desktop shortcut installed by default.
* This lack of system integration is the only reason we can’t recommend this utility more highly, as it works well once you get it installed and launched.
 
[[Image:Bild4.png|top]]
 
As you might expect, this isn’t a complete list.
* You’ll find many other duplicate-file-finding utilities — mostly commands without a graphical interface — in your Linux distribution’s package manager.
* Unless you have specific needs, the above tools are our favorites and the ones we recommend.
 
== How To Find And Delete Duplicate Files In Linux ==
I always back up configuration files and other old files somewhere on my hard disk before editing or modifying them, so I can restore them from the backup if I accidentally do something wrong.
* But the problem is that I forget to clean up those files, and after a while my hard disk fills up with duplicate files.
* I am either too lazy to clean up the old files or afraid that I might delete an important file.
* If you’re anything like me and overwhelmed by multiple copies of the same files in different backup directories, you can find and delete duplicate files using the tools given below on Unix-like operating systems.
 
'''A word of caution:'''
 
Please be careful while deleting duplicate files.
* If you’re not careful, you may end up with [https://www.ostechnix.com/prevent-files-folders-accidental-deletion-modification-linux/ accidental data loss].
* I advise you to pay extra attention while using these tools.
 
=== Find And Delete Duplicate Files In Linux ===
For the purpose of this guide, I am going to discuss three utilities, namely:
# Rdfind
# Fdupes
# FSlint
 
These three utilities are free and open source, and work on most Unix-like operating systems.
 
===== Rdfind =====
'''Rdfind''', which stands for '''r'''edundant '''d'''ata '''find''', is a free and open source utility to find duplicate files across and/or within directories and sub-directories.
* It compares files based on their content, not on their file names.
* Rdfind uses a '''ranking''' algorithm to classify original and duplicate files.
* If you have two or more identical files, Rdfind ranks them to decide which one to treat as the original and considers the rest duplicates.
* Once it has found the duplicates, it reports them to you.
* You can decide to either delete them or replace them with [https://www.ostechnix.com/explaining-soft-link-and-hard-link-in-linux-with-examples/ hard links or symbolic (soft) links].
 
'''Installing Rdfind'''
 
Rdfind is available in [https://aur.archlinux.org/packages/rdfind/ AUR].
* So, you can install it in Arch-based systems using any AUR helper program like [https://www.ostechnix.com/yay-found-yet-another-reliable-aur-helper/ Yay] as shown below.
 
$ yay -S rdfind
 
On Debian, Ubuntu, Linux Mint:
 
$ sudo apt-get install rdfind
 
On Fedora:
 
$ sudo dnf install rdfind
 
On RHEL, CentOS:
 
$ sudo yum install epel-release
$ sudo yum install rdfind
 
'''Usage'''
 
Once installed, simply run the rdfind command along with the directory path to scan for duplicate files.
 
$ rdfind ~/Downloads
 
[[Image:Bild5.png]]
 
Scan a directory with Rdfind
 
As you can see in the above screenshot, the rdfind command scans the ~/Downloads directory and saves the results in a file named '''results.txt''' in the current working directory.
* You can view the names of the possible duplicate files in the results.txt file.
 
'''$ cat results.txt'''
<nowiki># Automatically generated</nowiki>
<nowiki># duptype id depth size device inode priority name</nowiki>
DUPTYPE_FIRST_OCCURRENCE 1469 8 9 2050 15864884 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test5.regex
DUPTYPE_WITHIN_SAME_TREE -1469 8 9 2050 15864886 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test6.regex
[...]
DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger(1).pdf
DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
<nowiki># end of file</nowiki>
 
By reviewing the results.txt file, you can easily find the duplicates.
* You can remove the duplicates manually if you want to.
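To pull just the duplicate paths out of a report in the format shown above, a small awk filter is enough. This is only a sketch: it assumes the eight-column layout from the sample (duptype, id, depth, size, device, inode, priority, name), '''rdfind_dupes''' is a hypothetical helper name, and it only prints paths without touching any file.

```shell
rdfind_dupes() {
    # $1: path to an rdfind report (defaults to ./results.txt)
    awk '
        # Skip comments and the first occurrence of each file; everything
        # else tagged DUPTYPE_* is a duplicate of an earlier entry.
        /^DUPTYPE_/ && $1 != "DUPTYPE_FIRST_OCCURRENCE" {
            # Drop the seven leading columns so that the remaining text is
            # the file name, even if it contains spaces.
            for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "")
            print
        }
    ' "${1:-results.txt}"
}

# Example:
#   rdfind_dupes results.txt
```

Reviewing this list before deleting anything keeps the decision in your hands.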
 
Also, you can use the '''-dryrun''' option to find all duplicates in a given directory without changing anything and print a summary in your Terminal:
 
$ rdfind -dryrun true ~/Downloads
 
Once you have found the duplicates, you can replace them with either hardlinks or symlinks.
 
To replace all duplicates with hardlinks, run:
 
$ rdfind -makehardlinks true ~/Downloads
 
To replace all duplicates with symlinks/soft links, run:
 
$ rdfind -makesymlinks true ~/Downloads
 
You may have some empty files in a directory and want to ignore them.
* If so, use the '''-ignoreempty''' option like below.
 
$ rdfind -ignoreempty true ~/Downloads
 
If you don’t want the old files anymore, just delete duplicate files instead of replacing them with hard or soft links.
 
To delete all duplicates, simply run:
 
$ rdfind -deleteduplicates true ~/Downloads
 
If you want empty files to be considered as well, so that duplicate empty files are deleted along with all other duplicates, run:
 
$ rdfind -deleteduplicates true -ignoreempty false ~/Downloads
 
For more details, refer to the help section:
 
$ rdfind --help
 
And, the manual pages:
 
$ man rdfind
 
 
===== Fdupes =====
'''Fdupes''' is yet another command line utility to identify and remove duplicate files within specified directories and their sub-directories. It is a free, open source utility written in the '''C''' programming language.
* Fdupes identifies the duplicates by comparing file sizes, partial MD5 signatures, full MD5 signatures, and finally performing a byte-by-byte comparison for verification.
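The staged comparison described above can be approximated with standard tools. The sketch below is not how fdupes is implemented: it skips the size and partial-hash stages, swaps MD5 for SHA-256, does no final byte-by-byte check, and simply groups files by checksum. It assumes GNU coreutils (for the '''-w''' and '''--all-repeated''' options of uniq) and file names without newlines; '''list_dupes''' is a hypothetical name.

```shell
list_dupes() {
    dir=${1:-.}
    # Hash every regular file, sort so equal hashes are adjacent,
    # then keep only lines whose first 64 characters (the SHA-256 hex
    # digest) occur more than once, separating groups with blank lines.
    find "$dir" -type f -exec sha256sum {} + |
        sort |
        uniq -w 64 --all-repeated=separate
}

# Example:
#   list_dupes ~/Downloads
```

Note that all empty files share one checksum, so they show up as a single group of duplicates here.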
 
Similar to Rdfind, Fdupes comes with quite a handful of options, such as:
* Recursively search for duplicate files in directories and sub-directories
* Exclude empty files and hidden files from consideration
* Show the size of the duplicates
* Delete duplicates immediately as they are encountered
* Do not treat files with different owner/group or permission bits as duplicates
* And a lot more.
 
'''Installing Fdupes'''
 
Fdupes is available in the default repositories of most Linux distributions.
 
On Arch Linux and its variants like Antergos, Manjaro Linux, install it using Pacman like below.
 
$ sudo pacman -S fdupes
 
On Debian, Ubuntu, Linux Mint:
 
$ sudo apt-get install fdupes
 
On Fedora:
 
$ sudo dnf install fdupes
 
On RHEL, CentOS:
 
$ sudo yum install epel-release
$ sudo yum install fdupes
 
'''Usage'''
 
Fdupes usage is pretty simple.
* Just run the following command to find out the duplicate files in a directory, for example '''~/Downloads'''.
 
$ fdupes ~/Downloads
 
Sample output from my system:
 
/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger(1).pdf
 
As you can see, I have a duplicate file in the '''/home/sk/Downloads/''' directory.
* By default, fdupes shows duplicates from the given directory only.
* To include duplicates from sub-directories, use the '''-r''' option like below.
 
$ fdupes -r ~/Downloads
 
Now you will see the duplicates from '''/home/sk/Downloads/''' directory and its sub-directories as well.
 
Fdupes can also find duplicates in multiple directories at once.
 
$ fdupes ~/Downloads ~/Documents/ostechnix
 
You can even search multiple directories, one recursively like below:
 
$ fdupes ~/Downloads -r ~/Documents/ostechnix
 
The above command searches for duplicates in the “~/Downloads” directory and in the “~/Documents/ostechnix” directory and its sub-directories.
 
Sometimes, you might want to know the size of the duplicates in a directory.
* If so, use '''-S''' option like below.
 
'''$ fdupes -S ~/Downloads'''
403635 bytes each:
/home/sk/Downloads/Hyperledger.pdf
/home/sk/Downloads/Hyperledger(1).pdf
 
Similarly, to view the size of the duplicates in parent and child directories, use '''-Sr''' option.
 
We can exclude empty and hidden files from consideration using '''-n''' and '''-A''' respectively.
 
$ fdupes -n ~/Downloads
$ fdupes -A ~/Downloads
 
The first command will exclude zero-length files from consideration and the latter will exclude hidden files from consideration while searching for duplicates in the specified directory.
 
To summarize duplicate file information, use the '''-m''' option.
 
$ fdupes -m ~/Downloads
1 duplicate files (in 1 sets), occupying 403.6 kilobytes
 
To delete duplicates interactively, use the '''-d''' option.
 
$ fdupes -d ~/Downloads
 
Sample output:
 
[1] /home/sk/Downloads/Hyperledger Fabric Installation.pdf
[2] /home/sk/Downloads/Hyperledger Fabric Installation(1).pdf
 
Set 1 of 1, preserve files [1 - 2, all]:
 
This command will prompt you for files to preserve and delete all other duplicates.
* Just enter a number to preserve the corresponding file and delete the remaining files.
* Pay close attention while using this option.
* You might delete original files if you’re not careful.
 
If you want to preserve the first file in each set of duplicates and delete the others without prompting each time, use '''-dN''' option (not recommended).
 
$ fdupes -dN ~/Downloads
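Because '''-dN''' deletes without asking, it can help to preview its effect first. The sketch below reads the plain listing that fdupes prints (sets separated by blank lines) and shows every file except the first of each set. It assumes that the listing order matches the numbered order shown by '''-d'''; '''would_delete''' is a hypothetical name, and the filter itself deletes nothing.

```shell
would_delete() {
    # Paragraph mode: RS="" makes each blank-line-separated set one record,
    # and FS="\n" makes each file name one field.
    awk 'BEGIN { RS = ""; FS = "\n" }
         { for (i = 2; i <= NF; i++) print $i }'
}

# Example (requires fdupes):
#   fdupes -r ~/Downloads | would_delete
```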
 
To delete duplicates as they are encountered, use the '''-I''' flag.
 
$ fdupes -I ~/Downloads
 
For more details about Fdupes, view the help section and man pages.
 
$ fdupes --help
$ man fdupes
 
 
===== FSlint =====
'''FSlint''' is yet another duplicate file finder utility that I use from time to time to get rid of unnecessary duplicate files and free up disk space on my Linux system.
* Unlike the other two utilities, FSlint has both GUI and CLI modes.
* So, it is a more user-friendly tool for newbies.
* FSlint finds not only duplicates, but also bad symlinks, bad names, temp files, bad IDs, empty directories, non-stripped binaries, and more.
 
'''Installing FSlint'''
 
FSlint is available in [https://aur.archlinux.org/packages/fslint/ AUR], so you can install it using any AUR helper.
 
$ yay -S fslint
 
On Debian, Ubuntu, Linux Mint:
 
$ sudo apt-get install fslint
 
On Fedora:
 
$ sudo dnf install fslint
 
On RHEL, CentOS:
 
$ sudo yum install epel-release
$ sudo yum install fslint
 
Once it is installed, launch it from the menu or application launcher.

This is what the FSlint GUI looks like.
 
[[Image:Bild6.png|top]]
 
FSlint interface
 
As you can see, the interface of FSlint is user-friendly and self-explanatory.
* In the '''Search path''' tab, add the path of the directory you want to scan and click the '''Find''' button in the lower left corner to find the duplicates.
* Check the recurse option to recursively search for duplicates in directories and sub-directories.
* FSlint will quickly scan the given directory and list the duplicates.
 
[[Image:Bild8.png|top]]

FSlint GUI
 
From the list, choose the duplicates you want to clean up and select one of the given actions, such as Save, Delete, Merge, or Symlink.
 
In the '''Advanced search parameters''' tab, you can specify the paths to exclude while searching for duplicates.
 
[[Image:Bild7.png|top]]
 
fslint advanced search
 
'''FSlint command line options'''
 
FSlint provides the following collection of CLI utilities to find duplicates and other lint in your filesystem:
* '''findup''': find DUPlicate files
* '''findnl''': find Name Lint (problems with filenames)
* '''findu8''': find filenames with invalid utf8 encoding
* '''findbl''': find Bad Links (various problems with symlinks)
* '''findsn''': find Same Name (problems with clashing names)
* '''finded''': find Empty Directories
* '''findid''': find files with dead user IDs
* '''findns''': find Non Stripped executables
* '''findrs''': find Redundant Whitespace in files
* '''findtf''': find Temporary Files
* '''findul''': find possibly Unused Libraries
* '''zipdir''': Reclaim wasted space in ext2 directory entries
 
All of these utilities are available in the '''/usr/share/fslint/fslint/''' directory.
 
For example, to find duplicates in a given directory, do:
 
$ /usr/share/fslint/fslint/findup ~/Downloads/
 
Similarly, to find empty directories, the command would be:
 
$ /usr/share/fslint/fslint/finded ~/Downloads/
 
To get more details on each utility, for example '''findup''', run:
 
$ /usr/share/fslint/fslint/findup --help
 
For more details about FSlint, refer to the help section and man pages.
 
$ /usr/share/fslint/fslint/fslint --help
$ man fslint
 
; Resources
* [https://rdfind.pauldreik.se/ Rdfind Website]
* [https://github.com/pauldreik/rdfind Rdfind GitHub Repository]
* [https://github.com/adrianlopezroche/fdupes Fdupes GitHub Repository]
* [http://www.pixelbeat.org/fslint/ FSlint Website]
* [https://github.com/pixelb/fslint FSlint GitHub Repository]
 
== How to Find Duplicate Files in Linux and Remove Them ==
''Brief: FSlint is a great GUI tool to find duplicate files in Linux and remove them.
* FDUPES also finds duplicate files in Linux, but from the command line.''
 
If you have this habit of downloading everything from the web like me, you will end up having multiple duplicate files.
* Most often, I can find the same songs or a bunch of images in different directories or end up backing up some files at two different places.
* It’s a pain locating these duplicate files manually and deleting them to recover the disk space.
 
If you want to save yourself from this pain, there are various Linux applications that will help you in locating these duplicate files and removing them.
* In this article, we will cover how you can find and remove these files in Ubuntu.
 
''Note: You should know what you are doing.
* If you are using a new tool, it’s always better to try it on a test directory structure to figure out what it does before pointing it at your root or home folder.
* Also, '''it’s always better to [https://itsfoss.com/backup-restore-linux-timeshift/ back up your Linux system]!'''''
 
=== FSlint: GUI tool to find and remove duplicate files ===
FSlint helps you search and remove duplicate files, empty directories or files with incorrect names.
* It has a command-line as well as GUI mode with a set of tools to perform a variety of tasks.
 
To install FSlint, type the below command in Terminal.
 
sudo apt install fslint
 
Open FSlint from the Dash search.
 
FSlint includes a number of options to choose from.
* There are options to find duplicate files, installed packages, bad names, name clashes, temp files, empty directories etc.
* Choose the Search Path and the task which you want to perform from the left panel and click on Find to locate the files.
* Once done, you can select the files you want to remove and delete them.

You can click on any file or directory in the search results to open it if you are not sure and want to double-check it before deleting it.
 
You can select '''Advanced search parameters''' where you can define rules to exclude certain file types or exclude directories which you don’t want to search.
 
=== FDUPES: CLI tool to find and remove duplicate files ===
FDUPES is a command line utility to find and remove duplicate files in Linux.
* It can list out the duplicate files in a particular folder or recursively within a folder.
* Before deleting, it asks which file to preserve; the noprompt option deletes all duplicates except the first one in each set without asking.
 
==== Installation on Debian / Ubuntu ====
sudo apt install fdupes
 
==== Installation on Fedora ====
sudo dnf install fdupes
 
Once installed, you can search duplicate files using the below command:
 
fdupes /path/to/folder
 
To search recursively within a folder, use the -r option:
 
fdupes -r /home
 
This only lists the duplicate files; it does not delete them by itself.
* You can delete the duplicate files manually or use the ''-d'' option to delete them.
 
fdupes -d /path/to/folder
 
This won’t delete anything on its own; it displays each set of duplicate files and lets you choose which files to preserve, deleting the rest.
* If you want to delete all duplicates without being asked, preserving only the first file in each set, combine it with the noprompt ''-N'' option (''fdupes -dN'').
 
With the ''-d'' option, fdupes shows all the duplicate files within the folder and asks you to select the file you want to preserve.
 
==== Final Words ====
There are many other ways and tools to find and delete duplicate files in Linux.
* Personally, I prefer the FDUPES command line tool; it’s simple and uses very few resources.
 
 
=== Sources ===
* [https://itsfoss.com/find-duplicate-files-linux/ https://itsfoss.com/find-duplicate-files-linux/]
* [https://www.howtogeek.com/201140/how-to-find-and-remove-duplicate-files-on-linux/ https://www.howtogeek.com/201140/how-to-find-and-remove-duplicate-files-on-linux/]
* [https://www.ostechnix.com/how-to-find-and-delete-duplicate-files-in-linux/ https://www.ostechnix.com/how-to-find-and-delete-duplicate-files-in-linux/]

Current version as of 30 November 2024, 23:07
