Rdfind

From Foxwiki
Current revision as of 30 November 2024, 23:12

rdfind - redundant data find (finds duplicate files)

Description

Free, open-source utility for finding duplicate files across and/or within directories and subdirectories.

  • It compares files by their content, not by their file names.
  • Rdfind uses a ranking algorithm to classify files as originals or duplicates.
  • If two or more files are identical, Rdfind picks one of them as the original according to its ranking rules and treats the rest as duplicates.
  • Once it has found the duplicates, it reports them to you.
  • You can then decide whether to delete them or replace them with hard links or symbolic (soft) links.

DESCRIPTION

rdfind finds duplicate files across and/or within several directories. It computes checksums only when necessary. rdfind runs in O(N log N) time, where N is the number of files.

If two (or more) equal files are found, the program decides which of them is the original; the rest are considered duplicates. This is done by ranking the files against each other and picking the one with the highest rank. See section RANKING for details.

By default, no action is taken besides creating a results file that lists the detected files and showing how much space could be saved.

If you need finer control over the ranking, you can use a preprocessor that sorts the file names in the desired order and then run the program using xargs. See the examples below for how to use find and xargs in conjunction with rdfind.

To include files or directories whose names start with -, use rdfind ./- so that they are not confused with options.

RANKING

Given two or more equal files, the one with the highest rank is selected to be the original and the rest are duplicates. The rules of ranking are given below, where the rules are executed from start until an original has been found. Given two files A and B which have equal size and content, the ranking is as follows:

  • If A was found while scanning an input argument earlier than B, A is higher ranked.
  • If A was found at a directory depth lower than B, A is higher ranked (A is closer to the root).
  • If A and B are found during scanning of the same input argument and share the same directory depth, the ranking depends on whether deterministic operation is enabled (it is on by default; see option -deterministic). If enabled, which one ranks highest is unspecified but deterministic. If disabled, the one that was reported first by the file system ranks highest.
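
The first two rules can be approximated with a stable sort. This is only an illustrative sketch, not rdfind's implementation: the helper name pick_original and the three-column input format (argument index, directory depth, path, separated by single spaces) are assumptions made for this example.

```shell
# Hypothetical helper: reads lines of "ARG_INDEX DEPTH PATH" describing files
# that are already known to be identical, and prints the path that would rank
# highest under the first two rules.
#   Rule 1: the lower argument index wins.
#   Rule 2: the lower directory depth wins.
# Ties keep their input order (sort -s), which mimics -deterministic false.
pick_original() {
    sort -s -k1,1n -k2,2n | head -n 1 | cut -d' ' -f3-
}
```

For example, feeding it the lines "1 2 /b/x/y/f" and "0 1 /a/x/f" selects /a/x/f, because it comes from the earlier input argument.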

SECURITY CONSIDERATIONS

Avoid manipulating the directories while rdfind is reading them; rdfind is quite brittle in that case. In particular, when deleting files or creating links, rdfind can be subject to a symlink attack. Use with care!

Usage

Once installed, run the rdfind command with the path of the directory to scan for duplicate files.

rdfind ~/Downloads

The rdfind command scans the ~/Downloads directory and saves the results in a file named "results.txt" in the current working directory.

  • You can view the names of the possible duplicate files in the "results.txt" file.

cat results.txt
# Automatically generated
# duptype id depth size device inode priority name
DUPTYPE_FIRST_OCCURRENCE 1469 8 9 2050 15864884 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test5.regex
DUPTYPE_WITHIN_SAME_TREE -1469 8 9 2050 15864886 1 /home/sk/Downloads/tor-browser_en-US/Browser/TorBrowser/Tor/PluggableTransports/fte/tests/dfas/test6.regex
[...]
DUPTYPE_FIRST_OCCURRENCE 13 0 403635 2050 15740257 1 /home/sk/Downloads/Hyperledger(1).pdf
DUPTYPE_WITHIN_SAME_TREE -13 0 403635 2050 15741071 1 /home/sk/Downloads/Hyperledger.pdf
# end of file
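
Because each data row starts with the duplicate type and ends with the file name, the duplicates can be pulled out of results.txt with a short awk filter. A minimal sketch, assuming the column layout shown in the header line of the sample above; the function name list_duplicates is made up for illustration.

```shell
# Hypothetical helper: print the path of every row that is not an original.
# Assumes the columns "duptype id depth size device inode priority name".
list_duplicates() {
    awk '
        /^#/ || NF < 8 { next }                    # skip comments and short lines such as "[...]"
        $1 == "DUPTYPE_FIRST_OCCURRENCE" { next }  # originals are not duplicates
        {
            # The name is everything after the 7th field, so paths
            # that contain spaces are printed in full.
            name = $0
            for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "", name)
            print name
        }
    ' "${1:-results.txt}"
}
```

Called without an argument, it reads results.txt from the current directory; pass another path if the results file was renamed with -outputname.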

Reviewing the results.txt file makes the duplicates easy to spot.

  • You can remove the duplicates manually if you like.

You can also use the "-dryrun" option to find all duplicates in a given directory without changing anything and print the summary to your terminal:

rdfind -dryrun true ~/Downloads

Once you have found the duplicates, you can replace them with either hard links or symlinks.

To replace all duplicates with hard links, run:

rdfind -makehardlinks true ~/Downloads
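
After hard-linking, a former duplicate and its original refer to the same inode on the same device. One way to confirm that for a pair of paths is sketched below; it assumes GNU stat (the -c option), and the helper name same_inode is illustrative only.

```shell
# Hypothetical helper: succeeds if both paths refer to the same device and
# inode, i.e. they are hard links to one file.
same_inode() {
    [ "$(stat -c '%d:%i' -- "$1")" = "$(stat -c '%d:%i' -- "$2")" ]
}

# Possible use:
#   same_inode ~/Downloads/a.pdf ~/Downloads/b.pdf && echo "already hard-linked"
```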

To replace all duplicates with symlinks (soft links), run:

rdfind -makesymlinks true ~/Downloads

A directory may contain some empty files that you want to ignore.

  • If so, use the "-ignoreempty" option as shown below.

rdfind -ignoreempty true ~/Downloads

If you no longer need the old files, you can delete the duplicates instead of replacing them with hard or soft links.

To delete all duplicates, run:

rdfind -deleteduplicates true ~/Downloads

If you do not want empty files to be ignored, so that duplicate empty files are deleted along with all other duplicates, run:

rdfind -deleteduplicates true -ignoreempty false ~/Downloads

For more details, see the built-in help:

rdfind --help

And the manual page:

man rdfind

EXAMPLES

Search for duplicate files in the home directory and a backup directory:

rdfind ~ /mnt/backup

Delete duplicates in a backup directory:

rdfind -deleteduplicates true /mnt/backup

Search for duplicate files in directories called foo:

find . -type d -name foo -print0 |xargs -0 rdfind
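
The same find and xargs pattern can be used to influence the ranking, because a file found under an earlier input argument ranks higher. The sketch below lists files oldest-first, so that the oldest copy would be treated as the original. The helper name oldest_first is hypothetical, and the pipeline relies on GNU extensions (find -printf, sort -z, cut -z).

```shell
# Hypothetical helper: print all regular files under the given directories,
# NUL-separated and ordered by modification time (oldest first).
oldest_first() {
    find "$@" -type f -printf '%T@ %p\0' |  # "<mtime in seconds> <path>" per record
        sort -z -n |                        # numeric sort on the leading timestamp
        cut -z -d' ' -f2-                   # drop the timestamp, keep the path
}

# Possible use (dry run, nothing is changed):
#   oldest_first ~/Downloads /mnt/backup | xargs -0 rdfind -dryrun true
```

Note that xargs may split a very long file list into several rdfind invocations, in which case duplicates would only be detected within each batch; this approach is probably best kept to moderately sized trees.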


BUGS/FEATURES

When specifying the same directory twice, it keeps the first encountered as the most important (original), and the rest as duplicates. This might not be what you want.

The -makesymlinks option creates absolute links. This might not be what you want. To create relative links instead, you may use the symlinks command, which is able to convert absolute links to relative links.
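
As an alternative to a dedicated conversion tool, GNU ln can rewrite a single absolute link as a relative one. The following is a sketch under that assumption: the -r option is a GNU coreutils extension, and make_relative is a name chosen for this example.

```shell
# Hypothetical helper: turn one absolute symlink into a relative one.
make_relative() {
    link=$1
    target=$(readlink -- "$link") || return 1  # fails if $link is not a symlink
    case $target in
        /*) ln -sfnr -- "$target" "$link" ;;   # absolute target: recreate the link as relative
        *)  ;;                                 # already relative: leave it untouched
    esac
}

# Possible use:
#   find ~/Downloads -type l | while IFS= read -r l; do make_relative "$l"; done
```

The -n option keeps ln from descending into the link when it points at a directory, and -f replaces the existing link in place.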

Older versions unfortunately contained a misspelling of the word "occurrence". This has been corrected (since 1.3), which might affect user scripts that parse the output file written by rdfind.

Installation

sudo apt-get install rdfind

Invocation

rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

Options

Searching options

-ignoreempty true|false Ignore empty files. Setting this to true (the default) is equivalent to -minsize 1, false is equivalent to -minsize 0.

-minsize N Ignores files with fewer than N bytes. Default is 1, meaning empty files are ignored.

-maxsize N Ignores files with N bytes or more. Default is 0, which means this check is disabled.

-followsymlinks true|false Follow symlinks. Default is false.

-removeidentinode true|false Removes items found which have identical inode and device ID. Default is true.

-checksum md5|sha1|sha256|sha512 What type of checksum to be used: md5, sha1, sha256 or sha512. The default is sha1 since version 1.4.0.

-deterministic true|false If set (the default), sort files of equal rank in an unspecified but deterministic order. This makes the behaviour independent of the order in which the file system lists files.

Action options

-makesymlinks true|false Replace duplicate files with symbolic links. Default is false.

-makehardlinks true|false Replace duplicate files with hard links. Default is false.

-makeresultsfile true|false Make a results file in the current directory. Default is true. If the file exists, it is overwritten. This does not affect whether items are deleted. See -dryrun for how to disable deletions.

-outputname name Name the results file "name" instead of the default results.txt.

-deleteduplicates true|false Delete (unlink) files. Default is false.

General options

-sleep Xms Sleeps X milliseconds between reading each file, to reduce load. Default is 0 (no sleep). Note that only a few values are supported at present: 0,1-5,10,25,50,100 milliseconds.

-n, -dryrun true|false Displays what would have been done without actually deleting or linking anything. Default is false.

-h, -help, --help Displays a brief help message.

-v, -version, --version Displays the version number.

Parameters

Environment variables

Exit status

0 on success, nonzero otherwise.

Configuration

Files

results.txt (the default name; it can be changed with the -outputname option, see above). The results file contains one row per duplicate file found, along with a header row explaining the columns.

The first column is a text describing why the file is considered a duplicate:

DUPTYPE_UNKNOWN some internal error
DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.
DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing the directory in the same input argument as the original)
DUPTYPE_OUTSIDE_TREE the file is found during processing another input argument than the original.
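
The duplicate type makes it easy to summarize a results file. The sketch below adds up the sizes of all non-original rows per type, which gives a rough idea of how much space the duplicates occupy. It assumes the column order "duptype id depth size device inode priority name" (size in the fourth column); duplicate_bytes is a hypothetical name.

```shell
# Hypothetical helper: print "DUPTYPE total_bytes" for every duplicate type
# found in the results file (originals are excluded).
duplicate_bytes() {
    awk '
        /^#/ || NF < 8 { next }                               # skip comments and short lines
        $1 != "DUPTYPE_FIRST_OCCURRENCE" { bytes[$1] += $4 }  # sum the size column per type
        # %.0f avoids integer overflow in awk builds with a 32-bit %d
        END { for (t in bytes) printf "%s %.0f\n", t, bytes[t] }
    ' "${1:-results.txt}" | sort
}
```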


Appendix

See also

  • md5sum
  • sha1sum
  • find
  • symlinks

Documentation

Man page
Info pages

Links

Project
Web links