フォルダー階層内で重複するファイル名を検索しますか？

Question

imgというフォルダーがあり、このフォルダーには多くのレベルのサブフォルダーがあり、そのすべてに画像が含まれています。それらを画像サーバーにインポートします。

通常、画像（または任意のファイル）は、異なるディレクトリパスにあるか、拡張子が異なる限り、同じ名前を持つことができます。ただし、インポートするイメージサーバーでは、すべてのイメージ名が一意である必要があります（拡張子が異なっていても）。

たとえば、画像background.pngとbackground.gifは、拡張子が異なっていても同じファイル名を持っているため許可されません。それらが別々のサブフォルダにある場合でも、それらは一意である必要があります。

imgフォルダーで再帰検索を実行して、同じ名前（拡張子を除く）を持つファイルのリストを見つけることができるかどうか疑問に思っています。

これを実行できるコマンドはありますか？

&#230;ndr&#252;k · Accepted Answer

FSlint は、重複する名前を見つけるための機能を含む多目的な重複Finderです。

FSlint

UbuntuのFSlintパッケージはグラフィカルインターフェイスを強調していますが、 FSlint FAQ で説明されているように、コマンドラインインターフェイスは/usr/share/fslint/fslint/のプログラムを介して利用できます。ドキュメントには--helpオプションを使用します。例：

$ /usr/share/fslint/fslint/fslint --help File system lint. A collection of utilities to find lint on a filesystem. To get more info on each utility run 'util --help'. findup -- find DUPlicate files findnl -- find Name Lint (problems with filenames) findu8 -- find filenames with invalid utf8 encoding findbl -- find Bad Links (various problems with symlinks) findsn -- find Same Name (problems with clashing names) finded -- find Empty Directories findid -- find files with dead user IDs findns -- find Non Stripped executables findrs -- find Redundant Whitespace in files findtf -- find Temporary Files findul -- find possibly Unused Libraries zipdir -- Reclaim wasted space in ext2 directory entries $ /usr/share/fslint/fslint/findsn --help find (files) with duplicate or conflicting names. Usage: findsn [-A -c -C] [[-r] [-f] paths(s) ...] If no arguments are supplied the $PATH is searched for any redundant or conflicting files. -A reports all aliases (soft and hard links) to files. If no path(s) specified then the $PATH is searched. If only path(s) specified then they are checked for duplicate named files. You can qualify this with -C to ignore case in this search. Qualifying with -c is more restictive as only files (or directories) in the same directory whose names differ only in case are reported. I.E. -c will flag files & directories that will conflict if transfered to a case insensitive file system. Note if -c or -C specified and no path(s) specifed the current directory is assumed.

使用例：

$ /usr/share/fslint/fslint/findsn /usr/share/icons/ > icons-with-duplicate-names.txt $ head icons-with-duplicate-names.txt -rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity-Dark/AUTHORS -rw-r--r-- 1 root root 683 2011-04-15 10:31 Humanity/AUTHORS -rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity-Dark/COPYING -rw-r--r-- 1 root root 17992 2011-04-15 10:31 Humanity/COPYING -rw-r--r-- 1 root root 4776 2011-03-29 08:57 Faenza/apps/16/DC++.xpm -rw-r--r-- 1 root root 3816 2011-03-29 08:57 Faenza/apps/22/DC++.xpm -rw-r--r-- 1 root root 4008 2011-03-29 08:57 Faenza/apps/24/DC++.xpm -rw-r--r-- 1 root root 4456 2011-03-29 08:57 Faenza/apps/32/DC++.xpm -rw-r--r-- 1 root root 7336 2011-03-29 08:57 Faenza/apps/48/DC++.xpm -rw-r--r-- 1 root root 918 2011-03-29 09:03 Faenza/apps/16/Thunar.png

ojblass · Answer

find . -mindepth 1 -printf '%h %f
' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | tr ' ' '/'

コメントが述べているように、これはフォルダも見つけます。ファイルに制限するコマンドは次のとおりです。

find . -mindepth 1 -type f -printf '%p %f
' | sort -t ' ' -k 2,2 | uniq -f 1 --all-repeated=separate | cut -d' ' -f1

loevborg · Answer

これをduplicates.pyという名前のファイルに保存します

#!/usr/bin/env python # Syntax: duplicates.py DIRECTORY import os, sys top = sys.argv[1] d = {} for root, dirs, files in os.walk(top, topdown=False): for name in files: fn = os.path.join(root, name) basename, extension = os.path.splitext(name) basename = basename.lower() # ignore case if basename in d: print(d[basename]) print(fn) else: d[basename] = fn

次に、ファイルを実行可能にします。

chmod +x duplicates.py

で実行するこのような：

./duplicates.py ~/images

同じbasename（1）を持つファイルのペアを出力する必要があります。 Pythonで書かれているので、修正できるはずです。

geirha · Answer

これらの「重複」を確認し、それらを手動で処理するだけでよいと想定しています。もしそうなら、このbash4コードはあなたが望むことをするはずです。

declare -A array=() dupes=() while IFS= read -r -d '' file; do base=${file##*/} base=${base%.*} if [[ ${array[$base]} ]]; then dupes[$base]+=" $file" else array[$base]=$file fi done < <(find /the/dir -type f -print0) for key in "${!dupes[@]}"; do echo "$key: ${array[$key]}${dupes[$key]}" done

http://mywiki.wooledge.org/BashGuide/Arrays#Associative_Arrays および/または連想配列構文のヘルプについてはbashマニュアルを参照してください。

user unknown · Answer

これはbnameです。

#!/bin/bash # # find for jpg/png/gif more files of same basename # # echo "processing ($1) $2" bname=$(basename "$1" .$2) find -name "$bname.jpg" -or -name "$bname.png"

実行可能にする：

chmod a+x bname

呼び出す：

for ext in jpg png jpeg gif tiff; do find -name "*.$ext" -exec ./bname "{}" $ext ";" ; done

プロ：

それは簡単でシンプルなため、拡張可能です。
ファイル名の空白、タブ、改行、ページフィードを処理します。（拡張名にそのようなものがないと仮定）。

短所：

ファイル自体は常に検出され、a.jpgのa.gifが検出されると、a.gifのa.jpgも検出されます。したがって、同じベース名の10個のファイルの場合、最後に100個の一致が見つかります。

skoczen · Answer

私のニーズに合わせたloevborgのスクリプトの改善（グループ化された出力、ブラックリスト、スキャン中のクリーナー出力を含む）。 10TBドライブをスキャンしていたので、少しきれいな出力が必要でした。

使用法：

python duplicates.py DIRNAME

duplicates.py

 #!/usr/bin/env python # Syntax: duplicates.py DIRECTORY import os import sys top = sys.argv[1] d = {} file_count = 0 BLACKLIST = [".DS_Store", ] for root, dirs, files in os.walk(top, topdown=False): for name in files: file_count += 1 fn = os.path.join(root, name) basename, extension = os.path.splitext(name) # Enable this if you want to ignore case. # basename = basename.lower() if basename not in BLACKLIST: sys.stdout.write( "Scanning... %s files scanned. Currently looking at ...%s/
" % (file_count, root[-50:]) ) if basename in d: d[basename].append(fn) else: d[basename] = [fn, ] print("
Done scanning. Here are the duplicates found: ") for k, v in d.items(): if len(v) > 1: print("%s (%s):" % (k, len(v))) for f in v: print (f)