How to remove common lines between two files without sorting?

Question

I have two unsorted files that have some lines in common.

file1.txt

Z
B
A
H
L

file2.txt

S
L
W
Q
A

This is the method I'm currently using to remove the common lines:

sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt

Output:

B
H
Z
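A minimal reproduction of the pipeline above on the sample data, assuming a POSIX shell with `mktemp` available. `comm` requires both inputs to be sorted, which is why the result comes back in sorted order instead of the order of file1.txt:

```shell
# Recreate the sample files from the question in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' Z B A H L > file1.txt
printf '%s\n' S L W Q A > file2.txt

# comm only works on sorted input, so both files are sorted first
sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
# -23 suppresses column 2 (lines only in file2) and column 3 (lines in both)
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt
cat file_final.txt    # B, H, Z: sorted, not in file1.txt's order
```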

The problem is that I want to keep the original order of file1.txt, i.e.:

Desired output:

Z
B
H

One solution I came up with is a loop that reads every line of file2.txt and runs the following for each one:

sed -i "/^${line_file2}\$/d" file1.txt

However, this could be slow if the files are large.
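A runnable sketch of the loop described above, assuming GNU sed (on BSD/macOS, `sed -i` needs an explicit suffix argument such as `-i ''`) and lines that contain no regex metacharacters or `/`. It also shows where the cost comes from: each `sed -i` call reads and rewrites all of file1.txt, once per line of file2.txt, so the work grows with the product of the two file sizes.

```shell
# Recreate the sample files from the question in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' Z B A H L > file1.txt
printf '%s\n' S L W Q A > file2.txt

# One in-place sed run per line of file2.txt (assumes GNU sed for -i)
while IFS= read -r line_file2; do
    # Double quotes let the shell expand $line_file2; \$ becomes sed's end-of-line anchor.
    # Assumes the line has no regex metacharacters and no "/".
    sed -i "/^${line_file2}\$/d" file1.txt
done < file2.txt
cat file1.txt    # Z, B, H
```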

What do you think of my idea?
Are there alternative ways to do it?

Kent · Accepted Answer

With grep or awk:

awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
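How the one-liner works, as a small self-contained demo (the file names match the answer; the contents are the sample data from the question). `NR==FNR` holds only while awk is reading its first file argument, so file2 fills the array and file1 is then filtered against it, line by line, in its original order:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' Z B A H L > file1
printf '%s\n' S L W Q A > file2

# While reading file2 (NR==FNR): store each line as a key of a, skip the rest.
# While reading file1: print the line only if it was never stored (!a[$0]).
awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1 > result.txt
cat result.txt    # Z, B, H in file1's order
```

One caveat: if file2 is empty, `NR==FNR` stays true while file1 is read as well, so every line goes into the array and nothing is printed.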

perreal · Answer

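You can do it with grep alone (-v to invert the match, -f to read the patterns from a file). This prints the lines of input1 that do not match any line of input2: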

grep -vf input2 input1

Output:

Z
B
H
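Note that `-f` treats each line of input2 as a regular expression that may match anywhere in a line, so a pattern such as `A` would also remove a line like `AB`. A sketch that adds the standard `-F` (fixed strings) and `-x` (whole-line match) options to compare lines exactly; `input3` is a hypothetical extra file used only to show the difference:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' Z B A H L > input1
printf '%s\n' S L W Q A > input2
printf '%s\n' Z B AB H > input3    # hypothetical extra file

# -F: patterns are fixed strings; -x: a pattern must match the whole line
grep -vxFf input2 input1 > result.txt    # Z, B, H

# The difference shows up when a line merely contains a pattern:
grep -vf input2 input3 > loose.txt       # Z, B, H ("AB" contains "A")
grep -vxFf input2 input3 > exact.txt     # Z, B, AB, H
```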

terdon · Answer

I wrote a small Perl script that I use for this kind of thing. It does more than you are asking for, but it can also do what you need:

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
# Parse the command-line options described in usage()
my %opts;
getopts('hvfcmdk:', \%opts);
my $missing = $opts{m} || undef;
my $column  = $opts{k} || undef;
my $common  = $opts{c} || undef;
my $verbose = $opts{v} || undef;
my $fast    = $opts{f} || undef;
my $dupes   = $opts{d} || undef;
# Default mode: print FILE1 patterns that are missing from FILE2
$missing = 1 unless $common || $dupes;
usage() unless $ARGV[1];
usage() if $opts{h};
my (%found, %k, %fields);
if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--;    ## So I don't need to count from 0
}
# Read FILE1: every line (or, with -f, its first word) becomes a key
open(my $F1, '<', $ARGV[0]) || die("Cannot open $ARGV[0]: $!\n");
while (<$F1>) {
    chomp;
    if ($fast) {
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++;
        $found{$aa[0]}++;
    }
    else {
        $k{$_}++;
        $found{$_}++;
    }
}
close($F1);
# With -v, count the lines of FILE2 first so progress can be reported
my $n    = 0;
my $size = 0;
if ($verbose) {
    open(my $F2, '<', $ARGV[1]) || die("Cannot open $ARGV[1]: $!\n");
    $size++ while <$F2>;
    close($F2);
}
# Scan FILE2 and bump the counter of every key that occurs in it
open(my $F2, '<', $ARGV[1]) || die("Cannot open $ARGV[1]: $!\n");
while (<$F2>) {
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10 == 0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800 == 0;
    if ($fast) {
        # -f: a single hash lookup on the first word of the line
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]} = \@aa if $column;
    }
    else {
        # Default: substring search for every key in every line
        foreach my $key (keys(%found)) {
            if (/\Q$key/) {
                $k{$key}++;
                $found{$key} = undef unless $dupes;
            }
        }
    }
}
close($F2);
print STDERR "[$n of $size lines]\n" if $verbose;
# A key whose counter is still 1 was never seen in FILE2
if ($column) {
    if ($missing) { for (keys(%k)) { my @aa = @{ $fields{$_} }; print "$aa[$column]\n" unless $k{$_} > 1 } }
    if ($common)  { for (keys(%k)) { my @aa = @{ $fields{$_} }; print "$aa[$column]\n" if $k{$_} > 1 } }
    if ($dupes)   { for (keys(%k)) { my @aa = @{ $fields{$_} }; print "$aa[$column]\n" if $k{$_} > 2 } }
}
else {
    if ($missing) { for (keys(%k)) { print "$_\n" unless $k{$_} > 1 } }
    if ($common)  { for (keys(%k)) { print "$_\n" if $k{$_} > 1 } }
    if ($dupes)   { for (keys(%k)) { print "$_\n" if $k{$_} > 2 } }
}
sub usage {
    print STDERR <<EndOfHelp;
USAGE: compare_lists.pl FILE1 FILE2
This script will compare FILE1 and FILE2, searching for the contents of FILE1
in FILE2 (and NOT vice versa). FILE1 must contain one search pattern per line;
the search pattern need only be contained within one of the lines of FILE2.
OPTIONS:
  -c : Print patterns COMMON to both files
  -f : Search only the first word of each line of FILE2 for the search
       pattern given in FILE1
  -d : Print duplicate entries
  -m : Print patterns MISSING in FILE2 (default)
  -h : Print this help and exit
EndOfHelp
    exit(0);
}

In your case, where you want the lines of file1.txt that are missing from file2.txt, you would run it like this:

compare_lists.pl -mf file1.txt file2.txt

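The -f option compares only the first word (as delimited by whitespace) of each line of file2, which makes processing much faster. To search the whole line instead, omit -f.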
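For comparison, the same first-word idea can be expressed with awk in a way that keeps the lines of file1.txt in their original order (the script above prints from a Perl hash, whose key order is not the input order). This is a different technique from the script, and the two-column sample data below is hypothetical:

```shell
cd "$(mktemp -d)" || exit 1
# Hypothetical two-column data: a key followed by other fields
printf '%s\n' 'Z x' 'B y' 'A z' 'H w' > file1.txt
printf '%s\n' 'A 1' 'L 2' > file2.txt

# Collect the first whitespace-separated field of each file2.txt line,
# then print the file1.txt lines whose first field was not collected.
awk 'NR==FNR{seen[$1]; next} !($1 in seen)' file2.txt file1.txt > result.txt
cat result.txt    # "Z x", "B y", "H w"
```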