Ruby: How to find and return a duplicate value in an array?

Question

arr is an array of strings, for example: ["hello", "world", "stack", "overflow", "hello", "again"].

What is a simple, elegant way to check whether arr contains duplicates?

Examples:

    ["A", "B", "C", "B", "A"] # => "A" or "B"
    ["A", "B", "C"]           # => nil

Naveed · Accepted Answer

    a = ["A", "B", "C", "B", "A"]
    a.detect { |e| a.count(e) > 1 }

Update

I know this isn't a very elegant answer, but I love it. It's a beautiful one-liner, and it works perfectly well unless you need to process a huge data set.

Looking for a faster solution? Here you go!

    def find_one_using_hash_map(array)
      map = {}
      dup = nil
      array.each do |v|
        map[v] = (map[v] || 0) + 1

        if map[v] > 1
          dup = v
          break
        end
      end

      return dup
    end

It's linear, O(n), but now you have to manage multiple lines of code, write test cases, and so on.
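One detail worth seeing side by side: the two approaches can return different duplicates for the same input, which the question explicitly allows. The sketch below restates both as small methods (the name find_one_using_count and the early-return form of the hash version are mine, not the answer's) and runs them on the question's example.

```ruby
# One-liner from the answer: scans the whole array once per element, O(n^2).
def find_one_using_count(array)
  array.detect { |e| array.count(e) > 1 }
end

# Hash-based version: a single pass that stops at the first repeat, O(n).
def find_one_using_hash_map(array)
  seen = Hash.new(0)
  array.each do |v|
    seen[v] += 1
    return v if seen[v] > 1
  end
  nil
end

a = ["A", "B", "C", "B", "A"]
p find_one_using_count(a)                   # => "A"
p find_one_using_hash_map(a)                # => "B"
p find_one_using_hash_map(["A", "B", "C"])  # => nil
```

The count version reports the first element that has a repeat anywhere in the array, while the hash version reports the element whose second occurrence is reached first.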

If you need an even faster solution, maybe try C instead :)

And here is the gist comparing the different solutions: https://Gist.github.com/naveed-ahmad/8f0b926ffccf5fbd206a1cc58ce9743e

Ryan LeCompte · Answer

You can do this in a few ways, with the first option being the fastest:

    ary = ["A", "B", "C", "B", "A"]

    ary.group_by { |e| e }.select { |k, v| v.size > 1 }.map(&:first)

    ary.sort.chunk { |e| e }.select { |e, chunk| chunk.size > 1 }.map(&:first)

And an O(N^2) option (i.e. less efficient):

    ary.select { |e| ary.count(e) > 1 }.uniq
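To make the first two options easier to follow, here is a small sketch that exposes the intermediate values. The variable names are illustrative only, and `itself` assumes Ruby 2.2 or later.

```ruby
ary = ["A", "B", "C", "B", "A"]

# group_by buckets equal elements under one key:
# "A" maps to ["A", "A"], "B" to ["B", "B"], "C" to ["C"].
groups = ary.group_by(&:itself)

# sort + chunk yields [element, run] pairs for consecutive equal elements.
runs = ary.sort.chunk { |e| e }.to_a
p runs  # => [["A", ["A", "A"]], ["B", ["B", "B"]], ["C", ["C"]]]

# Either way, keep the elements whose bucket has more than one member.
dups_from_groups = groups.select { |_, v| v.size > 1 }.keys
dups_from_runs   = runs.select { |_, run| run.size > 1 }.map(&:first)
p dups_from_groups  # => ["A", "B"]
p dups_from_runs    # => ["A", "B"]
```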

Chris Heald · Answer

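Simply find the first instance where the index of the object (counting from the left) does not equal the index of the object (counting from the right).

    arr.detect { |e| arr.rindex(e) != arr.index(e) }

If there are no duplicates, the return value will be nil.

I believe this is also the fastest solution posted in the thread so far, since it doesn't rely on creating additional objects, and #index and #rindex are implemented in C. The big-O runtime is N^2 and thus slower than Sergio's, but the wall time could be much faster because the "slow" parts run in C.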
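A short illustration of why comparing the two indexes works, using the example array from the question:

```ruby
arr = ["A", "B", "C", "B", "A"]

# For a repeated element, the first and last positions differ.
p arr.index("A")   # => 0
p arr.rindex("A")  # => 4

# For a unique element, they coincide.
p arr.index("C")   # => 2
p arr.rindex("C")  # => 2

first_dup = arr.detect { |e| arr.rindex(e) != arr.index(e) }
p first_dup  # => "A"

none = ["A", "B", "C"].detect { |e| ["A", "B", "C"].rindex(e) != ["A", "B", "C"].index(e) }
p none  # => nil
```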

JjP · Answer

detect finds only one duplicate. find_all will find them all:

    a = ["A", "B", "C", "B", "A"]
    a.find_all { |e| a.count(e) > 1 }

Cary Swoveland · Answer

Here are two more ways of finding a duplicate.

Use a set

    require 'set'

    def find_a_dup_using_set(arr)
      s = Set.new
      arr.find { |e| !s.add?(e) }
    end

    find_a_dup_using_set arr
      #=> "hello"

Use select in place of find to return an array of all duplicates.
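A minimal sketch of that select variant follows. The method name find_all_dups_using_set and the trailing uniq are my additions: without uniq, a value that occurs three times would be listed twice. Note that this variant returns an empty array, not nil, when there are no duplicates.

```ruby
require 'set'

# Hypothetical helper name; returns every value that occurs more than once.
def find_all_dups_using_set(arr)
  seen = Set.new
  # Set#add? returns nil when the element is already present, so the
  # block is true for the second and later occurrences of a value.
  arr.select { |e| !seen.add?(e) }.uniq
end

p find_all_dups_using_set(["A", "B", "C", "B", "A", "A"])  # => ["B", "A"]
p find_all_dups_using_set(["A", "B", "C"])                 # => []
```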

Use Array#difference

    class Array
      def difference(other)
        h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
        reject { |e| h[e] > 0 && h[e] -= 1 }
      end
    end

    def find_a_dup_using_difference(arr)
      arr.difference(arr.uniq).first
    end

    find_a_dup_using_difference arr
      #=> "hello"

Drop .first to return an array of all duplicates.

Both methods return nil if there are no duplicates.
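One caveat, as far as I know: since Ruby 2.6 the core Array class has its own Array#difference, which behaves like the - operator (it removes every occurrence), so reopening Array as above would override a built-in method. The sketch below therefore uses a standalone helper with a hypothetical name, multiset_difference, that implements the same "remove one occurrence per match" idea.

```ruby
# Hypothetical standalone helper (not part of Ruby): removes one occurrence
# from `arr` for each occurrence of the same value in `other`.
def multiset_difference(arr, other)
  remaining = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  arr.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true
    else
      false
    end
  end
end

arr = ["A", "B", "C", "B", "A"]

# The built-in - operator removes all occurrences, so nothing is left.
p arr - arr.uniq                            # => []

# The multiset version leaves exactly the surplus occurrences.
p multiset_difference(arr, arr.uniq)        # => ["B", "A"]
p multiset_difference(arr, arr.uniq).first  # => "B"
```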

I have proposed that Array#difference be added to the Ruby core. More information is in my answer here.

Benchmark

Let's compare the suggested methods. First, we need an array for testing:

    CAPS = ('AAA'..'ZZZ').to_a.first(10_000)

    def test_array(nelements, ndups)
      arr = CAPS[0, nelements-ndups]
      arr = arr.concat(arr[0,ndups]).shuffle
    end

and a method to run the benchmarks for different test arrays:

    require 'fruity'

    def benchmark(nelements, ndups)
      arr = test_array nelements, ndups
      puts "\n#{ndups} duplicates\n"
      compare(
        Naveed:    -> {arr.detect{|e| arr.count(e) > 1}},
        Sergio:    -> {(arr.inject(Hash.new(0)) {|h,e| h[e] += 1; h}.find {|k,v| v > 1} ||
                       [nil]).first },
        Ryan:      -> {(arr.group_by{|e| e}.find {|k,v| v.size > 1} ||
                       [nil]).first},
        Chris:     -> {arr.detect {|e| arr.rindex(e) != arr.index(e)} },
        Cary_set:  -> {find_a_dup_using_set(arr)},
        Cary_diff: -> {find_a_dup_using_difference(arr)}
      )
    end

I did not include @JjP's answer because only one duplicate is to be returned, and when that answer is modified to do so it is the same as @Naveed's earlier answer. Nor did I include @Marin's answer, which was posted before @Naveed's answer.

I also modified the other answers that return all duplicates so that they return just the first one found. That should have essentially no effect on performance, since they compute all duplicates before selecting one.

The results for each benchmark are listed from fastest to slowest.

First, suppose the array contains 100 elements:

    benchmark(100, 0)
      0 duplicates
    Running each test 64 times. Test will take about 2 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is similar to Ryan
    Ryan is similar to Sergio
    Sergio is faster than Chris by 4x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(100, 1)
      1 duplicates
    Running each test 128 times. Test will take about 2 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Ryan by 2x ± 1.0
    Ryan is similar to Sergio
    Sergio is faster than Chris by 2x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(100, 10)
      10 duplicates
    Running each test 1024 times. Test will take about 3 seconds.
    Chris is faster than Naveed by 2x ± 1.0
    Naveed is faster than Cary_diff by 2x ± 1.0 (results differ: AAC vs AAF)
    Cary_diff is similar to Cary_set
    Cary_set is faster than Sergio by 3x ± 1.0 (results differ: AAF vs AAC)
    Sergio is similar to Ryan

Now consider an array with 10,000 elements:

    benchmark(10000, 0)
      0 duplicates
    Running each test once. Test will take about 4 minutes.
    Ryan is similar to Sergio
    Sergio is similar to Cary_set
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Chris by 400x ± 100.0
    Chris is faster than Naveed by 3x ± 0.1

    benchmark(10000, 1)
      1 duplicates
    Running each test once. Test will take about 1 second.
    Cary_set is similar to Cary_diff
    Cary_diff is similar to Sergio
    Sergio is similar to Ryan
    Ryan is faster than Chris by 2x ± 1.0
    Chris is faster than Naveed by 2x ± 1.0

    benchmark(10000, 10)
      10 duplicates
    Running each test once. Test will take about 11 seconds.
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Sergio by 3x ± 1.0 (results differ: AAE vs AAA)
    Sergio is similar to Ryan
    Ryan is faster than Chris by 20x ± 10.0
    Chris is faster than Naveed by 3x ± 1.0

    benchmark(10000, 100)
      100 duplicates
    Cary_set is similar to Cary_diff
    Cary_diff is faster than Sergio by 11x ± 10.0 (results differ: ADG vs ACL)
    Sergio is similar to Ryan
    Ryan is similar to Chris
    Chris is faster than Naveed by 3x ± 1.0

Note that find_a_dup_using_difference(arr) would be much more efficient if Array#difference were implemented in C, which would be the case if it were added to the Ruby core.
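For readers who want a rough check without installing the fruity gem, here is a minimal timing sketch that uses only the standard library. It is not the benchmark used above: the time_it helper, the array size, and the choice of three candidates are my own assumptions, and a single run per candidate gives only a coarse impression.

```ruby
require 'set'

# Hypothetical helper: returns [block result, elapsed seconds],
# measured with a monotonic clock.
def time_it
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Worst case for every method: no duplicates, so nothing can stop early.
arr = ('AAA'..'ZZZ').first(2_000).shuffle

candidates = {
  count:  -> { arr.detect { |e| arr.count(e) > 1 } },
  rindex: -> { arr.detect { |e| arr.rindex(e) != arr.index(e) } },
  set:    -> { s = Set.new; arr.find { |e| !s.add?(e) } }
}

# Run each candidate once and record its result and timing.
results = candidates.to_h { |name, fn| [name, time_it(&fn)] }

results.each do |name, (result, seconds)|
  puts format('%-7s %.5fs -> %p', name, seconds, result)
end
```

Timings vary by machine and Ruby version, so no expected output is shown; each candidate should report nil for this input.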

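Conclusion

Many of the answers are reasonable, but using a Set is the clear best choice. It is fastest in the medium-hard cases and joint fastest in the hardest ones; it can be beaten only in computationally trivial cases, where your choice won't matter anyway.

The one very special case in which you might pick Chris's solution is if you want to use the method to separately de-duplicate thousands of small arrays and expect to find a duplicate typically fewer than 10 items in. That will be a bit faster, because it avoids the small additional overhead of creating the Set.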

akuhn · Answer

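Alas, most of the answers are O(n^2).

Here is an O(n) solution,

    a = %w{the quick brown fox jumps over the lazy dog}
    h = Hash.new(0)
    a.find { |each| (h[each] += 1) == 2 } # => "the"

What is the complexity of this?

Runs in O(n) and breaks on the first match
Uses O(n) memory, but only the minimal amount

Now, depending on the frequency of duplicates in your array, these runtimes might actually become even better. For example, if the array of size O(n) has been sampled from a population of only k << n different elements, the complexity for both runtime and space becomes O(k). However, it is more likely that the original poster is validating input and wants to make sure there are no duplicates. In that case both runtime and memory complexity are O(n), since we expect the elements to have no repetitions for the majority of inputs.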
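To make the "breaks on the first match" point concrete, this sketch wraps the same idea in a method that also counts how many elements were examined. The method name and the step counter are illustrative additions.

```ruby
# Returns [first_duplicate_or_nil, number_of_elements_examined].
def first_dup_with_steps(a)
  counts = Hash.new(0)
  steps = 0
  dup = a.find do |e|
    steps += 1
    (counts[e] += 1) == 2
  end
  [dup, steps]
end

# Stops at the second "the", the 7th of 9 words.
p first_dup_with_steps(%w[the quick brown fox jumps over the lazy dog])  # => ["the", 7]

# An early repeat ends the scan almost immediately.
p first_dup_with_steps(%w[a a b c d e f])  # => ["a", 2]

# With no duplicates, every element is examined once.
p first_dup_with_steps(%w[a b c d e f g])  # => [nil, 7]
```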

Martin Velez · Answer

Ruby Array objects have a great method, select.

    select {|item| block } → new_ary
    select → an_enumerator

The first form is what interests you here. It allows you to select objects which pass a test.

Ruby Array objects have another method, count.

    count → int
    count(obj) → int
    count { |item| block } → int

In this case, you are interested in duplicates (objects which appear more than once in the array). The appropriate test is a.count(obj) > 1.

If a = ["A", "B", "C", "B", "A"], then

    a.select { |item| a.count(item) > 1 }.uniq
    => ["A", "B"]

You state that you only want one object. So pick one.

Rokibul Hasan · Answer

find_all() returns an array containing all elements of enum for which block is not false.

To get the duplicate elements

    >> arr = ["A", "B", "C", "B", "A"]
    >> arr.find_all { |x| arr.count(x) > 1 }
    => ["A", "B", "B", "A"]

Or the unique duplicate elements

    >> arr.find_all { |x| arr.count(x) > 1 }.uniq
    => ["A", "B"]

Sergio Tulentsev · Answer

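Something like this will work

    arr = ["A", "B", "C", "B", "A"]

    arr.inject(Hash.new(0)) { |h,e| h[e] += 1; h }.
        select { |k,v| v > 1 }.
        collect { |x| x.first }

That is, put all values into a hash where the key is an element of the array and the value is the number of occurrences. Then select all elements that occur more than once. Easy.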
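On newer Rubies the counting step can be written more directly. This is a sketch assuming Ruby 2.7 or later, where Enumerable#tally is available; it is an alternative spelling of the same idea, not part of the original answer.

```ruby
arr = ["A", "B", "C", "B", "A"]

# Enumerable#tally (Ruby 2.7+) builds the element-to-count hash directly.
counts = arr.tally

# Keep the elements that occur more than once.
dups = counts.select { |_, n| n > 1 }.keys
p dups  # => ["A", "B"]
```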

danielricecodes · Answer

I know this thread is about Ruby specifically, but I landed here looking for how to do this within the context of Ruby on Rails with ActiveRecord, and thought I would share my solution too.

    class ActiveRecordClass < ActiveRecord::Base
      # has two columns, a primary key (id) and an email_address (string)
    end

    ActiveRecordClass.group(:email_address).having("count(*) > 1").count.keys

The above returns an array of all email addresses that are duplicated in this example's database table (which in Rails would be "active_record_classes").

benzhang · Answer

    a = ["A", "B", "C", "B", "A"]
    a.each_with_object(Hash.new(0)) { |i, hash| hash[i] += 1 }.select { |_, count| count > 1 }.keys

This is an O(n) procedure.

Alternatively, you can use either of the following lines. They are also O(n), but need only one iteration.

    a.each_with_object(Hash.new(0).merge dup: []){|x,h| h[:dup] << x if (h[x] += 1) == 2}[:dup]

    a.inject(Hash.new(0).merge dup: []){|h,x| h[:dup] << x if (h[x] += 1) == 2;h}[:dup]

konung · Answer

Here is my take on it for a big set of data, such as a legacy dBase table in which you need to find duplicate parts.

    # Assuming ps is an array of 20000 part numbers & we want to find duplicates
    # (actually had to do it recently).
    # Having a result hash with the part number and the number of times the part
    # is duplicated is much more convenient in a real-world application.
    # Takes about 6 seconds to run on my data set
    # - not too bad for an export script handling 20000 parts

    h = {};

    # or for readability

    h = {} # result hash
    ps.select{ |e|
      ct = ps.count(e)
      h[e] = ct if ct > 1
    }; nil # so that the huge result of select doesn't print in the console

IAmNaN · Answer

If you are comparing two different arrays (instead of one against itself), a very fast way is to use the intersection operator & provided by Ruby's Array class.

    # Given
    a = ['a', 'b', 'c', 'd']
    b = ['e', 'f', 'c', 'd']

    # Then this...
    a & b # => ['c', 'd']
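One thing to keep in mind, shown in the sketch below: & returns each common element only once, so it cannot be used to detect duplicates within a single array. The example arrays here are illustrative.

```ruby
a = ['a', 'b', 'c', 'd', 'c']
b = ['e', 'f', 'c', 'd', 'd']

# Elements present in both arrays, each listed once.
p a & b  # => ["c", "d"]

# Intersecting an array with itself just removes repeats; it does not
# reveal which elements were duplicated.
p a & a  # => ["a", "b", "c", "d"]
```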

Tilo · Answer

each_with_object is your friend!

    input = [:bla,:blubb,:bleh,:bla,:bleh,:bla,:blubb,:brrr]

    # to get the counts of the elements in the array:
    > input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}
    => {:bla=>3, :blubb=>2, :bleh=>2, :brrr=>1}

    # to get only the counts of the non-unique elements in the array:
    > input.each_with_object({}){|x,h| h[x] ||= 0; h[x] += 1}.reject{|k,v| v < 2}
    => {:bla=>3, :blubb=>2, :bleh=>2}

Dorian · Answer

    r = [1, 2, 3, 5, 1, 2, 3, 1, 2, 1]

    r.group_by(&:itself).map { |k, v| v.size > 1 ? [k] + [v.size] : nil }.compact.sort_by(&:last).map(&:first)

muneebahmad · Answer

I needed to find out how many duplicates there were and what they were, so I wrote a function building on what Naveed had posted earlier:

    def print_duplicates(array)
      puts "Array count: #{array.count}"
      map = {}
      total_dups = 0
      array.each do |v|
        map[v] = (map[v] || 0) + 1
      end

      map.each do |k, v|
        if v != 1
          puts "#{k} appears #{v} times"
          total_dups += 1
        end
      end
      puts "Total items that are duplicated: #{total_dups}"
    end

Amrit Dhungana · Answer

    a = ["A", "B", "C", "B", "A"]
    b = a.select { |e| a.count(e) > 1 }.uniq
    c = a - b
    d = b + c

Result

    d
    => ["A", "B", "C"]