Pure-Ruby concurrent Hash

Question

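What's the best way to implement a Hash that can be modified across multiple threads, but with the smallest number of locks? For the purposes of this question, you can assume that the Hash will be read-heavy. It must be thread-safe in all Ruby implementations, including ones that operate in a truly simultaneous fashion, such as JRuby, and it must be written in pure Ruby (no C or Java allowed).

Feel free to submit a naive solution that always locks, but that isn't likely to be the best solution. Points for elegance, but a smaller likelihood of locking wins over smaller code.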

ara t howard · Accepted Answer

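Okay, now that you've specified the actual meaning of "threadsafe", here are two potential implementations. The following code will run forever in MRI and JRuby. The lockless implementation follows an eventual consistency model in which each thread uses its own view of the hash while the master is in flux. A little trickery is required to make sure that storing all the information in the thread doesn't leak memory, but that is handled and tested: process size does not grow while running this code. Both implementations would need more work to be "complete" (delete, update, etc. would need some thought), but either of the two concepts below will meet your requirements.

It's very important for people reading this thread to realize that the whole issue is exclusive to JRuby. In MRI the built-in Hash is sufficient.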

    module Cash
      def Cash.new(*args, &block)
        env = ENV['CASH_IMPL']
        impl = env ? Cash.const_get(env) : LocklessImpl
        klass = defined?(JRUBY_VERSION) ? impl : ::Hash
        klass.new(*args)
      end

      class LocklessImpl
        def initialize
          @hash = {}
        end

        def thread_hash
          thread = Thread.current
          thread[:cash] ||= {}
          hash = thread[:cash][thread_key]
          if hash
            hash
          else
            hash = thread[:cash][thread_key] = {}
            ObjectSpace.define_finalizer(self){ thread[:cash].delete(thread_key) }
            hash
          end
        end

        def thread_key
          [Thread.current.object_id, object_id]
        end

        def []=(key, val)
          time = Time.now.to_f
          tuple = [time, val]
          @hash[key] = tuple
          thread_hash[key] = tuple
          val
        end

        def [](key)
          # check the master value
          #
          val = @hash[key]

          # someone else is either writing the key or it has never been set. we
          # need to invalidate our own copy in either case
          #
          if val.nil?
            thread_val = thread_hash.delete(key)
            return(thread_val ? thread_val.last : nil)
          end

          # check our own thread local value
          #
          thread_val = thread_hash[key]

          # in this case someone else has written a value that we have never seen so
          # simply return it
          #
          if thread_val.nil?
            return(val.last)
          end

          # in this case there is a master *and* a thread local value, if the master
          # is newer juke our own cached copy
          #
          if val.first > thread_val.first
            thread_hash.delete(key)
            return val.last
          else
            return thread_val.last
          end
        end
      end

      class LockingImpl < ::Hash
        # note: 'sync' left the standard library in Ruby 2.7 and is now a separate gem
        require 'sync'

        def initialize(*args, &block)
          super
        ensure
          extend Sync_m
        end

        def sync(*args, &block)
          sync_synchronize(*args, &block)
        end

        def [](key)
          sync(:SH){ super }
        end

        def []=(key, val)
          sync(:EX){ super }
        end
      end
    end

    if $0 == __FILE__
      iteration = 0

      loop do
        n = 42
        hash = Cash.new

        threads =
          Array.new(10) {
            Thread.new do
              Thread.current.abort_on_exception = true
              n.times do |key|
                hash[key] = key
                raise "#{ key }=nil" if hash[key].nil?
              end
            end
          }

        threads.map{|thread| thread.join}

        puts "THREADSAFE: #{ iteration += 1 }"
      end
    end

user43955 · Answer

Posting a base/naive solution, just to boost my Stack Overflow cred:

    require 'thread'

    class ConcurrentHash < Hash
      def initialize
        super
        @mutex = Mutex.new
      end

      def [](*args)
        @mutex.synchronize { super }
      end

      def []=(*args)
        @mutex.synchronize { super }
      end
    end

user43955 · Answer

Yehuda, I think you mentioned that ivar setting is atomic? What about a simple copy and swap then?

    require 'thread'

    class ConcurrentHash
      def initialize
        @reader, @writer = {}, {}
        @lock = Mutex.new
      end

      def [](key)
        @reader[key]
      end

      def []=(key, value)
        @lock.synchronize {
          @writer[key] = value
          @reader, @writer = @writer, @reader
          @writer[key] = value
        }
      end
    end

Michael Sofaer · Answer

This is a wrapper class around Hash that allows concurrent readers, but locks things down for all other types of access (including iterated reads).

    class LockedHash
      def initialize
        @hash = Hash.new
        @lock = ThreadAwareLock.new()
        @reader_count = 0
      end

      def [](key)
        @lock.lock_read
        ret = @hash[key]
        @lock.unlock_read
        ret
      end

      def []=(key, value)
        @lock.lock_write
        @hash[key] = value
        @lock.unlock_write
      end

      def method_missing(method_sym, *arguments, &block)
        if @hash.respond_to? method_sym
          @lock.lock_block
          val = lambda{@hash.send(method_sym,*arguments, &block)}.call
          @lock.unlock_block
          return val
        end
        super
      end
    end

Here is the locking code it uses:

    class RWLock
      def initialize
        @outer = Mutex.new
        @inner = Mutex.new
        @reader_count = 0
      end

      def lock_read
        @outer.synchronize{@inner.synchronize{@reader_count += 1}}
      end

      def unlock_read
        @inner.synchronize{@reader_count -= 1}
      end

      def lock_write
        @outer.lock
        while @reader_count > 0 ;end
      end

      def unlock_write
        @outer.unlock
      end
    end

    class ThreadAwareLock < RWLock
      def initialize
        @owner = nil
        super
      end

      def lock_block
        lock_write
        @owner = Thread.current.object_id
      end

      def unlock_block
        @owner = nil
        unlock_write
      end

      def lock_read
        super unless my_block?
      end

      def unlock_read
        super unless my_block?
      end

      def lock_write
        super unless my_block?
      end

      def unlock_write
        super unless my_block?
      end

      def my_block?
        @owner == Thread.current.object_id
      end
    end

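The thread-aware lock lets you lock the class once and then call methods that would normally lock, without them locking again. You need this because some methods yield into blocks, and those blocks can call locking methods on the same object; you don't want a deadlock or a double-lock error. You could use a counting lock instead for this.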
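The counting-lock alternative mentioned above could look roughly like the following. This is only a sketch under assumptions: `CountingLock` and its methods are hypothetical names that do not appear in the answer, and it relies on `Mutex#owned?` (Ruby 2.0 and later). Ruby's standard library `Monitor` offers similar reentrant behavior out of the box.

```ruby
# Hypothetical reentrant ("counting") lock; not part of the original answer.
class CountingLock
  def initialize
    @mutex = Mutex.new
    @count = 0 # nesting depth; only modified by the thread that owns @mutex
  end

  def lock
    # Only take the underlying mutex on the outermost acquire.
    @mutex.lock unless @mutex.owned?
    @count += 1
    self
  end

  def unlock
    raise ThreadError, "lock is not held by this thread" unless @mutex.owned?
    @count -= 1
    # Release the underlying mutex only when the outermost acquire is undone.
    @mutex.unlock if @count.zero?
    self
  end

  def synchronize
    lock
    begin
      yield
    ensure
      unlock
    end
  end
end

lock = CountingLock.new
hash = {}

lock.synchronize do
  hash[:a] = 1
  # A nested acquire from the same thread neither deadlocks nor double-unlocks.
  lock.synchronize { hash[:b] = hash[:a] + 1 }
end

# Once the outermost block has finished, other threads can take the lock.
Thread.new { lock.synchronize { hash[:c] = 3 } }.join
p hash
```

Because the depth counter is only ever changed by the thread that currently holds the mutex, it needs no extra protection of its own.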

Here's an attempt to implement bucket-level read-write locks:

    class SafeBucket
      def initialize
        @lock = RWLock.new()
        @value_pairs = []
      end

      def get(key)
        @lock.lock_read
        pair = @value_pairs.select{|p| p[0] == key}
        unless pair && pair.size > 0
          @lock.unlock_read
          return nil
        end
        ret = pair[0][1]
        @lock.unlock_read
        ret
      end

      def set(key, value)
        @lock.lock_write
        pair = @value_pairs.select{|p| p[0] == key}
        if pair && pair.size > 0
          pair[0][1] = value
          @lock.unlock_write
          return
        end
        @value_pairs.push [key, value]
        @lock.unlock_write
        value
      end

      def each
        @value_pairs.each{|p| yield p[0],p[1]}
      end
    end

    class MikeConcurrentHash
      def initialize
        @buckets = []
        100.times {@buckets.push SafeBucket.new}
      end

      def [](key)
        bucket(key).get(key)
      end

      def []=(key, value)
        bucket(key).set(key, value)
      end

      def each
        @buckets.each{|b| b.each{|key, value| yield key, value}}
      end

      def bucket(key)
        @buckets[key.hash % 100]
      end
    end

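I stopped working on this because it's too slow, so the each method is unsafe (it allows mutations by other threads during an iteration) and the class doesn't support most hash methods.

And here's a test harness for concurrent hashes: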

    require 'thread'

    class HashHarness
      Keys = [:a, :basic, :test, :harness, :for, :concurrent, :testing, :of, :hashes,
              :that, :tries, :to, :provide, :a, :framework, :for, :designing, :a, :good, :ConcurrentHash,
              :for, :all, :Ruby, :implementations]

      def self.go
        h = new
        r = h.writiness_range(20, 10000, 0, 0)
        r.each{|k, v| p k + ' ' + v.map{|p| p[1]}.join(' ')}
        return
      end

      def initialize(classes = [MikeConcurrentHash, JoshConcurrentHash, JoshConcurrentHash2, PaulConcurrentHash, LockedHash, Hash])
        @classes = classes
      end

      def writiness_range(basic_threads, ops, each_threads, loops)
        result = {}
        @classes.each do |hash_class|
          res = []
          0.upto 10 do |i|
            writiness = i.to_f / 10
            res.push [writiness,test_one(hash_class, basic_threads, ops, each_threads, loops, writiness)]
          end
          result[hash_class.name] = res
        end
        result
      end

      def test_one(hash_class, basic_threads, ops, each_threads, loops, writiness)
        time = Time.now
        threads = []
        hash = hash_class.new
        populate_hash(hash)
        begin
          basic_threads.times do
            threads.push Thread.new{run_basic_test(hash, writiness, ops)}
          end
          each_threads.times do
            threads.push Thread.new{run_each_test(hash, writiness, loops)}
          end
          threads.each{|t| t.join}
        rescue ThreadError => e
          p [e.message, hash_class.name, basic_threads, ops, each_threads, loops, writiness].join(' ')
          return -1
        end
        p [hash_class.name, basic_threads, ops, each_threads, loops, writiness, Time.now - time].join(' ')
        return Time.now - time
      end

      def run_basic_test(hash, writiness, ops)
        ops.times do
          rand < writiness ? hash[choose_key]= rand : hash[choose_key]
        end
      end

      def run_each_test(hash, writiness, loops)
        loops.times do
          hash.each do |k, v|
            if rand < writiness
              each_write_work(hash, k, v)
            else
              each_read_work(k, v)
            end
          end
        end
      end

      def each_write_work(hash, key, value)
        hash[key] = rand
      end

      def each_read_work(key, value)
        key.to_s + ": " + value.to_s
      end

      def choose_key
        Keys[rand(Keys.size)]
      end

      def populate_hash(hash)
        Keys.each{|key| hash[key]=rand}
      end
    end

Numbers: JRuby

    Writiness       0.0    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
    ConcurrentHash  2.098  3.179  2.971  3.083  2.731  2.941  2.564  2.480  2.369  1.862  1.881
    LockedHash      1.873  1.896  2.085  2.058  2.001  2.055  1.904  1.921  1.873  1.841  1.630
    Hash            0.530  0.672  0.685  0.822  0.719  0.877  0.901  0.931  0.942  0.950  1.001

And MRI

    Writiness       0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
    ConcurrentHash  9.214   9.913   9.064   10.112  10.240  10.574  10.566  11.027  11.323  11.837  13.036
    LockedHash      19.593  17.712  16.998  17.045  16.687  16.609  16.647  15.307  14.464  13.931  14.146
    Hash            0.535   0.537   0.534   0.599   0.594   0.676   0.635   0.650   0.654   0.661   0.692

The MRI numbers are pretty striking. Locking in MRI is really expensive.

ms-tg · Answer

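This might be a use case for the hamster gem.

Hamster implements Hash Array Mapped Tries (HAMT), as well as some other persistent data structures, in pure Ruby.

Persistent data structures are immutable. Instead of mutating (changing) the structure, for example by adding or replacing key-value pairs in a Hash, you return a new data structure that contains the change. The trick with persistent immutable data structures is that the newly returned structure reuses as much of its predecessor as possible.

I think that to implement this with hamster you'd use its mutable hash wrapper, which passes all reads to the current value of the persistent immutable hash (i.e. they should be fast), guards all writes with a mutex, and swaps in the new value of the persistent immutable hash after each write.

For example: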

    require 'hamster'
    require 'hamster/experimental/mutable_hash'
    hsh = Hamster.mutable_hash(:name => "Simon", :gender => :male)

    # reading goes directly to hash
    puts hsh[:name] # Simon

    # writing is actually swapping to new value of underlying persistent data structure
    hsh.put(:name, "Joe")
    puts hsh[:name] # Joe

So let's use this for a problem similar to the one described:

(gist here)

    require 'hamster'
    require 'hamster/experimental/mutable_hash'

    # a bunch of threads with a read/write ratio of 10:1
    num_threads = 100
    num_reads_per_write = 10
    num_loops = 100
    hsh = Hamster.mutable_hash

    puts RUBY_DESCRIPTION
    puts "#{num_threads} threads x #{num_loops} loops, #{num_reads_per_write}:1 R/W ratio"

    t0 = Time.now
    Thread.abort_on_exception = true
    threads = (0...num_threads).map do |n|
      Thread.new do
        write_key = n % num_reads_per_write
        read_keys = (0...num_reads_per_write).to_a.shuffle # random order
        last_read = nil

        num_loops.times do
          read_keys.each do |k|
            # Reads
            last_read = hsh[k]

            Thread.pass

            # Atomic increments in the correct ratio to reads
            hsh.put(k) { |v| (v || 0) + 1 } if k == write_key
          end
        end
      end
    end

    threads.map { |t| t.join }
    t1 = Time.now

    puts "Error in keys" unless (0...num_reads_per_write).to_a == hsh.keys.sort.to_a
    puts "Error in values" unless hsh.values.all? { |v| v == (num_loops * num_threads) / num_reads_per_write }
    puts "Time elapsed: #{t1 - t0} s"

I'm getting the following output:

    ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-linux]
    100 threads x 100 loops, 10:1 R/W ratio
    Time elapsed: 5.763414627 s

    jruby 1.7.0 (1.9.3p203) 2012-10-22 ff1ebbe on Java HotSpot(TM) 64-Bit Server VM 1.6.0_26-b03 [linux-amd64]
    100 threads x 100 loops, 10:1 R/W ratio
    Time elapsed: 1.697 s

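What do you think of this?

This solution is closer to how one might solve the problem in Scala or Clojure, although in those languages one would more likely use software transactional memory, with low-level CPU support for the atomic compare-and-swap operations it is built on.

Edit: It's worth noting that one reason the hamster implementation is fast is that it features a lock-free read path. Please reply in the comments if you have questions about that or about how it works.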
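To make the idea of a lock-free read path concrete without the gem, here is a minimal copy-on-write sketch in plain Ruby. It is not Hamster's implementation: `CopyOnWriteHash` is a hypothetical name, each write copies the whole hash (O(n)) instead of sharing structure the way a HAMT does, and it assumes that a single instance-variable assignment is seen by readers as either the old or the new snapshot, which is the assumption discussed elsewhere in this thread.

```ruby
# Hypothetical copy-on-write wrapper; a sketch, not Hamster's code.
class CopyOnWriteHash
  def initialize(initial = {})
    @current = initial.dup.freeze
    @write_lock = Mutex.new
  end

  # Lock-free read: one ivar read, then a lookup in a frozen snapshot.
  def [](key)
    @current[key]
  end

  # Writers are serialized. Each one builds a new frozen hash from the
  # current snapshot and swaps it in with a single assignment.
  def put(key, value = nil)
    @write_lock.synchronize do
      snapshot = @current
      value = yield(snapshot[key]) if block_given?
      @current = snapshot.merge(key => value).freeze
    end
    value
  end

  # The current frozen snapshot, safe to iterate without a lock.
  def to_h
    @current
  end
end

counter = CopyOnWriteHash.new
threads = Array.new(8) do
  Thread.new { 100.times { counter.put(:hits) { |v| (v || 0) + 1 } } }
end
threads.each(&:join)
puts counter[:hits]
```

Readers never block and can never observe a half-updated table, at the price of possibly seeing a slightly stale snapshot and of a full copy per write, which is why this shape only makes sense for read-heavy workloads.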

ara t howard · Answer

I'm pretty unclear on what is meant by this. I think the simplest implementation is simply

Hash

That is to say, the built-in Ruby Hash is threadsafe, if by threadsafe you mean that it will not blow up when more than one thread tries to access it. This code will run safely forever:

    n = 4242
    hash = {}

    loop do
      a =
        Thread.new do
          n.times do
            hash[:key] = :val
          end
        end

      b =
        Thread.new do
          n.times do
            hash.delete(:key)
          end
        end

      c =
        Thread.new do
          n.times do
            val = hash[:key]
            raise val.inspect unless [nil, :val].include?(val)
          end
        end

      a.join
      b.join
      c.join
      p :THREADSAFE
    end

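I suspect that by threadsafe you really mean ACID: for instance, that a write like hash[:key] = :val followed by a read of hash[:key] would return :val. But no amount of trickery with locking can provide that; the last one in always wins. For example, say you have 42 threads all updating a threadsafe hash. Which value should be read by the 43rd? Surely by threadsafe you don't mean some sort of total ordering on writes, so if 42 threads were actively writing, the "correct" value is any of them, right? But Ruby's built-in Hash works in just this way...

Perhaps you mean something like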

hash.each do ...

in one thread and

hash.delete(key)

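in another would not interfere with one another? I can imagine wanting that to be threadsafe, but that's not even safe in a single thread with MRI Ruby (obviously you cannot modify a hash while iterating over it).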
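The single-threaded case can be seen with a few lines. This sketch assumes MRI, where inserting a new key while iterating raises a RuntimeError; it only covers insertion, and other Ruby implementations may react differently. Iterating over a snapshot of the keys is one common workaround.

```ruby
hash = { a: 1, b: 2 }

# Adding a key while iterating the same hash is rejected by MRI.
error = nil
begin
  hash.each { |key, _value| hash[:"#{key}_copy"] = 0 }
rescue RuntimeError => e
  error = e
end
puts error.class

# Iterating over a snapshot of the keys avoids the problem, because the
# hash being modified is no longer the collection being iterated.
hash.keys.each { |key| hash[:"#{key}_copy"] = 0 }
puts hash.size
```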

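So can you be more specific about what you mean by "threadsafe"?

The only way to give ACID semantics would be a gross lock (sure, this could be a method that takes a block, but it is still an external lock).

Ruby's thread scheduler isn't going to schedule a thread smack in the middle of some arbitrary C function (like the built-in hash aref and aset methods), so those are effectively threadsafe.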

Paul · Answer

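Untested, and a naive stab at optimizing for reads. It assumes that most of the time the value won't be locked. If it is, the tight loop will keep trying until it isn't. I put Thread.critical in there to help ensure that the read threads won't run until the write is done. I'm not sure whether the critical part is needed; it really depends on how read-heavy you mean, so some benchmarking is in order.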

    class ConcurrentHash < Hash

      def initialize(*args)
        @semaphore = Mutex.new
        super
      end

      def []=(k,v)
        begin
          # note: Thread.critical exists only in Ruby 1.8; it was removed in 1.9
          old_crit = Thread.critical
          Thread.critical = true unless old_crit
          @semaphore.synchronize { super }
        ensure
          Thread.critical = old_crit
        end
      end

      def [](k)
        while(true)
          return super unless @semaphore.locked?
        end
      end

    end

There may be a few other read methods that would need to check the @semaphore lock; I don't know whether everything else is implemented in terms of #[].
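Whether other readers go through #[] can be checked empirically. The sketch below uses a hypothetical `TracingHash` subclass (not part of the answer) that counts calls to an overridden #[]. On MRI, `fetch` and `key?` are implemented natively and do not dispatch to the Ruby-level #[], so a guard placed only in #[] would not cover them; other implementations should be checked the same way rather than assumed.

```ruby
# Hypothetical helper that counts how often the overridden #[] is invoked.
class TracingHash < Hash
  attr_reader :bracket_calls

  def initialize(*args)
    super
    @bracket_calls = 0
  end

  def [](key)
    @bracket_calls += 1
    super
  end
end

h = TracingHash.new
h[:a] = 1      # goes through #[]=, not #[]
h[:a]          # counted
h.fetch(:a)    # not counted on MRI
h.key?(:a)     # not counted on MRI
puts h.bracket_calls
```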

Javier · Answer

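This (video, pdf) is about a lock-free hash table implemented in Java.

Spoiler: it uses atomic compare-and-swap (CAS) operations. If these are not available in Ruby, you could emulate them with locks. I'm not sure whether that would give any advantage over a simple lock-guarded hash table.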
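As a rough illustration of emulating CAS with a lock, the sketch below wraps a single reference in a Mutex and layers the usual retry loop on top. `EmulatedAtomicReference` and its method names are hypothetical, and because every operation still takes the lock this shows the shape of the technique rather than any performance benefit.

```ruby
# Hypothetical lock-emulated compare-and-swap cell.
class EmulatedAtomicReference
  def initialize(value)
    @value = value
    @lock = Mutex.new
  end

  def get
    @lock.synchronize { @value }
  end

  # Stores new_value only if the current value is still the very same
  # object as expected; returns true on success, false otherwise.
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value.equal?(expected)
      @value = new_value
      true
    end
  end

  # Classic CAS retry loop: recompute from the latest value until the
  # swap succeeds without interference from another thread.
  def update
    loop do
      old = get
      new_value = yield(old)
      return new_value if compare_and_set(old, new_value)
    end
  end
end

table = EmulatedAtomicReference.new({}.freeze)
threads = Array.new(4) do |i|
  Thread.new do
    50.times { |j| table.update { |h| h.merge([i, j] => true).freeze } }
  end
end
threads.each(&:join)
puts table.get.size
```

The table itself is an immutable snapshot that is replaced wholesale, so a reader calling `get` always sees a consistent hash.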

Dmitry Shevkoplyas · Answer

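Unfortunately I can't add a comment to Michael Sofaer's answer, where he introduces class RWLock and class LockedHash with @reader_count etc. (I don't have enough karma yet.)

That solution does not work. It gives an error: in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)

This is due to a logical bug: when it is time to unlock, the unlock happens one extra time, because the my_block?() check does not prevent it. The lock is released even when releasing was not necessary, and the second unlock of an already unlocked mutex raises an exception. (I'll paste the full code to reproduce this error at the end of this post.)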
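The failure boils down to unlocking a Mutex that is no longer locked. Reading the posted code, one plausible trace is that a nested call made from inside the each block runs unlock_block, which resets @owner to nil and releases the outer mutex, so the outermost unlock_block later releases it a second time. The few lines below only reproduce that last step in isolation, as a sketch:

```ruby
mutex = Mutex.new

mutex.lock
mutex.unlock # the lock has already been released once (by the nested call)

# A second unlock, like the one issued by the outermost unlock_block,
# raises ThreadError.
error = nil
begin
  mutex.unlock
rescue ThreadError => e
  error = e
end
puts error.class
```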

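Michael also mentioned that "the each method is unsafe (allows mutations by other threads during an iteration)", which was critical for me. So I ended up with this simplified solution, which works for all my use cases. It simply locks a mutex on any call to any hash method made from a different thread (calls from the thread that already owns the lock are not blocked, to avoid deadlocks):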

    #
    # This TrulyThreadSafeHash works!
    #
    # Note if one thread iterating the hash by #each method
    # then the hash will be locked for all other threads (they will not be
    # able to even read from it)
    #
    class TrulyThreadSafeHash
      def initialize
        @mutex = Mutex.new
        @hash = Hash.new
      end

      def method_missing(method_sym, *arguments, &block)
        if !@mutex.owned? # Returns true if this lock is currently held by current thread
          # We're trying to lock only if mutex is not owned by the current thread (is not locked or is locked by some other thread).
          # Following call will be blocking if mutex locked by other thread:
          @mutex.synchronize{
            return lambda{@hash.send(method_sym,*arguments, &block)}.call
          }
        end
        # We already own the lock (from current thread perspective).
        # We don't even check if @hash.respond_to?(method_sym), let's make Hash
        # respond properly on all calls (including bad calls (example: wrong method names))
        lambda{@hash.send(method_sym,*arguments, &block)}.call
      end

      # since we're trying to mimic Hash we'll pretend to respond as Hash would
      def self.respond_to?(method_sym, include_private = false)
        Hash.respond_to?(method_sym, include_private)
      end

      # override Object's to_s because our method_missing won't be called for to_s
      def to_s(*arguments)
        @mutex.synchronize{
          return @hash.to_s
        }
      end

      # And for those, who want to run extra mile:
      # to make our class json-friendly we should require 'json' and uncomment this:
      #def to_json(*options)
      #  @mutex.synchronize{
      #    return @hash.to_json(*options)
      #  }
      #end
    end

And now the full example to demonstrate/reproduce the double-unlock error in Michael Sofaer's solution:

    #!/usr/bin/env ruby

    # ======= unchanged copy-paste part from Michael Sofaer answer (begin) =======

    class LockedHash
      def initialize
        @hash = Hash.new
        @lock = ThreadAwareLock.new()
        @reader_count = 0
      end

      def [](key)
        @lock.lock_read
        ret = @hash[key]
        @lock.unlock_read
        ret
      end

      def []=(key, value)
        @lock.lock_write
        @hash[key] = value
        @lock.unlock_write
      end

      def method_missing(method_sym, *arguments, &block)
        if @hash.respond_to? method_sym
          @lock.lock_block
          val = lambda{@hash.send(method_sym,*arguments, &block)}.call
          @lock.unlock_block
          return val
        end
        super
      end
    end

    class RWLock
      def initialize
        @outer = Mutex.new
        @inner = Mutex.new
        @reader_count = 0
      end

      def lock_read
        @outer.synchronize{@inner.synchronize{@reader_count += 1}}
      end

      def unlock_read
        @inner.synchronize{@reader_count -= 1}
      end

      def lock_write
        @outer.lock
        while @reader_count > 0 ;end
      end

      def unlock_write
        @outer.unlock
      end
    end

    class ThreadAwareLock < RWLock
      def initialize
        @owner = nil
        super
      end

      def lock_block
        lock_write
        @owner = Thread.current.object_id
      end

      def unlock_block
        @owner = nil
        unlock_write
      end

      def lock_read
        super unless my_block?
      end

      def unlock_read
        super unless my_block?
      end

      def lock_write
        super unless my_block?
      end

      def unlock_write
        super unless my_block?
      end

      def my_block?
        @owner == Thread.current.object_id
      end
    end

    # ======= unchanged copy-paste part from Michael Sofaer answer (end) =======

    # global hash object, which will be 'shared' across threads
    $h = LockedHash.new

    # hash_reader is just iterating through the 'shared' hash $h
    # and prints specified delimiter (capitalized when last hash item read)
    def hash_reader(delim)
      loop{
        count = 0
        $h.each{
          count += 1
          if count != $h.size
            $stderr.print delim
          else
            $stderr.puts delim.upcase
          end
        }
      }
    end

    # fill hash with 10 items
    10.times{|i|
      $h[i] = i
    }

    # create a thread which will read $h hash
    t1 = Thread.new(){
      hash_reader("o")
    }

    t1.join # will never happen, but for completeness

which gives the following error:

    ./LockedHash_fails_to_unlock.rb
    oooooooooO
    ./LockedHash_fails_to_unlock.rb:55:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
            from ./LockedHash_fails_to_unlock.rb:55:in `unlock_write'
            from ./LockedHash_fails_to_unlock.rb:82:in `unlock_write'
            from ./LockedHash_fails_to_unlock.rb:70:in `unlock_block'
            from ./LockedHash_fails_to_unlock.rb:29:in `method_missing'
            from ./LockedHash_fails_to_unlock.rb:100:in `block in hash_reader'
            from ./LockedHash_fails_to_unlock.rb:98:in `loop'
            from ./LockedHash_fails_to_unlock.rb:98:in `hash_reader'
            from ./LockedHash_fails_to_unlock.rb:119:in `block in <main>'