Parallel processing in Python

Question

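What is a simple piece of code that does parallel processing in Python 2.7? Every example I have found online is convoluted and includes unnecessary code.

How would I write a simple brute-force integer factoring program that factors one integer on each core (I have 4)? My real program probably only needs 2 cores, and the processes need to share information.

I know that parallel-python and other libraries exist, but I want to keep the number of libraries to a minimum, so I would like to use the thread and/or multiprocessing libraries, since they ship with Python.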

Jonathan Dursi · Accepted Answer

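A good, simple way to get started with parallel processing in Python is pool mapping in multiprocessing. It works like the usual Python map, except that the individual function calls are spread across a number of worker processes.

Factoring is a nice example of this: you can brute-force check all the candidate divisors, spreading the checks over all available workers.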

    from multiprocessing import Pool
    import numpy

    numToFactor = 976

    def isFactor(x):
        result = None
        div = (numToFactor / x)
        if div*x == numToFactor:
            result = (x,div)
        return result

    if __name__ == '__main__':
        pool = Pool(processes=4)
        possibleFactors = range(1,int(numpy.floor(numpy.sqrt(numToFactor)))+1)
        print 'Checking ', possibleFactors
        result = pool.map(isFactor, possibleFactors)
        cleaned = [x for x in result if not x is None]
        print 'Factors are', cleaned

This gives me

    Checking  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
    Factors are [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
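The question asked about factoring one integer per core, so here is a minimal sketch of that variant, where each worker receives a whole number to factor instead of a single candidate divisor. It assumes Python 3 (integer division with `//`, `Pool` used as a context manager); the function name `factorize` and the input list are illustrative and not from the answer above.

```python
from multiprocessing import Pool


def factorize(n):
    """Return the prime factors of n (n >= 2) by brute-force trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


if __name__ == "__main__":
    numbers = [976, 1001, 65537, 123456]  # illustrative inputs, one per worker
    with Pool(processes=4) as pool:
        # Each call to factorize runs in one of the 4 worker processes.
        results = pool.map(factorize, numbers)
    for number, factors in zip(numbers, results):
        print(number, factors)
```

With only four inputs the speedup depends entirely on how uneven the work is; `pool.map` returns only once the slowest factorization has finished.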

Tim McNamara · Answer

mincemeat is the simplest map/reduce implementation that I have found. It is also very light on dependencies: it is a single file and does everything with the standard library.

Mike McKerns · Answer

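I agree that using Pool from multiprocessing is probably the best route if you want to stay within the standard library. If you are interested in other types of parallel processing but do not want to learn anything new (that is, you want to keep using the same interface as multiprocessing), you could try pathos, which provides several forms of parallel maps and has pretty much the same interface as multiprocessing.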

    Python 2.7.6 (default, Nov 12 2013, 13:26:39)
    [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy
    >>> numToFactor = 976
    >>> def isFactor(x):
    ...     result = None
    ...     div = (numToFactor / x)
    ...     if div*x == numToFactor:
    ...         result = (x,div)
    ...     return result
    ...
    >>> from pathos.multiprocessing import ProcessingPool as MPool
    >>> p = MPool(4)
    >>> possible = range(1,int(numpy.floor(numpy.sqrt(numToFactor)))+1)
    >>> # standard blocking map
    >>> result = [x for x in p.map(isFactor, possible) if x is not None]
    >>> print result
    [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
    >>>
    >>> # asynchronous map (there's also iterative maps too)
    >>> obj = p.amap(isFactor, possible)
    >>> obj
    <processing.pool.MapResult object at 0x108efc450>
    >>> print [x for x in obj.get() if x is not None]
    [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
    >>>
    >>> # there's also parallel-python maps (blocking, iterative, and async)
    >>> from pathos.pp import ParallelPythonPool as PPool
    >>> q = PPool(4)
    >>> result = [x for x in q.map(isFactor, possible) if x is not None]
    >>> print result
    [(1, 976), (2, 488), (4, 244), (8, 122), (16, 61)]
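For comparison, the standard library has its own asynchronous map, `Pool.map_async`, which plays a role similar to `amap` in the session above. The following is a rough sketch under the assumption of Python 3; the names `NUM_TO_FACTOR` and `is_factor` are renamed stand-ins for the ones used earlier, and the 30-second timeout is an arbitrary choice.

```python
from multiprocessing import Pool

NUM_TO_FACTOR = 976


def is_factor(x):
    """Return (x, NUM_TO_FACTOR // x) if x divides NUM_TO_FACTOR, else None."""
    quotient, remainder = divmod(NUM_TO_FACTOR, x)
    return (x, quotient) if remainder == 0 else None


if __name__ == "__main__":
    candidates = range(1, int(NUM_TO_FACTOR ** 0.5) + 1)
    with Pool(processes=4) as pool:
        # map_async returns an AsyncResult right away instead of blocking.
        pending = pool.map_async(is_factor, candidates)
        # ... other work could be done here while the workers run ...
        pairs = [p for p in pending.get(timeout=30) if p is not None]
    print(pairs)
```

Calling `get()` is the point where the main process actually waits, so anything placed between `map_async` and `get` overlaps with the workers.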

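Also, pathos has a sister package with the same interface, called pyina, which runs on mpi4py and provides parallel maps that run in MPI and can be launched through several schedulers.

One other advantage is that pathos comes with a much better serializer than the one you get in standard Python, so it is far more capable than multiprocessing at serializing a range of functions and other objects. And you can do everything from the interpreter.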

    >>> class Foo(object):
    ...     b = 1
    ...     def factory(self, a):
    ...         def _square(x):
    ...             return a*x**2 + self.b
    ...         return _square
    ...
    >>> f = Foo()
    >>> f.b = 100
    >>> g = f.factory(-1)
    >>> p.map(g, range(10))
    [100, 99, 96, 91, 84, 75, 64, 51, 36, 19]
    >>>
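To see why the serializer matters, the sketch below checks whether the standard-library `pickle` module can serialize a closure like `g` from the session above. This is only an illustration of the limitation, assuming Python 3; the helper `can_pickle` is hypothetical and is not part of pathos or multiprocessing.

```python
import pickle


class Foo(object):
    b = 1

    def factory(self, a):
        def _square(x):
            return a * x ** 2 + self.b
        return _square


def can_pickle(obj):
    """Return True if the standard-library pickler can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


f = Foo()
f.b = 100
g = f.factory(-1)

print(g(3))             # the closure works fine in the local process
print(can_pickle(g))    # False: pickle cannot look up a nested function by name
print(can_pickle(len))  # True: importable top-level callables are fine
```

Since `multiprocessing` relies on `pickle` to ship the mapped function to its workers, passing `g` to a plain `multiprocessing.Pool.map` would be expected to fail for the same reason.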

Get the code here: https://github.com/uqfoundation

Ion Stoica · Answer

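This can be done elegantly with Ray, a system that lets you easily parallelize and distribute your Python code.

To parallelize your example, you need to define your map function with the @ray.remote decorator and then invoke it with .remote. This ensures that every instance of the remote function is executed in a different process.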

    import ray

    ray.init()

    # Define the function to compute the factors of a number as a remote function.
    # This will make sure that a call to this function will run it in a different
    # process.
    @ray.remote
    def compute_factors(x):
        factors = []
        for i in range(1, x + 1):
            if x % i == 0:
                factors.append(i)
        return factors

    # List of inputs.
    inputs = [67, 24, 18, 312]

    # Call a copy of compute_factors() on each element in inputs.
    # Each copy will be executed in a separate process.
    # Note that a remote function returns a future, i.e., an
    # identifier of the result, rather than the result itself.
    # This enables the calls to remote functions to not be blocking,
    # which enables us to call many remote functions in parallel.
    result_ids = [compute_factors.remote(x) for x in inputs]

    # Now get the results
    results = ray.get(result_ids)

    # Print the results.
    for i in range(len(inputs)):
        print("The factors of", inputs[i], "are", results[i])
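If the futures pattern is the part you are after and you would rather stay in the standard library, roughly the same shape can be sketched with `concurrent.futures`. This is a single-machine analogue, not a replacement for Ray, and it assumes Python 3 (the module is not in the Python 2.7 standard library).

```python
from concurrent.futures import ProcessPoolExecutor


def compute_factors(x):
    """Return every positive divisor of x."""
    return [i for i in range(1, x + 1) if x % i == 0]


if __name__ == "__main__":
    inputs = [67, 24, 18, 312]
    with ProcessPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future immediately instead of blocking on the result.
        futures = [executor.submit(compute_factors, x) for x in inputs]
        # result() blocks until the corresponding worker has finished.
        results = [future.result() for future in futures]
    for x, factors in zip(inputs, results):
        print("The factors of", x, "are", factors)
```

Here `executor.submit` stands in for `.remote` and `future.result()` for `ray.get`; the worker processes all live on the local machine.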

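There are a number of advantages to using Ray over the multiprocessing module. In particular, the same code runs on a single machine as well as on a cluster of machines. For more advantages of Ray, see this related post.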