Filling up a queue and managing multiprocessing in Python

Question

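I have this problem in Python:

I have a queue of URLs that I need to check from time to time
When the queue fills up, I need to process each item in it
Each item in the queue must be processed by a single process (multiprocessing)

So far I have managed to do this "manually", like this: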

    while 1:
        self.updateQueue()

        while not self.mainUrlQueue.empty():
            domain = self.mainUrlQueue.get()

            # if we didn't launched any process yet, we need to do so
            if len(self.jobs) < maxprocess:
                self.startJob(domain)
                #time.sleep(1)
            else:
                # If we already have process started we need to clear the old process in our pool and start new ones
                jobdone = 0

                # We circle through each of the process, until we find one free ; only then leave the loop
                while jobdone == 0:
                    for p in self.jobs :
                        #print "entering loop"
                        # if the process finished
                        if not p.is_alive() and jobdone == 0:
                            #print str(p.pid) + " job dead, starting new one"
                            self.jobs.remove(p)
                            self.startJob(domain)
                            jobdone = 1

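However, this leads to a lot of problems and errors. I was wondering whether a pool of processes would be a better fit. What is the right way to do this?

The difficulty is that the queue is often empty and can then fill up with 300 items within a second, so I am not sure how to go about it.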

Sylvain Leroux · Answer

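You could use the blocking capabilities of the queue: spawn several processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you are not familiar with that, you could "play" with this simple program: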

    import multiprocessing
    import os
    import time

    the_queue = multiprocessing.Queue()


    def worker_main(queue):
        print os.getpid(), "working"
        while True:
            item = queue.get(True)  # block until an item is available
            print os.getpid(), "got", item
            time.sleep(1)  # simulate a "long" operation


    # don't forget the trailing comma: (the_queue,) is a one-element tuple
    the_pool = multiprocessing.Pool(3, worker_main, (the_queue,))

    for i in range(5):
        the_queue.put("hello")
        the_queue.put("world")

    time.sleep(10)

(Tested with Python 2.7.3 on Linux)

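This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function, which is a simple loop that gets a new item from the queue on each iteration. A worker blocks if there is nothing ready to process.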
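To see the blocking behaviour in isolation, here is a minimal single-process sketch. It is written for Python 3 rather than the Python 2 used in the program above, and the 0.2 second timeout is an arbitrary value chosen only so the demonstration finishes quickly. A real worker would usually call get without a timeout and simply wait.

```python
import multiprocessing
import queue  # only needed for the queue.Empty exception

q = multiprocessing.Queue()

# Nothing has been put yet: a blocking get with a timeout waits,
# then gives up by raising queue.Empty.
try:
    q.get(True, 0.2)  # block=True, timeout=0.2 seconds (arbitrary)
    timed_out = False
except queue.Empty:
    timed_out = True

# Once an item is available, a blocking get returns it.
q.put("hello")
item = q.get(True)  # no timeout: waits until the item has arrived

print(timed_out, item)
```

Without the timeout argument, the first get would have waited indefinitely, which is exactly the state the idle workers are in.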

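At startup, all 3 processes sleep until the queue is fed with some data. When data becomes available, one of the waiting workers gets that item and starts to process it. After that, it tries to get another item from the queue, waiting again if nothing is available...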
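The sleep, wake, and sleep-again cycle described above can be sketched as a program that also shuts down cleanly. This is only an illustration under several assumptions: it targets Python 3, it starts plain multiprocessing.Process workers instead of the Pool initializer used above, it requests the "fork" start method (so it assumes a Unix-like system), and the STOP sentinel, the result queue, the run_demo helper and the "url-N" items are hypothetical names introduced for the example.

```python
import multiprocessing
import os

# Assumption: a Unix-like system where the "fork" start method exists.
ctx = multiprocessing.get_context("fork")

STOP = None  # hypothetical sentinel telling a worker to exit


def worker_main(task_queue, result_queue):
    """Wait on the queue, handle items one at a time, exit on STOP."""
    while True:
        item = task_queue.get()  # the worker sleeps here while the queue is empty
        if item is STOP:
            break
        # Placeholder for the real work (e.g. checking a URL).
        result_queue.put((os.getpid(), item))


def run_demo(num_workers=3, num_items=10):
    task_queue = ctx.Queue()
    result_queue = ctx.Queue()
    workers = [
        ctx.Process(target=worker_main, args=(task_queue, result_queue))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()  # every worker now blocks on the empty queue

    # A burst of items arrives; idle workers pick them up as they come.
    for i in range(num_items):
        task_queue.put("url-%d" % i)

    # Collect every result before asking the workers to stop.
    results = [result_queue.get() for _ in range(num_items)]

    for _ in workers:
        task_queue.put(STOP)  # one sentinel per worker
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for pid, item in run_demo():
        print(pid, "handled", item)
```

Which worker handles which item is not deterministic, so the printed process IDs vary from run to run. Only the set of handled items is predictable.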