A weighted version of random.choice

Question

I needed to write a weighted version of random.choice (each element in the list has a different probability of being selected). This is what I came up with:

    import random

    def weightedChoice(choices):
        """Like random.choice, but each element can have a different chance of
        being selected.

        choices can be any iterable containing iterables with two items each.
        Technically, they can have more than two items, the rest will just be
        ignored.  The first item is the thing being chosen, the second item is
        its weight.  The weights can be any numeric values, what matters is the
        relative differences between them.
        """
        space = {}
        current = 0
        for choice, weight in choices:
            if weight > 0:
                space[current] = choice
                current += weight
        rand = random.uniform(0, current)
        # list(...) keeps this working on Python 3, where keys() is a view
        for key in sorted(list(space.keys()) + [current]):
            if rand < key:
                return choice
            choice = space[key]
        return None

This function seems overly complex to me, and ugly. I'm hoping someone here can offer suggestions for improving it, or an alternative way of doing this. Efficiency isn't as important to me as code cleanliness and readability.

Ronan Paixão · Accepted Answer

Since version 1.7.0, NumPy has a choice function that supports probability distributions.

    from numpy.random import choice
    draw = choice(list_of_candidates, number_of_items_to_pick,
                  p=probability_distribution)

Note that probability_distribution is a sequence in the same order as list_of_candidates. You can also pass the keyword replace=False so that drawn items are not put back, i.e. sampling without replacement.

Ned Batchelder · Answer

    import random

    def weighted_choice(choices):
        # choices is iterated twice, so it must be a sequence, not a one-shot iterator
        total = sum(w for c, w in choices)
        r = random.uniform(0, total)
        upto = 0
        for c, w in choices:
            if upto + w >= r:
                return c
            upto += w
        assert False, "Shouldn't get here"

vishes_shell · Answer

Since Python 3.6 there is a choices method in the random module.

    Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 6.0.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import random

    In [2]: random.choices(
    ...:     population=[['a','b'], ['b','a'], ['c','b']],
    ...:     weights=[0.2, 0.2, 0.6],
    ...:     k=10
    ...: )

    Out[2]:
    [['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['b', 'a'],
     ['c', 'b'],
     ['c', 'b']]

Also, people have mentioned that numpy.random.choice supports weights, but it does not support 2d arrays and the like.

So, ~~basically you can get whatever you like~~ (see the update) with the builtin random.choices if you have Python 3.6.x.

UPDATE: As @roganjosh kindly mentioned, random.choices cannot return values without replacement, as stated in the docs:

Return a k sized list of elements chosen from the population with replacement.

And @ronan-paixão's brilliant answer states that numpy.choice has a replace argument that controls such behaviour.
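If you need weighted draws without replacement and would rather stay in the standard library, one option is the Efraimidis-Spirakis scheme. The following is only a minimal sketch under that assumption: the function name weighted_sample_without_replacement is hypothetical, and this is not something random.choices itself provides.

```python
import heapq
import random

def weighted_sample_without_replacement(population, weights, k, rng=random):
    """Pick k distinct items; heavier items are more likely to be picked.

    Efraimidis-Spirakis scheme: each item gets the key u ** (1 / w) with
    u uniform in [0, 1), and the k largest keys win. Items with a
    non-positive weight are never picked.
    """
    keyed = [
        (rng.random() ** (1.0 / w), index)  # index breaks ties, items are never compared
        for index, w in enumerate(weights)
        if w > 0
    ]
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    return [population[index] for _, index in heapq.nlargest(k, keyed)]

picked = weighted_sample_without_replacement(
    [["a", "b"], ["b", "a"], ["c", "b"]], [0.2, 0.2, 0.6], k=2
)
```

Here picked holds two different elements of the population; which two varies from run to run.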

Raymond Hettinger · Answer

Arrange the weights into a cumulative distribution.
Use random.random() to pick a random float 0.0 <= x < total.
Search the distribution using bisect.bisect, as shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.

    from random import random
    from bisect import bisect

    def weighted_choice(choices):
        values, weights = zip(*choices)
        total = 0
        cum_weights = []
        for w in weights:
            total += w
            cum_weights.append(total)
        x = random() * total
        i = bisect(cum_weights, x)
        return values[i]

    >>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
    'WHITE'

If you need to make more than one choice, split this into two functions: one to build the cumulative weights and another to bisect to a random point.
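A rough sketch of that two-function split might look like this. The names build_cum_weights and pick are made up for illustration, and the upper bound passed to bisect is an extra guard against floating-point rounding that the snippet above does not have.

```python
import bisect
import itertools
import random

def build_cum_weights(choices):
    """O(n) setup: split (value, weight) pairs and accumulate the weights."""
    values, weights = zip(*choices)
    return values, list(itertools.accumulate(weights))

def pick(values, cum_weights, rng=random):
    """O(log n) draw: bisect to a random point of the cumulative weights."""
    x = rng.random() * cum_weights[-1]
    # Limit the search to the last valid index in case x rounds up to the total.
    return values[bisect.bisect(cum_weights, x, 0, len(cum_weights) - 1)]

values, cum_weights = build_cum_weights([("WHITE", 90), ("RED", 8), ("GREEN", 2)])
draws = [pick(values, cum_weights) for _ in range(1000)]
```

The setup cost is paid once, so repeated draws only pay for the bisection.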

pweitzman · Answer

If you don't mind using NumPy, you can use numpy.random.choice.

For example:

    import numpy

    items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]]

    elems = [i[0] for i in items]
    probs = [i[1] for i in items]

    trials = 1000
    results = [0] * len(items)
    for i in range(trials):
        res = numpy.random.choice(elems, p=probs)  # This is where the item is selected!
        results[elems.index(res)] += 1
    results = [r / float(trials) for r in results]
    print("item\texpected\tactual")
    for i in range(len(probs)):
        print("%s\t%0.4f\t%0.4f" % (elems[i], probs[i], results[i]))

If you know in advance how many selections you need, you can do it without a loop, like this:

    numpy.random.choice(elems, trials, p=probs)

PaulMcG · Answer

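Crude, but may be sufficient:

    import random
    weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))

Does it work?

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

    # initialize tally dict, keyed by the values (not by the (value, weight) pairs)
    tally = dict.fromkeys((v for v, wt in choices), 0)

    # tally up 1000 weighted choices
    for i in range(1000):
        tally[weighted_choice(choices)] += 1

    print(list(tally.items()))

Prints (the counts vary from run to run):

    [('WHITE', 904), ('RED', 74), ('GREEN', 22)]

This assumes that all weights are integers. They don't have to add up to 100; I only did that to make the test results easier to interpret. (If the weights are floating point numbers, multiply them all by 10 repeatedly until every weight is >= 1.)

    weights = [.6, .2, .001, .199]
    while any(w < 1.0 for w in weights):
        weights = [w*10 for w in weights]
    # round before converting, so 198.99999... does not truncate to 198
    weights = [int(round(w)) for w in weights]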

Maxime · Answer

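If you have a weighted dictionary instead of a list, you can write this:

    import random

    items = { "a": 10, "b": 5, "c": 1 }
    random.choice([k for k in items for dummy in range(items[k])])

Note that [k for k in items for dummy in range(items[k])] produces this list (element order follows the dict's iteration order, which does not affect the result): ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'c']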
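The expanded list grows with the sum of the weights and only works for integer weights. On Python 3.6 or later, a possible alternative is to hand the dictionary's keys and values to random.choices directly. This is just a sketch, and weighted_key is a hypothetical helper name.

```python
import random

items = {"a": 10, "b": 5, "c": 1}

def weighted_key(weighted_dict, rng=random):
    """Pick one key with probability proportional to its non-negative weight."""
    keys = list(weighted_dict)
    weights = list(weighted_dict.values())  # same order as the keys
    return rng.choices(keys, weights=weights, k=1)[0]

picked = weighted_key(items)
```

This also accepts float weights, since no intermediate list of repeated keys is built.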

Nickil Maveli · Answer

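As of Python v3.6, random.choices can be used to return a list of elements of a specified size from the given population, with optional weights.

    random.choices(population, weights=None, *, cum_weights=None, k=1)

population: a list containing unique observations. (If it is empty, an IndexError is raised.)
weights: more precisely, the relative weights used to make the selections.
cum_weights: the cumulative weights used to make the selections.
k: the size (len) of the list to be output. (Default len() = 1)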

A few caveats:

1) It uses weighted sampling with replacement, so drawn items are put back and can be drawn again. The values in the weights sequence do not matter in themselves, but their relative ratio does.

Unlike np.random.choice, which only accepts probabilities as weights and requires the individual probabilities to sum to 1, there is no such restriction here. As long as the weights belong to a numeric type (int/float/Fraction, but not Decimal), this will work.

    >>> import random
    # weights being integers
    >>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
    ['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
    # weights being floats
    >>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
    ['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
    # weights being fractions (12/100 is a float in Python 3; use fractions.Fraction for exact fractions)
    >>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
    ['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']

2) If neither weights nor cum_weights is specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.

Specifying both weights and cum_weights raises a TypeError.

    >>> random.choices(["white", "green", "red"], k=10)
    ['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
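To see the rules from caveat 2 in action, here is a small sketch. The helper error_name is hypothetical; the ValueError for a length mismatch is what CPython raises.

```python
import random

population = ["white", "green", "red"]

def error_name(**kwargs):
    """Return the name of the exception random.choices raises, or None."""
    try:
        random.choices(population, k=3, **kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None

both = error_name(weights=[12, 12, 4], cum_weights=[12, 24, 28])  # both given
mismatch = error_name(weights=[12, 12])  # two weights for three items
uniform = error_name()  # neither given: equal probability, no error
```

After running this, both is 'TypeError', mismatch is 'ValueError' and uniform is None.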

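3) cum_weights is typically the result of the itertools.accumulate function, which is really handy in such situations.

From the linked documentation:

Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work.

So, supplying either weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome, and the latter seems to be faster / more efficient.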
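One way to check the equivalence yourself is to seed a generator identically before each call. This is a sketch: the seed value is arbitrary, and identical output relies on CPython converting weights with itertools.accumulate internally, as the quoted documentation describes.

```python
import itertools
import random

population = ["white", "green", "red"]
weights = [12, 12, 4]
cum_weights = list(itertools.accumulate(weights))  # running totals of the weights

rng = random.Random(2024)  # arbitrary seed, only there to make the runs comparable
by_weights = rng.choices(population, weights=weights, k=10)

rng.seed(2024)  # rewind to the same state
by_cum_weights = rng.choices(population, cum_weights=cum_weights, k=10)
```

With the same seed, by_weights and by_cum_weights should hold the same ten colours; the second call merely skips the accumulation step.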

Raymond Hettinger · Answer

Here is the version that is being included in the standard library for Python 3.6:

    import itertools as _itertools
    import bisect as _bisect
    import random

    class Random36(random.Random):
        "Show the code included in the Python 3.6 version of the Random class"

        def choices(self, population, weights=None, *, cum_weights=None, k=1):
            """Return a k sized list of population elements chosen with replacement.

            If the relative weights or cumulative weights are not specified,
            the selections are made with equal probability.

            """
            random = self.random
            if cum_weights is None:
                if weights is None:
                    _int = int
                    total = len(population)
                    return [population[_int(random() * total)] for i in range(k)]
                cum_weights = list(_itertools.accumulate(weights))
            elif weights is not None:
                raise TypeError('Cannot specify both weights and cumulative weights')
            if len(cum_weights) != len(population):
                raise ValueError('The number of weights does not match the population')
            bisect = _bisect.bisect
            total = cum_weights[-1]
            return [population[bisect(cum_weights, random() * total)] for i in range(k)]

Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l34

phihag · Answer

I'd require the weights to sum to 1, but this works either way:

    import random

    def weightedChoice(choices):
        # Safety check, you can remove it
        for c,w in choices:
            assert w >= 0

        # sum the weights (w), not the items (c)
        tmp = random.uniform(0, sum(w for c,w in choices))
        for choice,weight in choices:
            if tmp < weight:
                return choice
            else:
                tmp -= weight
        raise ValueError('Negative values in input')

whi · Answer

    import numpy as np
    w=np.array([ 0.4, 0.8, 1.6, 0.8, 0.4])
    np.random.choice(w, p=w/sum(w))

AShelly · Answer

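If your list of weighted choices is relatively static and you need to sample frequently, you can do one O(N) preprocessing step and then do each selection in O(1), using the functions in this related answer.

    # prep() and sample() are the functions from the related answer (not shown here)

    # run only when `choices` changes.
    preprocessed_data = prep(weight for _,weight in choices)

    # O(1) selection
    value = choices[sample(preprocessed_data)][0]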
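The related answer's code is not reproduced here. One well-known technique with exactly these costs is the alias method (Walker/Vose), so the following sketch assumes prep and sample work along those lines. It is an illustration under that assumption, not the linked implementation.

```python
import random

def prep(weights):
    """O(N) preprocessing: build the probability and alias tables."""
    weights = list(weights)
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]  # the average entry is 1.0
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]  # keep index s with this probability...
        alias[s] = l  # ...otherwise redirect the draw to index l
        scaled[l] -= 1.0 - scaled[s]  # l donated the missing share
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias  # leftover entries keep probability 1.0

def sample(tables, rng=random):
    """O(1) selection: one random column, one biased coin flip."""
    prob, alias = tables
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

choices = [("WHITE", 90), ("RED", 8), ("GREEN", 2)]
preprocessed_data = prep(weight for _, weight in choices)
value = choices[sample(preprocessed_data)][0]
```

Each draw costs two random numbers and two table lookups, however many choices there are.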

ArturJ · Answer

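I'm probably too late to contribute anything useful, but here is a simple, short, and very efficient snippet:

    import random

    def choose_index(probabilities):
        cmf = 0.0
        choice = random.random()
        for k, p in enumerate(probabilities):
            cmf += p
            if choice <= cmf:
                return k
        # only reached if rounding leaves the total slightly below choice
        return len(probabilities) - 1

There is no need to sort the probabilities or to build a vector for the cmf, and it terminates as soon as it finds its choice. Memory: O(1), time: O(N), with an average running time of ~N/2.

If you have weights, simply add one line to normalize them:

    import random

    def choose_index(weights):
        total = sum(weights)
        probabilities = [w / total for w in weights]  # also works for plain lists
        cmf = 0.0
        choice = random.random()
        for k, p in enumerate(probabilities):
            cmf += p
            if choice <= cmf:
                return k
        return len(probabilities) - 1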

Uppinder Chugh · Answer

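It depends on how many times you want to sample the distribution.

Suppose you want to sample the distribution K times. Then the time complexity of calling np.random.choice() each time is O(K(n + log(n))), where n is the number of items in the distribution.

In my case, I needed to sample the same distribution many times, on the order of 10^3, with n on the order of 10^6. I used the code below, which precomputes the cumulative distribution and then samples it in O(log(n)). The overall time complexity is O(n + K*log(n)).

    import numpy as np

    n,k = 10**6,10**3

    # Create dummy distribution
    a = np.array([i+1 for i in range(n)])
    p = np.array([1.0/n]*n)

    cfd = p.cumsum()
    for _ in range(k):
        x = np.random.uniform()
        idx = cfd.searchsorted(x, side='right')
        sampled_element = a[idx]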

Mark · Answer

A general solution:

    import random

    def weighted_choice(choices, weights):
        total = sum(weights)
        threshold = random.uniform(0, total)
        for k, weight in enumerate(weights):
            total -= weight
            if total < threshold:
                return choices[k]

blue_note · Answer

Using numpy:

    import numpy as np

    def choice(items, weights):
        return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]

murphsp1 · Answer

Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it returns an array of 0's containing a 1 that marks the bin that was chosen. By default the code makes a single draw, but you can pass in the number of draws to make, and the counts per bin are returned.

If the weights vector does not sum to 1, it is normalized so that it does.

    import numpy as np

    def weighted_choice(weights, n=1):
        if np.sum(weights)!=1:
            weights = weights/np.sum(weights)

        draws = np.random.random_sample(size=n)

        weights = np.cumsum(weights)
        weights = np.insert(weights,0,0.0)

        counts = np.histogram(draws, bins=weights)
        return(counts[0])

ML_Dev · Answer

I didn't love the syntax of any of those. I really just wanted to specify what the items were and what the weight of each one was. I realize I could have used random.choices, but instead I quickly wrote the class below.

    import math
    import random
    from numpy import cumsum

    class randomChoiceWithProportions:
        '''
        Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:

        choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
        , "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
        dice = randomChoiceWithProportions(choiceWeightDic)

        samples = []
        for i in range(100000):
            samples.append(dice.sample())

        # Should be close to .26666
        samples.count("6")/len(samples)

        # Should be close to .16666
        samples.count("1")/len(samples)
        '''
        def __init__(self, choiceWeightDic):
            self.choiceWeightDic = choiceWeightDic
            weightSum = sum(self.choiceWeightDic.values())
            # compare with a tolerance: float weights rarely sum to exactly 1
            assert math.isclose(weightSum, 1), 'Weights sum to ' + str(weightSum) + ', not 1.'
            self.valWeightDict = self._compute_valWeights()

        def _compute_valWeights(self):
            valWeights = list(cumsum(list(self.choiceWeightDic.values())))
            valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
            return valWeightDict

        def sample(self):
            num = random.uniform(0,1)
            for key, val in self.valWeightDict.items():
                if val >= num:
                    return key

Tony Veijalainen · Answer

I looked at the other thread that was pointed to and came up with this variation in my own coding style. It returns the index of the choice, for the purpose of tallying, but it is simple to return the string instead (see the commented-out alternative return):

    import random
    import bisect

    try:
        range = xrange
    except NameError:
        pass

    def weighted_choice(choices):
        total, cumulative = 0, []
        for c,w in choices:
            total += w
            cumulative.append((total, c))
        r = random.uniform(0, total)
        # return index
        return bisect.bisect(cumulative, (r,))
        # return item string
        #return choices[bisect.bisect(cumulative, (r,))][0]

    # define choices and relative weights
    choices = [("WHITE",90), ("RED",8), ("GREEN",2)]

    tally = [0 for item in choices]

    n = 100000
    # tally up n weighted choices
    for i in range(n):
        tally[weighted_choice(choices)] += 1

    print([t/sum(tally)*100 for t in tally])

Stas Baskin · Answer

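I needed to do something like this really fast and really simply, and after searching for ideas I finally built this template. The idea is to receive the weighted values as JSON from an API, which is simulated here by a dict.

Then translate it into a list in which each value is repeated in proportion to its weight, and use random.choice to select a value from the list.

I tried running it with 10, 100 and 1000 iterations. The distribution seems pretty solid.

    import random

    def weighted_choice(weighted_dict):
        """Input example: dict(apples=60, oranges=30, pineapples=10)"""
        weight_list = []
        for key in weighted_dict.keys():
            weight_list += [key] * weighted_dict[key]
        return random.choice(weight_list)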

Perennial · Answer

One way is to randomize over the total of all the weights and then use the cumulative values as the limit points for each variable. Here is a crude implementation as a generator.

    import random

    def rand_weighted(weights):
        """
        Generator which uses the weights to generate a
        weighted random values
        """
        sum_weights = sum(weights.values())
        cum_weights = {}
        current_weight = 0
        for key, value in sorted(weights.items()):
            current_weight += value
            cum_weights[key] = current_weight
        while True:
            sel = int(random.uniform(0, 1) * sum_weights)
            for key, value in sorted(cum_weights.items()):
                if sel < value:
                    break
            yield key