Minimum Euclidean distance between points in two different Numpy arrays, not within

Question

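I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array and all the points in the other array. The arrays are not necessarily the same size. For example: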

    xy1=numpy.array(
    [[ 243, 3173],
     [ 525, 2997]])

    xy2=numpy.array(
    [[ 682, 2644],
     [ 277, 2651],
     [ 396, 2640]])

My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and the coordinates in xy2.

    mindist=numpy.zeros(len(xy1))
    minid=numpy.zeros(len(xy1))

    for i,xy in enumerate(xy1):
        dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
        mindist[i],minid[i]=dists.min(),dists.argmin()

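Is there a way to eliminate the for loop and do element-by-element calculations between the two arrays? I envision generating a distance matrix in which I could find the minimum element of each row or column.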

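Another way to look at the problem: say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and I store the lengths of the original arrays. Theoretically, I should then be able to generate an n x n distance matrix from those coordinates, from which I can grab an m x p submatrix. Is there a way to efficiently generate this submatrix?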

denis · Accepted Answer

(Months later) scipy.spatial.distance.cdist( X, Y ) gives all pairs of distances for X and Y.
It also does 22 different norms, which are detailed in the cdist documentation.
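As a minimal sketch of how this applies to the arrays from the question (the variable names `dist`, `mindist`, and `minid` are chosen here to match the question's loop, not taken from this answer), the row-wise minimum and its index can be read directly off the cdist result:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Arrays from the question
xy1 = np.array([[243, 3173], [525, 2997]])
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]])

# All pairwise Euclidean distances, shape (len(xy1), len(xy2))
dist = cdist(xy1, xy2)

# For each point in xy1: distance to, and index of, its nearest point in xy2
mindist = dist.min(axis=1)
minid = dist.argmin(axis=1)

print(dist.shape)
print(mindist, minid)
```

Unlike the loop in the question, `minid` comes back as an integer array here, so it can be used directly to index xy2.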

    # cdist example: (nx,dim) (ny,dim) -> (nx,ny)
    # (print statements updated to Python 3 print() calls)
    import sys
    import numpy as np
    from scipy.spatial.distance import cdist

    #...............................................................................
    dim = 10
    nx = 1000
    ny = 100
    metric = "euclidean"
    seed = 1

    # change these params in sh or ipython: run this.py dim=3 ...
    for arg in sys.argv[1:]:
        exec( arg )
    np.random.seed(seed)
    np.set_printoptions( 2, threshold=100, edgeitems=10, suppress=True )

    title = "%s  dim %d  nx %d  ny %d  metric %s" % (
            __file__, dim, nx, ny, metric )
    print( "\n", title )

    #...............................................................................
    X = np.random.uniform( 0, 1, size=(nx,dim) )
    Y = np.random.uniform( 0, 1, size=(ny,dim) )
    dist = cdist( X, Y, metric=metric )  # -> (nx, ny) distances
    #...............................................................................

    print( "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
            X.shape, Y.shape, dist.shape ))
    print( "dist average %.3g +- %.2g" % (dist.mean(), dist.std()) )
    # cdist returns a (1,1) array here, so index it to get a scalar for formatting
    print( "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
            dist[0,3], cdist( [X[0]], [Y[3]] )[0,0] ))

    # (trivia: how do pairwise distances between uniform-random points in the unit cube
    # depend on the metric ? With the right scaling, not much at all:
    #    L1 / dim      ~ .33 +- .2/sqrt dim
    #    L2 / sqrt dim ~ .4  +- .2/sqrt dim
    #    Lmax / 2      ~ .4  +- .2/sqrt dim

Alex Martelli · Answer

To compute the m by p matrix of distances, this should work:

    >>> def distances(xy1, xy2):
    ...     d0 = numpy.subtract.outer(xy1[:,0], xy2[:,0])
    ...     d1 = numpy.subtract.outer(xy1[:,1], xy2[:,1])
    ...     return numpy.hypot(d0, d1)

The .outer calls make two such matrices (of scalar differences along the two axes); the .hypot call turns those into a same-shape matrix (of scalar Euclidean distances).
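To make the intermediate step concrete, here is a small sketch that runs the same function on the arrays from the question and prints the x-difference matrix that `numpy.subtract.outer` produces. The final `min`/`argmin` step is added here to connect back to the question and is not part of this answer:

```python
import numpy

def distances(xy1, xy2):
    # Pairwise differences along x and along y, each of shape (m, p)
    d0 = numpy.subtract.outer(xy1[:, 0], xy2[:, 0])
    d1 = numpy.subtract.outer(xy1[:, 1], xy2[:, 1])
    # hypot(d0, d1) == sqrt(d0**2 + d1**2), element by element
    return numpy.hypot(d0, d1)

# Arrays from the question
xy1 = numpy.array([[243, 3173], [525, 2997]])
xy2 = numpy.array([[682, 2644], [277, 2651], [396, 2640]])

# Entry [i, j] is xy1[i, 0] - xy2[j, 0]
d0 = numpy.subtract.outer(xy1[:, 0], xy2[:, 0])
print(d0)

d = distances(xy1, xy2)
mindist = d.min(axis=1)
minid = d.argmin(axis=1)
print(mindist, minid)
```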

divenex · Answer

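The accepted answer does not fully address the question, which asks for the minimum distance between the two sets of points, not the distance between every pair of points in the two sets.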

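A straightforward solution to the original question does indeed consist of computing the distance between every pair and then finding the minimum, but this is not necessary if one is only interested in the minimum distances. A much faster solution exists for the latter problem.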

All the proposed solutions have a running time that scales as m*p = len(xy1)*len(xy2). This is fine for small datasets, but one can write an optimal solution that scales as m*log(p), which gives huge savings for large xy2 datasets.

This optimal running-time scaling can be achieved using scipy.spatial.cKDTree as follows:

    import numpy as np
    from scipy import spatial

    xy1 = np.array(
        [[243,  3173],
         [525,  2997]])

    xy2 = np.array(
        [[682, 2644],
         [277, 2651],
         [396, 2640]])

    # This solution is optimal when xy2 is very large
    tree = spatial.cKDTree(xy2)
    mindist, minid = tree.query(xy1)
    print(mindist)

    # This solution by @denis is OK for small xy2
    mindist = np.min(spatial.distance.cdist(xy1, xy2), axis=1)
    print(mindist)

where mindist is the minimum distance between each point in xy1 and the set of points in xy2.
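As a rough sanity check, the sketch below compares the KD-tree query against the brute-force cdist result on a larger random dataset. The array sizes, the seed, and the `_tree`/`_brute` variable names are arbitrary choices for illustration. Building the tree has its own cost, so the benefit shows up mainly when xy2 is large:

```python
import numpy as np
from scipy import spatial

# Random test data; the sizes and seed are arbitrary assumptions for this sketch
rng = np.random.default_rng(0)
xy1 = rng.uniform(0, 1000, size=(200, 2))
xy2 = rng.uniform(0, 1000, size=(5000, 2))

# KD-tree: build once on xy2, then query the nearest neighbour of each xy1 point
tree = spatial.cKDTree(xy2)
mindist_tree, minid_tree = tree.query(xy1)

# Brute force: full (200, 5000) distance matrix, then row-wise minimum
dist = spatial.distance.cdist(xy1, xy2)
mindist_brute = dist.min(axis=1)

# Both approaches should report the same nearest distances
print(np.allclose(mindist_tree, mindist_brute))
```

`minid_tree` holds, for each point in xy1, the index of its nearest point in xy2, so `xy2[minid_tree]` gives the nearest coordinates themselves.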

Alok Singhal · Answer

For what you're trying to do:

    dists = numpy.sqrt((xy1[:, 0, numpy.newaxis] - xy2[:, 0])**2 +
                       (xy1[:, 1, numpy.newaxis] - xy2[:, 1])**2)
    mindist = numpy.min(dists, axis=1)
    minid = numpy.argmin(dists, axis=1)

Edit: Instead of calling sqrt, squaring, and so on, you can use numpy.hypot:

    dists = numpy.hypot(xy1[:, 0, numpy.newaxis]-xy2[:, 0],
                        xy1[:, 1, numpy.newaxis]-xy2[:, 1])
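The same broadcasting idea can be written without handling each coordinate separately, which also covers points with more than two dimensions. The following is a sketch under that assumption, with made-up 3-D arrays `a` and `b` as sample data; it is a generalization, not something this answer proposes:

```python
import numpy as np

# Hypothetical 3-D sample points (not from the question)
a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])                     # shape (m, dim)
b = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 4.0], [1.0, 1.0, 2.0]])    # shape (p, dim)

# Broadcasting (m, 1, dim) against (1, p, dim) gives all differences: (m, p, dim)
diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]

# Euclidean norm over the last axis gives the (m, p) distance matrix
dists = np.sqrt((diff**2).sum(axis=-1))

mindist = dists.min(axis=1)
minid = dists.argmin(axis=1)
print(mindist, minid)
```

The intermediate `diff` array holds m*p*dim values, so for large inputs this uses more memory than building the distance matrix one coordinate at a time.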

Maanasa Priya · Answer

    import numpy as np

    # squared distance via |a - b|^2 = |a|^2 + |b|^2 - 2 a.b for all pairs
    P = np.add.outer(np.sum(xy1**2, axis=1), np.sum(xy2**2, axis=1))
    N = np.dot(xy1, xy2.T)
    dists = np.sqrt(P - 2*N)
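This snippet relies on the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a·b, evaluated for all pairs at once. The sketch below runs it on the arrays from the question and compares the result with cdist. The clamp to zero before the square root is an addition made here, on the assumption that floating-point rounding could otherwise produce slightly negative values for (near-)coincident points:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Arrays from the question, as floats
xy1 = np.array([[243, 3173], [525, 2997]], dtype=float)
xy2 = np.array([[682, 2644], [277, 2651], [396, 2640]], dtype=float)

sq1 = np.sum(xy1**2, axis=1)            # |a|^2 for each row of xy1, shape (m,)
sq2 = np.sum(xy2**2, axis=1)            # |b|^2 for each row of xy2, shape (p,)

# |a|^2 + |b|^2 - 2 a.b for every pair, shape (m, p)
sq_dists = np.add.outer(sq1, sq2) - 2 * np.dot(xy1, xy2.T)

# Guard against small negative values from rounding (added assumption)
sq_dists = np.maximum(sq_dists, 0)
dists = np.sqrt(sq_dists)

# Should agree with the direct pairwise computation
print(np.allclose(dists, cdist(xy1, xy2)))
```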