Resize a NumPy 2D array by averaging or rebinning

Question

I am trying to reimplement an IDL function in Python:

http://star.pst.qub.ac.uk/idl/REBIN.html

It downsizes a 2D array by an integer factor by averaging.

For example:

    >>> a = np.arange(24).reshape((4, 6))
    >>> a
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])

I would like to resize it to (2, 3) by taking the mean of the relevant samples. The expected output would be:

    >>> b = rebin(a, (2, 3))
    >>> b
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])

That is, b[0,0] = np.mean(a[:2,:2]), b[0,1] = np.mean(a[:2,2:4]), and so on.

I believe I should reshape to a 4-dimensional array and then take the mean over the right axes, but I could not figure out the algorithm. Any hints?

jfs · Accepted Answer

Here is an example based on the answer you linked (for clarity):

    >>> import numpy as np
    >>> a = np.arange(24).reshape((4, 6))
    >>> a
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])
    >>> a.reshape((2, a.shape[0]//2, 3, -1)).mean(axis=3).mean(1)
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5]])

As a function:

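    def rebin(a, shape):
        # Split each axis into (output index, offset within block),
        # then average over the two offset axes.
        sh = shape[0], a.shape[0]//shape[0], shape[1], a.shape[1]//shape[1]
        return a.reshape(sh).mean(-1).mean(1)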
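To see why the 4-dimensional reshape works, the sketch below inspects the intermediate array: after the reshape, `blocks[i, :, j, :]` is exactly the block of `a` that is averaged into output cell `(i, j)`. The name `rebin_checked` is hypothetical, and the divisibility check is an added assumption about how unsuitable shapes should be handled; neither is part of the answer above.

```python
import numpy as np

def rebin_checked(a, shape):
    """Block-average a 2D array down to `shape` (hypothetical helper)."""
    rows, cols = shape
    # Assumption: reject shapes that do not divide the input evenly.
    if a.shape[0] % rows or a.shape[1] % cols:
        raise ValueError("new shape must divide the old shape evenly")
    blocks = a.reshape(rows, a.shape[0] // rows, cols, a.shape[1] // cols)
    # Axes 1 and 3 hold the offsets inside each block; average over both.
    return blocks.mean(axis=(1, 3))

a = np.arange(24).reshape((4, 6))
blocks = a.reshape(2, 2, 3, 2)

# blocks[0, :, 1, :] is the same 2x2 block as a[:2, 2:4]
same_block = np.array_equal(blocks[0, :, 1, :], a[:2, 2:4])

b = rebin_checked(a, (2, 3))  # matches the expected output in the question
```

Averaging over both offset axes in a single call should give the same result as the chained `.mean(-1).mean(1)`, because every block has the same number of elements.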

derricw · Answer

J.F. Sebastian has a great answer for 2D binning. Here is a version of his "rebin" function that works for N dimensions:

    def bin_ndarray(ndarray, new_shape, operation='sum'):
        """
        Bins an ndarray in all axes based on the target shape, by summing or
        averaging.

        Number of output dimensions must match number of input dimensions and
        new axes must divide old ones.

        Example
        -------
        >>> m = np.arange(0,100,1).reshape((10,10))
        >>> n = bin_ndarray(m, new_shape=(5,5), operation='sum')
        >>> print(n)

        [[ 22  30  38  46  54]
         [102 110 118 126 134]
         [182 190 198 206 214]
         [262 270 278 286 294]
         [342 350 358 366 374]]

        """
        operation = operation.lower()
        if operation not in ['sum', 'mean']:
            raise ValueError("Operation not supported.")
        if ndarray.ndim != len(new_shape):
            raise ValueError("Shape mismatch: {} -> {}".format(ndarray.shape,
                                                               new_shape))
        # Pair each new dimension with its compression factor: (d, c//d)
        compression_pairs = [(d, c//d) for d, c in zip(new_shape,
                                                       ndarray.shape)]
        flattened = [l for p in compression_pairs for l in p]
        ndarray = ndarray.reshape(flattened)
        # Reduce the factor axes one at a time, starting from the last
        for i in range(len(new_shape)):
            op = getattr(ndarray, operation)
            ndarray = op(-1*(i+1))
        return ndarray
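If only the mean is needed, the loop over axes could also be collapsed into a single reduction, since `ndarray.mean` accepts a tuple of axes. This is a minimal sketch, assuming every new dimension divides the old one. `bin_mean_nd` is a hypothetical helper name, not part of the answer above.

```python
import numpy as np

def bin_mean_nd(arr, new_shape):
    """Block-average an N-d array down to new_shape (hypothetical helper)."""
    if arr.ndim != len(new_shape):
        raise ValueError("Shape mismatch: {} -> {}".format(arr.shape, new_shape))
    # Interleave (new_dim, factor) pairs: (d0, f0, d1, f1, ...)
    interleaved = []
    for new_dim, old_dim in zip(new_shape, arr.shape):
        interleaved.extend((new_dim, old_dim // new_dim))
    # The factor axes sit at the odd positions 1, 3, 5, ...
    factor_axes = tuple(range(1, 2 * arr.ndim, 2))
    return arr.reshape(interleaved).mean(axis=factor_axes)

cube = np.arange(64).reshape((4, 4, 4))
binned = bin_mean_nd(cube, (2, 2, 2))  # each cell averages one 2x2x2 block
```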

MarcTheSpark · Answer

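Here is a way of doing what you ask using matrix multiplication. It does not require the new array dimensions to divide the old ones.

First we generate a row compressor matrix and a column compressor matrix (I'm sure there is a cleaner way of doing this):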

    import numpy as np

    def get_row_compressor(old_dimension, new_dimension):
        dim_compressor = np.zeros((new_dimension, old_dimension))
        bin_size = float(old_dimension) / new_dimension
        next_bin_break = bin_size
        which_row = 0
        which_column = 0
        while which_row < dim_compressor.shape[0] and which_column < dim_compressor.shape[1]:
            if round(next_bin_break - which_column, 10) >= 1:
                # The whole old cell falls inside the current bin
                dim_compressor[which_row, which_column] = 1
                which_column += 1
            elif next_bin_break == which_column:
                # The bin ends exactly on a cell boundary
                which_row += 1
                next_bin_break += bin_size
            else:
                # The old cell straddles two bins; split its weight
                partial_credit = next_bin_break - which_column
                dim_compressor[which_row, which_column] = partial_credit
                which_row += 1
                dim_compressor[which_row, which_column] = 1 - partial_credit
                which_column += 1
                next_bin_break += bin_size
        dim_compressor /= bin_size
        return dim_compressor


    def get_column_compressor(old_dimension, new_dimension):
        return get_row_compressor(old_dimension, new_dimension).transpose()

So, for instance, get_row_compressor(5, 3) gives you:

    [[ 0.6  0.4  0.   0.   0. ]
     [ 0.   0.2  0.6  0.2  0. ]
     [ 0.   0.   0.   0.4  0.6]]

and get_column_compressor(3, 2) gives you:

    [[ 0.66666667  0.        ]
     [ 0.33333333  0.33333333]
     [ 0.          0.66666667]]

Then simply premultiply by the row compressor and postmultiply by the column compressor to get the compressed matrix:

    def compress_and_average(array, new_shape):
        # Note: new shape should be smaller in both dimensions than old shape
        # The @ operator on plain arrays replaces the original
        # np.mat(...) * np.mat(...) chain; np.mat was removed in NumPy 2.0.
        return (get_row_compressor(array.shape[0], new_shape[0])
                @ array
                @ get_column_compressor(array.shape[1], new_shape[1]))

Using this technique,

    compress_and_average(np.array([[50, 7, 2, 0, 1],
                                   [0, 0, 2, 8, 4],
                                   [4, 1, 1, 0, 0]]), (2, 3))

yields:

    [[ 21.86666667   2.66666667   2.26666667]
     [  1.86666667   1.46666667   1.86666667]]
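The answer suspects there is a cleaner way to build the compressor matrices. One possibility, sketched here as an assumption rather than as the author's method, is to compute each weight directly as the length of the overlap between output bin `i`, which covers `[i*bin_size, (i+1)*bin_size)`, and old cell `j`, which covers `[j, j+1)`, divided by `bin_size`. The names `overlap_compressor` and `compress_by_overlap` are hypothetical.

```python
import numpy as np

def overlap_compressor(old_dimension, new_dimension):
    """Row compressor built from interval overlaps (hypothetical helper)."""
    bin_size = old_dimension / new_dimension
    bin_lo = np.arange(new_dimension)[:, None] * bin_size  # bin start, as a column
    bin_hi = bin_lo + bin_size
    cell_lo = np.arange(old_dimension)[None, :]            # cell start, as a row
    cell_hi = cell_lo + 1
    # Overlap length of every (bin, cell) pair; negative means no overlap.
    overlap = np.minimum(bin_hi, cell_hi) - np.maximum(bin_lo, cell_lo)
    return np.clip(overlap, 0, None) / bin_size

def compress_by_overlap(array, new_shape):
    rows = overlap_compressor(array.shape[0], new_shape[0])
    cols = overlap_compressor(array.shape[1], new_shape[1]).T
    return rows @ array @ cols

data = np.array([[50, 7, 2, 0, 1],
                 [0, 0, 2, 8, 4],
                 [4, 1, 1, 0, 0]])
result = compress_by_overlap(data, (2, 3))
```

For the 5-to-3 case this should reproduce the row compressor printed above, and each row of the compressor sums to 1, so every output value is a weighted average of the old cells it covers.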

David Kroodsma · Answer

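I was trying to downscale a raster: take a raster of roughly 6000 by 2000 cells and turn it into an arbitrarily sized smaller raster that averages the values properly across the previous bin sizes. I found a solution using SciPy, but I could not get SciPy to install on the shared hosting service I was using, so I wrote this function instead. There is likely a better way to do this that does not involve looping through the rows and columns, but this seems to work.

The nice part is that the old number of rows and columns does not have to be divisible by the new number of rows and columns.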

    import numpy as np

    def resize_array(a, new_rows, new_cols):
        '''
        This function takes a 2D numpy array a and produces a smaller array
        of size new_rows, new_cols. new_rows and new_cols must be less than
        or equal to the number of rows and columns in a.
        '''
        rows = len(a)
        cols = len(a[0])
        yscale = float(rows) / new_rows
        xscale = float(cols) / new_cols

        # first average across the cols to shorten rows
        new_a = np.zeros((rows, new_cols))
        for j in range(new_cols):
            # get the indices of the original array we are going to average across
            the_x_range = (j*xscale, (j+1)*xscale)
            firstx = int(the_x_range[0])
            lastx = int(the_x_range[1])
            # figure out the portion of the first and last index that overlap
            # with the new index, and thus the portion of those cells that
            # we need to include in our average
            x0_scale = 1 - (the_x_range[0]-int(the_x_range[0]))
            xEnd_scale = (the_x_range[1]-int(the_x_range[1]))
            # scale_line is a 1d array that corresponds to the portion of each old
            # index in the_x_range that should be included in the new average
            scale_line = np.ones((lastx-firstx+1))
            scale_line[0] = x0_scale
            scale_line[-1] = xEnd_scale
            # Make sure you don't screw up and include an index that is too large
            # for the array. This isn't great, as there could be some floating
            # point errors that mess up this comparison.
            if scale_line[-1] == 0:
                scale_line = scale_line[:-1]
                lastx = lastx - 1
            # Now it's linear algebra time. Take the dot product of a slice of
            # the original array and the scale_line
            new_a[:,j] = np.dot(a[:,firstx:lastx+1], scale_line)/scale_line.sum()

        # Then average across the rows to shorten the cols. Same method as above.
        # It is probably possible to simplify this code, as this is more or less
        # the same procedure as the block of code above, but transposed.
        # Here I'm reusing the variable a. Sorry if that's confusing.
        a = np.zeros((new_rows, new_cols))
        for i in range(new_rows):
            the_y_range = (i*yscale, (i+1)*yscale)
            firsty = int(the_y_range[0])
            lasty = int(the_y_range[1])
            y0_scale = 1 - (the_y_range[0]-int(the_y_range[0]))
            yEnd_scale = (the_y_range[1]-int(the_y_range[1]))
            scale_line = np.ones((lasty-firsty+1))
            scale_line[0] = y0_scale
            scale_line[-1] = yEnd_scale
            if scale_line[-1] == 0:
                scale_line = scale_line[:-1]
                lasty = lasty - 1
            # Assign only row i (the original wrote a[i:,], which filled every
            # later row as well before overwriting it on the next iteration).
            a[i, :] = np.dot(scale_line, new_a[firsty:lasty+1,])/scale_line.sum()

        return a
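The comments in the function above point out that the `== 0` comparison is exposed to floating-point error. One possible way around that, sketched here under the assumption that exact rational arithmetic is acceptable for computing the bin edges, is Python's `fractions.Fraction`. `exact_bin_weights` is a hypothetical helper that only computes the per-bin weights along one axis; it is not a drop-in replacement for `resize_array`.

```python
from fractions import Fraction
import math

def exact_bin_weights(old_size, new_size):
    """Return [(first_old_index, [weights...]), ...], one entry per new bin.

    Hypothetical helper: bin edges are exact fractions, so a bin that ends
    exactly on a cell boundary never picks up a spurious extra cell.
    """
    scale = Fraction(old_size, new_size)
    bins = []
    for j in range(new_size):
        lo, hi = j * scale, (j + 1) * scale
        first = math.floor(lo)
        last = math.ceil(hi) - 1  # last old index that overlaps [lo, hi)
        weights = []
        for k in range(first, last + 1):
            # Overlap between the bin [lo, hi) and old cell [k, k+1)
            overlap = min(hi, k + 1) - max(lo, k)
            weights.append(float(overlap / scale))
        bins.append((first, weights))
    return bins

weights_5_to_3 = exact_bin_weights(5, 3)
```

Each weight list sums to 1, so taking the dot product of a slice of old cells with its weights gives the area-weighted average for that bin.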