Bounds checking is not supported for CUDA

Question

I am trying to use Numba to run my code on the GPU and speed it up, but I get the following error:

    in jit
        raise NotImplementedError("bounds checking is not supported for CUDA")
    NotImplementedError: bounds checking is not supported for CUDA

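I saw that another question had been asked about this, but it was neither fully specified nor answered. I implemented the two for loops after finding that the vectorized code (y = corr*x + np.sqrt(1.-corr**2)*z) did not work (same error). I also tried the boundscheck option, but the result did not change. When I do not specify target, no error appears, presumably because the code then runs on the CPU automatically.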

    import numpy as np
    from numba import jit

    N = int(1e8)

    @jit(nopython=True, target='cuda', boundscheck=False)
    def Brownian_motions(T, N, corr):
        x = np.random.normal(0, 1, size=(T,N))
        z = np.random.normal(0, 1, size=(T,N))
        y = np.zeros(shape=(T,N))
        for i in range(T):
            for j in range(N):
                y[i,j] = corr*x[i,j] + np.sqrt(1.-corr**2)*z[i,j]
        return(x,y)

    x, y = Brownian_motions(T = 500, N = N, corr = -0.45)

Could you help me? I am using Python 3.7.6 and Numba 0.48.0.
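For reference, the formula in the question, y = corr*x + sqrt(1 - corr**2)*z, combines two independent standard normal draws x and z into a draw y whose correlation with x is corr. The following is a minimal NumPy sketch of that vectorized form on the CPU. The helper name `correlated_normals`, the seed, and the much smaller array shape are assumptions chosen for illustration and do not come from the question.

```python
import numpy as np


def correlated_normals(T, N, corr, seed=0):
    """Return (x, y): standard normal arrays of shape (T, N) with corr(x, y) close to corr.

    Hypothetical helper for illustration; not part of the original question.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((T, N))
    z = rng.standard_normal((T, N))  # independent of x
    # Vectorized form of the double loop in the question.
    y = corr * x + np.sqrt(1.0 - corr**2) * z
    return x, y


# Illustrative sizes, far smaller than T=500, N=1e8 in the question.
x, y = correlated_normals(T=50, N=10_000, corr=-0.45)
sample_corr = np.corrcoef(x.ravel(), y.ravel())[0, 1]
print(f"sample correlation: {sample_corr:.3f}")
```

With this many samples the printed sample correlation should land close to the requested -0.45.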

Bilal Chandio · Answer

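In my case, I also replaced the decorator with a plain @jit. (Numba's @jit compiles the function to machine code through LLVM; XLA is the compiler behind JAX's jit, not Numba's.) Here is a code example intended to compare CPU and GPU performance: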

    from numba import jit
    import numpy as np
    # to measure exec time
    from timeit import default_timer as timer

    # normal function to run on cpu
    def func(a):
        for i in range(10000000):
            a[i]+= 1

    # function compiled by Numba; target="cuda" is commented out, so it also runs on the CPU
    @jit #(target ="cuda")
    def func2(a):
        for i in range(10000000):
            a[i]+= 1

    if __name__=="__main__":
        n = 10000000
        a = np.ones(n, dtype = np.float64)
        b = np.ones(n, dtype = np.float32)

        start = timer()
        func(a)
        print("without GPU:", timer()-start)

        start = timer()
        func2(a)
        print("with GPU:", timer()-start)

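Result:

    without GPU: 5.353004818000045
    with GPU: 0.23115529000006063

Because target="cuda" is commented out in func2, both timings are measured on the CPU: the first is interpreted Python and the second is Numba-compiled code. The second figure also includes the one-time compilation cost of the first call.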
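Since a function decorated with @jit is compiled on its first call, a timing like the one above can be split into compilation cost and steady-state cost by calling the function once before measuring again. A minimal sketch, assuming Numba is installed; the fallback to an undecorated function, the names `add_one` and `time_call`, and the smaller array size are illustrative assumptions and not part of the answer.

```python
from timeit import default_timer as timer

import numpy as np

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:
    # Assumption for illustration: without Numba, run the plain Python function.
    def njit(func):
        return func


@njit
def add_one(a):
    # Increment every element in place.
    for i in range(a.shape[0]):
        a[i] += 1


def time_call(func, arr):
    """Return the wall-clock seconds taken by one call of func(arr)."""
    start = timer()
    func(arr)
    return timer() - start


a = np.ones(1_000_000, dtype=np.float64)  # illustrative size
first = time_call(add_one, a)   # with Numba: includes compilation
second = time_call(add_one, a)  # with Numba: compiled code only
print("first call: ", first)
print("second call:", second)
```

When Numba is present, the gap between the two printed times gives a rough idea of how much of the first measurement was compilation.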

Amit · Answer

Replace @jit(nopython=True, target='cuda', boundscheck=False) with @jit:

    import numpy as np
    from numba import jit

    N = int(1e8)

    @jit
    def Brownian_motions(T, N, corr):
        x = np.random.normal(0, 1, size=(T,N))
        z = np.random.normal(0, 1, size=(T,N))
        y = np.zeros(shape=(T,N))
        for i in range(T):
            for j in range(N):
                y[i,j] = corr*x[i,j] + np.sqrt(1.-corr**2)*z[i,j]
        return(x,y)

    x, y = Brownian_motions(T = 500, N = N, corr = -0.45)
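A plain @jit compiles the function for the CPU, so the replacement above removes the error but does not use the GPU. If GPU execution is still the goal, one possible route in Numba is an explicit kernel written with numba.cuda.jit, in which each GPU thread computes one element. The following is a hedged sketch, not this answer's method: the names `correlate` and `correlate_kernel`, the block size, the CPU fallback, and the reduced array shape are all assumptions for illustration (a single (500, 10**8) float64 array would need about 400 GB). The random inputs are drawn on the host with NumPy and copied to the device.

```python
import math

import numpy as np

try:
    from numba import cuda
    HAVE_GPU = cuda.is_available()
except Exception:
    # Assumption: treat any import or driver problem as "no usable GPU".
    HAVE_GPU = False

if HAVE_GPU:
    @cuda.jit
    def correlate_kernel(x, z, corr, y):
        # Each thread handles one (i, j) element of the 2-D grid.
        i, j = cuda.grid(2)
        # Manual guard: threads launched past the array edge do nothing.
        if i < y.shape[0] and j < y.shape[1]:
            y[i, j] = corr * x[i, j] + math.sqrt(1.0 - corr * corr) * z[i, j]


def correlate(x, z, corr):
    """Compute corr*x + sqrt(1 - corr**2)*z on the GPU if available, else with NumPy.

    Hypothetical helper for illustration.
    """
    if not HAVE_GPU:
        return corr * x + np.sqrt(1.0 - corr**2) * z
    d_x = cuda.to_device(x)
    d_z = cuda.to_device(z)
    d_y = cuda.device_array_like(x)
    threads = (16, 16)  # illustrative block size
    blocks = (math.ceil(x.shape[0] / threads[0]),
              math.ceil(x.shape[1] / threads[1]))
    correlate_kernel[blocks, threads](d_x, d_z, corr, d_y)
    return d_y.copy_to_host()


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 1000))  # illustrative shape
z = rng.standard_normal((500, 1000))
y = correlate(x, z, -0.45)
```

The explicit `if` guard inside the kernel takes the place of automatic bounds checking, since the launch grid is rounded up and some threads fall outside the array.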