Is there a faster way to copy arrays in C#?

Question

I have three arrays that need to be combined into one three-dimensional array. The following code shows up as slow in Performance Explorer. Is there a faster solution?

    for (int i = 0; i < sortedIndex.Length; i++)
    {
        if (i < num_in_left)
        {
            // add instance to the left child
            leftnode[i, 0] = sortedIndex[i];
            leftnode[i, 1] = sortedInstances[i];
            leftnode[i, 2] = sortedLabels[i];
        }
        else
        {
            // add instance to the right child
            rightnode[i - num_in_left, 0] = sortedIndex[i];
            rightnode[i - num_in_left, 1] = sortedInstances[i];
            rightnode[i - num_in_left, 2] = sortedLabels[i];
        }
    }

Update:

What I'm actually trying to do is the following:

    // given three 1d arrays
    double[] sortedIndex, sortedInstances, sortedLabels;
    // copy them over to a 3d array (forget about the rightnode for now)
    double[,] leftnode = new double[sortedIndex.Length, 3];
    // some magic happens here so that leftnode = {sortedIndex, sortedInstances, sortedLabels};

Mehrdad · Accepted Answer

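Use Buffer.BlockCopy. Its entire purpose is to perform fast (see the Buffer documentation):

"This class provides better performance for manipulating primitive types than similar methods in the System.Array class."

Admittedly, I haven't done any benchmarks, but that's what the documentation says. It also works on multidimensional arrays. Just make sure that you always specify the number of bytes to copy, not the number of elements, and that you are working with an array of primitives.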
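As a minimal sketch of the byte-count rule above, the following packs three `double[]` arrays into one two-dimensional array with `Buffer.BlockCopy`. It assumes a transposed layout, `double[3, n]`, so that each source array occupies one contiguous row; the `[n, 3]` layout from the question interleaves the values and cannot be filled with three block copies. The names `BlockCopyDemo` and `Pack` are hypothetical.

```csharp
using System;

static class BlockCopyDemo
{
    // Packs three equal-length arrays into a [3, n] array, one source array per row.
    public static double[,] Pack(double[] sortedIndex, double[] sortedInstances, double[] sortedLabels)
    {
        int n = sortedIndex.Length;
        if (sortedInstances.Length != n || sortedLabels.Length != n)
            throw new ArgumentException("All three arrays must have the same length.");

        // BlockCopy offsets and counts are in bytes, not elements.
        int rowBytes = n * sizeof(double);

        var packed = new double[3, n];
        Buffer.BlockCopy(sortedIndex,     0, packed, 0 * rowBytes, rowBytes);
        Buffer.BlockCopy(sortedInstances, 0, packed, 1 * rowBytes, rowBytes);
        Buffer.BlockCopy(sortedLabels,    0, packed, 2 * rowBytes, rowBytes);
        return packed;
    }

    static void Main()
    {
        double[,] packed = Pack(
            new[] { 0.0, 1.0, 2.0 },
            new[] { 10.0, 11.0, 12.0 },
            new[] { 1.0, 0.0, 1.0 });

        // Row 1 holds sortedInstances, so this reads sortedInstances[2].
        Console.WriteLine(packed[1, 2]); // prints 12
    }
}
```

With this layout an instance is read as `packed[column, i]` rather than `leftnode[i, column]`, which is the trade-off for being able to copy whole rows at once.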

Also, I have not tested this, but you might be able to squeeze a bit more performance out of the system if you bind a delegate to System.Buffer.memcpyimpl and call it directly. The signature is:

internal static unsafe void memcpyimpl(byte* src, byte* dest, int len)

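It does require pointers, but I believe it is optimized for the highest possible speed, so I don't think there is any way to get faster than that, even if you had assembly at hand.

Update:

Due to requests (and to satisfy my own curiosity), I tested this: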

    using System;
    using System.Diagnostics;
    using System.Reflection;

    unsafe delegate void MemCpyImpl(byte* src, byte* dest, int len);

    static class Temp
    {
        //There really should be a generic CreateDelegate<T>() method... -___-
        static MemCpyImpl memcpyimpl = (MemCpyImpl)Delegate.CreateDelegate(
            typeof(MemCpyImpl), typeof(Buffer).GetMethod("memcpyimpl",
                BindingFlags.Static | BindingFlags.NonPublic));
        const int COUNT = 32, SIZE = 32 << 20;

        //Use different buffers to help avoid CPU cache effects
        static byte[]
            aSource = new byte[SIZE], aTarget = new byte[SIZE],
            bSource = new byte[SIZE], bTarget = new byte[SIZE],
            cSource = new byte[SIZE], cTarget = new byte[SIZE];

        static unsafe void TestUnsafe()
        {
            Stopwatch sw = Stopwatch.StartNew();
            fixed (byte* pSrc = aSource)
            fixed (byte* pDest = aTarget)
                for (int i = 0; i < COUNT; i++)
                    memcpyimpl(pSrc, pDest, SIZE);
            sw.Stop();
            Console.WriteLine("Buffer.memcpyimpl: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void TestBlockCopy()
        {
            Stopwatch sw = Stopwatch.StartNew();
            sw.Start();
            for (int i = 0; i < COUNT; i++)
                Buffer.BlockCopy(bSource, 0, bTarget, 0, SIZE);
            sw.Stop();
            Console.WriteLine("Buffer.BlockCopy: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void TestArrayCopy()
        {
            Stopwatch sw = Stopwatch.StartNew();
            sw.Start();
            for (int i = 0; i < COUNT; i++)
                Array.Copy(cSource, 0, cTarget, 0, SIZE);
            sw.Stop();
            Console.WriteLine("Array.Copy: {0:N0} ticks", sw.ElapsedTicks);
        }

        static void Main(string[] args)
        {
            for (int i = 0; i < 10; i++)
            {
                TestArrayCopy();
                TestBlockCopy();
                TestUnsafe();
                Console.WriteLine();
            }
        }
    }

Results:

    Buffer.BlockCopy: 469,151 ticks
    Array.Copy: 469,972 ticks
    Buffer.memcpyimpl: 496,541 ticks

    Buffer.BlockCopy: 421,011 ticks
    Array.Copy: 430,694 ticks
    Buffer.memcpyimpl: 410,933 ticks

    Buffer.BlockCopy: 425,112 ticks
    Array.Copy: 420,839 ticks
    Buffer.memcpyimpl: 411,520 ticks

    Buffer.BlockCopy: 424,329 ticks
    Array.Copy: 420,288 ticks
    Buffer.memcpyimpl: 405,598 ticks

    Buffer.BlockCopy: 422,410 ticks
    Array.Copy: 427,826 ticks
    Buffer.memcpyimpl: 414,394 ticks

Now change the order:

    Array.Copy: 419,750 ticks
    Buffer.memcpyimpl: 408,919 ticks
    Buffer.BlockCopy: 419,774 ticks

    Array.Copy: 430,529 ticks
    Buffer.memcpyimpl: 412,148 ticks
    Buffer.BlockCopy: 424,900 ticks

    Array.Copy: 424,706 ticks
    Buffer.memcpyimpl: 427,861 ticks
    Buffer.BlockCopy: 421,929 ticks

    Array.Copy: 420,556 ticks
    Buffer.memcpyimpl: 421,541 ticks
    Buffer.BlockCopy: 436,430 ticks

    Array.Copy: 435,297 ticks
    Buffer.memcpyimpl: 432,505 ticks
    Buffer.BlockCopy: 441,493 ticks

Now change the order again:

    Buffer.memcpyimpl: 430,874 ticks
    Buffer.BlockCopy: 429,730 ticks
    Array.Copy: 432,746 ticks

    Buffer.memcpyimpl: 415,943 ticks
    Buffer.BlockCopy: 423,809 ticks
    Array.Copy: 428,703 ticks

    Buffer.memcpyimpl: 421,270 ticks
    Buffer.BlockCopy: 428,262 ticks
    Array.Copy: 434,940 ticks

    Buffer.memcpyimpl: 423,506 ticks
    Buffer.BlockCopy: 427,220 ticks
    Array.Copy: 431,606 ticks

    Buffer.memcpyimpl: 422,900 ticks
    Buffer.BlockCopy: 439,280 ticks
    Array.Copy: 432,649 ticks

Or, in other words: they are very competitive. As a general rule memcpyimpl is the fastest, but the difference is not necessarily worth worrying about.

Marlon · Answer

You can use Array.Copy.

EDIT

Array.Copy does work with multidimensional arrays: see this topic.
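A minimal sketch of that multidimensional behaviour, applied to the left/right split from the question: `Array.Copy` treats a multidimensional array as one long row-major sequence and counts elements, not bytes. It also requires the source and destination to have the same rank, so this sketch assumes the data is already in a single `[n, 3]` array. The names `ArrayCopyDemo` and `Split` are hypothetical.

```csharp
using System;

static class ArrayCopyDemo
{
    // Copies the first numInLeft rows of `all` into `left` and the remaining rows into `right`.
    public static void Split(double[,] all, int numInLeft, out double[,] left, out double[,] right)
    {
        int rows = all.GetLength(0);
        int cols = all.GetLength(1);

        left = new double[numInLeft, cols];
        right = new double[rows - numInLeft, cols];

        // Indices and lengths are in elements; element [r, c] sits at flat index r * cols + c.
        Array.Copy(all, 0, left, 0, numInLeft * cols);
        Array.Copy(all, numInLeft * cols, right, 0, (rows - numInLeft) * cols);
    }

    static void Main()
    {
        var all = new double[,]
        {
            { 0, 10, 1 },
            { 1, 11, 0 },
            { 2, 12, 1 },
        };

        Split(all, 1, out double[,] left, out double[,] right);

        Console.WriteLine(left[0, 1]);  // prints 10
        Console.WriteLine(right[1, 1]); // prints 12
    }
}
```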

ja72 · Answer

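For arrays of primitive types (such as double) you can copy quickly with pointers, even for multidimensional arrays.

In the code below I initialize a 2D array A[10,10] with the values 1 through 100, then copy those values into a 1D array B[100].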

    unsafe class Program
    {
        static void Main(string[] args)
        {
            double[,] A = new double[10, 10];

            for (int i = 0; i < 10; i++)
            {
                for (int j = 0; j < 10; j++)
                {
                    A[i, j] = 10 * i + j + 1;
                }
            }
            // A has { { 1 ,2 .. 10}, { 11, 12 .. 20}, .. { .. 99, 100} }
            double[] B = new double[10 * 10];

            if (A.Length == B.Length)
            {
                fixed (double* pA = A, pB = B)
                {
                    for (int i = 0; i < B.Length; i++)
                    {
                        pB[i] = pA[i];
                    }
                }
                // B has {1, 2, 3, 4 .. 100}
            }
        }
    }

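How fast is it? My testing showed it to be many times faster than a native C# copy and Buffer.BlockCopy(). Try it for your case and let us know.

Edit 1: I compared four copying methods: 1) two nested loops, 2) one serial loop, 3) pointers, 4) BlockCopy. I measured the number of copies per tick for arrays of various sizes.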

    N =   10x  10 (cpy/tck) Nested =  50, Serial = 33, Pointer =    100, Buffer =   16
    N =   20x  20 (cpy/tck) Nested = 133, Serial = 40, Pointer =    400, Buffer =  400
    N =   50x  50 (cpy/tck) Nested = 104, Serial = 40, Pointer =   2500, Buffer = 2500
    N =  100x 100 (cpy/tck) Nested =  61, Serial = 41, Pointer =  10000, Buffer = 3333
    N =  200x 200 (cpy/tck) Nested =  84, Serial = 41, Pointer =  40000, Buffer = 2666
    N =  500x 500 (cpy/tck) Nested =  69, Serial = 41, Pointer = 125000, Buffer = 2840
    N = 1000x1000 (cpy/tck) Nested =  33, Serial = 45, Pointer = 142857, Buffer = 1890
    N = 2000x2000 (cpy/tck) Nested =  30, Serial = 43, Pointer = 266666, Buffer = 1826
    N = 5000x5000 (cpy/tck) Nested =  21, Serial = 42, Pointer = 735294, Buffer = 1712

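It is clear who the winner is here. Pointer copy is orders of magnitude better than any other method.

Edit 2: Apparently I was unfairly taking advantage of a compiler/JIT optimization. The corrected numbers: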

    N =   10x  10 (cpy/tck) Nested =   0, Serial =  0, Pointer =   0, Buffer =    0
    N =   20x  20 (cpy/tck) Nested =  80, Serial = 14, Pointer = 100, Buffer =  133
    N =   50x  50 (cpy/tck) Nested = 147, Serial = 15, Pointer = 277, Buffer = 2500
    N =  100x 100 (cpy/tck) Nested =  98, Serial = 15, Pointer = 285, Buffer = 3333
    N =  200x 200 (cpy/tck) Nested = 106, Serial = 15, Pointer = 272, Buffer = 3076
    N =  500x 500 (cpy/tck) Nested = 106, Serial = 15, Pointer = 276, Buffer = 3125
    N = 1000x1000 (cpy/tck) Nested = 101, Serial = 11, Pointer = 199, Buffer = 1396
    N = 2000x2000 (cpy/tck) Nested = 105, Serial =  9, Pointer = 186, Buffer = 1804
    N = 5000x5000 (cpy/tck) Nested = 102, Serial =  8, Pointer = 170, Buffer = 1673

The buffered copy comes out on top here (thanks to @Mehrdad), with pointer copy second. The question now is why pointer copy isn't as fast as the buffer methods.

Honza R · Answer

If you are running on .NET Core, consider using source.AsSpan().CopyTo(destination) (but be careful on Mono).
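A minimal sketch of the span-based copy, applied to splitting one sorted array into left and right parts. It assumes a runtime where `AsSpan` is available for arrays (for example .NET Core 2.1 or later); the names `SpanCopyDemo` and `Split` are hypothetical.

```csharp
using System;

static class SpanCopyDemo
{
    // Copies the first numInLeft elements into `left` and the rest into `right`.
    public static void Split(double[] sorted, int numInLeft, out double[] left, out double[] right)
    {
        left = new double[numInLeft];
        right = new double[sorted.Length - numInLeft];

        // AsSpan(start, length) and AsSpan(start) create views over the array without copying;
        // CopyTo then copies the viewed elements into the destination array.
        sorted.AsSpan(0, numInLeft).CopyTo(left);
        sorted.AsSpan(numInLeft).CopyTo(right);
    }

    static void Main()
    {
        Split(new[] { 1.0, 2.0, 3.0, 4.0, 5.0 }, 2, out double[] left, out double[] right);

        Console.WriteLine(left.Length);  // prints 2
        Console.WriteLine(right[0]);     // prints 3
    }
}
```

`CopyTo` throws an `ArgumentException` if the destination is shorter than the source span, so the destination sizes have to be computed up front as above.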

| Method          | Job  | Runtime | Mean      | Error     | StdDev    | Ratio | RatioSD |
|---------------- |----- |-------- |----------:|----------:|----------:|------:|--------:|
| ArrayCopy       | Clr  | Clr     |  60.08 ns | 0.8231 ns | 0.7699 ns |  1.00 |    0.00 |
| SpanCopy        | Clr  | Clr     |  99.31 ns | 0.4895 ns | 0.4339 ns |  1.65 |    0.02 |
| BufferBlockCopy | Clr  | Clr     |  61.34 ns | 0.5963 ns | 0.5578 ns |  1.02 |    0.01 |
|                 |      |         |           |           |           |       |         |
| ArrayCopy       | Core | Core    |  63.33 ns | 0.6843 ns | 0.6066 ns |  1.00 |    0.00 |
| SpanCopy        | Core | Core    |  47.41 ns | 0.5399 ns | 0.5050 ns |  0.75 |    0.01 |
| BufferBlockCopy | Core | Core    |  59.89 ns | 0.4713 ns | 0.3936 ns |  0.94 |    0.01 |
|                 |      |         |           |           |           |       |         |
| ArrayCopy       | Mono | Mono    | 149.82 ns | 1.6466 ns | 1.4596 ns |  1.00 |    0.00 |
| SpanCopy        | Mono | Mono    | 347.87 ns | 2.0589 ns | 1.9259 ns |  2.32 |    0.02 |
| BufferBlockCopy | Mono | Mono    |  61.52 ns | 1.1691 ns | 1.0364 ns |  0.41 |    0.01 |

DragonSpit · Answer

If a jagged array of the following form works for you, you can avoid copying altogether:

    double[][] leftNode = new double[3][];
    leftNode[0] = sortedIndex;
    leftNode[1] = sortedInstances;
    leftNode[2] = sortedLabels;
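One consequence of this approach, sketched below under the assumption that sharing is acceptable: the jagged array stores references to the original arrays, so nothing is copied and a later write to a source array is visible through `leftNode`. The names `JaggedDemo` and `Wrap` are hypothetical.

```csharp
using System;

static class JaggedDemo
{
    // Wraps the three arrays without copying; each slot refers to the original array.
    public static double[][] Wrap(double[] sortedIndex, double[] sortedInstances, double[] sortedLabels)
    {
        return new[] { sortedIndex, sortedInstances, sortedLabels };
    }

    static void Main()
    {
        double[] sortedIndex = { 0, 1, 2 };
        double[] sortedInstances = { 10, 11, 12 };
        double[] sortedLabels = { 1, 0, 1 };

        double[][] leftNode = Wrap(sortedIndex, sortedInstances, sortedLabels);

        // Indexing is column first, then row: leftNode[column][i].
        Console.WriteLine(leftNode[1][2]); // prints 12

        // The data is shared, so changing the source changes what leftNode sees.
        sortedInstances[2] = 99;
        Console.WriteLine(leftNode[1][2]); // prints 99
    }
}
```

If the node needs its own independent data, or only the first `num_in_left` elements, one of the copying approaches is still required.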