Python pandas: assign the last value of a DataFrame group to all entries of that group

Question

I have a DataFrame in pandas. I want to group it by one column and, within each group, assign the last value of another column to every row of that group in a new column.

I know that this command selects the last row of each group:

    import pandas as pd

    df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})
    print(df)
    print("-")
    result = df.groupby('a').nth(-1)
    print(result)

Result:

       a   b
    0  1  20
    1  1  21
    2  2  30
    3  3  40
    4  3  41
    -
        b
    a
    1  21
    2  30
    3  41

How can I assign the result of this operation back to the original DataFrame, so that I get something like this?

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41

jezrael · Accepted Answer

Use transform with last:

    df['b_new'] = df.groupby('a')['b'].transform('last')
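As a minimal, self-contained sketch, here is the transform approach applied to the sample frame from the question. Nothing is assumed beyond the data and the call shown above.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# transform returns a Series aligned with df's index, so each row
# receives the last 'b' value of its own group.
df['b_new'] = df.groupby('a')['b'].transform('last')
print(df)
```

Because the result is aligned with the original index, it can be assigned directly as a new column without any join or merge.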

Alternative:

    df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    print(df)

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
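The two variants are not fully interchangeable when the data contains missing values. As far as I know, 'last' skips nulls by default and returns the last non-null entry of each group, while x.iat[-1] returns whatever sits in the final position. The sketch below uses a hypothetical variation of the sample data (not from the question) in which group 1 ends with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 ends with a missing value.
df = pd.DataFrame({'a': [1, 1, 2], 'b': [20.0, np.nan, 30.0]})

# 'last' picks the last non-null value in each group.
by_last = df.groupby('a')['b'].transform('last')

# iat[-1] picks the value in the last position, null or not.
by_iat = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

print(by_last.tolist())
print(by_iat.tolist())
```

Which behavior is wanted depends on whether "last" should mean the last row or the last known value.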

Solution with nth and join:

    df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
    print(df)

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41
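On newer pandas releases (2.0 and later, as far as I am aware), nth acts as a filter and keeps the original row index instead of indexing its result by the group key, so the join above may not line up by group there. A sketch that sidesteps this, under that assumption, swaps in last(), whose result is indexed by the group key:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# Series indexed by the group key 'a', holding the last 'b' of each group.
last_b = df.groupby('a')['b'].last().rename('b_new')

# Join column 'a' of df against the index of last_b.
out = df.join(last_b, on='a')
print(out)
```

The same group-keyed Series also works with map, for example df['a'].map(last_b).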

Timings:

    import numpy as np

    N = 10000
    df = pd.DataFrame({'a': np.random.randint(1000, size=N),
                       'b': np.random.randint(10000, size=N)})
    # print(df)

    def f(df):
        return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')

    # cᴏʟᴅsᴘᴇᴇᴅ1
    In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
    100 loops, best of 3: 3.57 ms per loop

    # cᴏʟᴅsᴘᴇᴇᴅ2
    In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
    10 loops, best of 3: 71.3 ms per loop

    # jezrael1
    In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
    1000 loops, best of 3: 1.82 ms per loop

    # jezrael2
    In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
    10 loops, best of 3: 178 ms per loop

    # jezrael3
    In [219]: %timeit f(df)
    100 loops, best of 3: 3.63 ms per loop

Caveat

These results do not account for how performance depends on the number of groups, which strongly affects the timings of some of these solutions.
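To see how the number of groups affects the timings, one option is to rerun the comparison at several group counts. The sketch below is only an illustration: the helper names (make_frame, via_transform, via_lambda, time_methods) and the chosen sizes are hypothetical, and the measured numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows, n_groups, seed=0):
    """Random frame with up to n_groups distinct keys in column 'a'."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        'a': rng.integers(n_groups, size=n_rows),
        'b': rng.integers(10000, size=n_rows),
    })


def via_transform(df):
    # Built-in aggregation name, broadcast back to every row.
    return df.groupby('a')['b'].transform('last')


def via_lambda(df):
    # Python-level function called once per group.
    return df.groupby('a')['b'].transform(lambda x: x.iat[-1])


def time_methods(n_rows, n_groups, repeats=3):
    """Best-of-`repeats` wall time in seconds for each method."""
    df = make_frame(n_rows, n_groups)
    return {
        name: min(timeit.repeat(lambda: fn(df), number=1, repeat=repeats))
        for name, fn in [('transform_last', via_transform),
                         ('transform_lambda', via_lambda)]
    }


if __name__ == '__main__':
    for n_groups in (10, 1000):
        print(n_groups, time_methods(10000, n_groups))
```

Since the lambda variant calls a Python function once per group, its cost would be expected to grow with the group count, while the built-in 'last' should be much less sensitive to it.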

cs95 · Answer

Two possibilities, using groupby + nth + map or replace:

    df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))

Or,

    df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))

You can also replace nth(-1) with last() (in fact, doing so happens to make this a little faster), but nth gives you more flexibility over which item to pick from each group in b.

    df

       a   b  b_new
    0  1  20     21
    1  1  21     21
    2  2  30     30
    3  3  40     41
    4  3  41     41

WeNYoBen · Answer

I think this should be fast:

    df.merge(df.drop_duplicates('a', keep='last'), on='a', how='left')
    Out[797]:
       a  b_x  b_y
    0  1   20   21
    1  1   21   21
    2  2   30   30
    3  3   40   41
    4  3   41   41
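The merge above yields columns named b_x and b_y. To get the b_new column asked for in the question, one possible tweak (my addition, not part of the answer) is to rename the column on the deduplicated frame before merging:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 1, 2, 3, 3), 'b': (20, 21, 30, 40, 41)})

# One row per group: the last occurrence of each key in 'a',
# with 'b' renamed so the merge does not add _x/_y suffixes.
last_rows = df.drop_duplicates('a', keep='last').rename(columns={'b': 'b_new'})

# Left merge keeps every original row and attaches its group's last value.
out = df.merge(last_rows, on='a', how='left')
print(out)
```

Note that merge returns a new frame with a fresh integer index, so a non-default index on df would not be preserved.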