How to move data from the index to columns after a multi-key groupby in pandas

Question

I have the following pandas DataFrame:

    dfalph.head()

           token  year  uses  books
    386  xanthos  1830     3      3
    387  xanthos  1840     1      1
    388  xanthos  1840     2      2
    389  xanthos  1868     2      2
    390  xanthos  1875     1      1

I aggregate the rows that share the same token and year:

    dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum])
    dfalph.columns = dfalph.columns.droplevel(1)
    dfalph.head()

                  uses  books
    token   year
    xanthos 1830     3      3
            1840     3      3
            1867     2      2
            1868     2      2
            1875     1      1

Instead of keeping the 'token' and 'year' fields in the index, I would like to turn them back into columns and have an integer index.

DSM · Accepted Answer

Method #1: reset_index()

    >>> g
                  uses  books
                   sum    sum
    token   year
    xanthos 1830     3      3
            1840     3      3
            1868     2      2
            1875     1      1

    [4 rows x 2 columns]
    >>> g = g.reset_index()
    >>> g
         token  year  uses  books
                       sum    sum
    0  xanthos  1830     3      3
    1  xanthos  1840     3      3
    2  xanthos  1868     2      2
    3  xanthos  1875     1      1

    [4 rows x 4 columns]
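As a self-contained illustration of Method #1, here is a minimal sketch that rebuilds the question's sample rows by hand (the frame name `df` is an assumption, and `.sum()` is used in place of the question's `agg([np.sum])` to keep single-level column names):

```python
import pandas as pd

# Sample rows copied from the question's dfalph.head() output.
df = pd.DataFrame({
    "token": ["xanthos"] * 5,
    "year": [1830, 1840, 1840, 1868, 1875],
    "uses": [3, 1, 2, 2, 1],
    "books": [3, 1, 2, 2, 1],
})

# Grouping by two keys puts (token, year) into a MultiIndex on the result.
grouped = df.groupby(["token", "year"]).sum()

# reset_index() moves both index levels back into ordinary columns
# and replaces the index with a default integer index.
flat = grouped.reset_index()
print(flat)
```

Calling `.sum()` directly also makes the `droplevel(1)` step from the question unnecessary, because no second column level is created.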

Method #2: don't create the index in the first place, by using as_index=False

    >>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
    >>> g
         token  year  uses  books
    0  xanthos  1830     3      3
    1  xanthos  1840     3      3
    2  xanthos  1868     2      2
    3  xanthos  1875     1      1

    [4 rows x 4 columns]
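A runnable sketch of Method #2, again on hand-built sample data (the names `df` and `flat` are illustrative assumptions). With plain column names as grouping keys, `as_index=False` should leave the keys as regular columns:

```python
import pandas as pd

# Sample rows copied from the question's dfalph.head() output.
df = pd.DataFrame({
    "token": ["xanthos"] * 5,
    "year": [1830, 1840, 1840, 1868, 1875],
    "uses": [3, 1, 2, 2, 1],
    "books": [3, 1, 2, 2, 1],
})

# as_index=False keeps the grouping keys as columns, so no
# MultiIndex is built and no reset_index() call is needed afterward.
flat = df.groupby(["token", "year"], as_index=False).sum()
print(flat)
```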

Adarsh Madrecha · Answer

I differ from the accepted answer. There are two ways to do this, but they do not necessarily produce the same output, especially when you use Grouper in the groupby:

as_index=False
reset_index()

Example df:

    +---------+---------+-------------+------------+
    | column1 | column2 | column_date | column_sum |
    +---------+---------+-------------+------------+
    | A       | M       | 26-10-2018  |          2 |
    | B       | M       | 28-10-2018  |          3 |
    | A       | M       | 30-10-2018  |          6 |
    | B       | M       | 01-11-2018  |          3 |
    | C       | N       | 03-11-2018  |          4 |
    +---------+---------+-------------+------------+

The two approaches do not work the same way here.

    df = df.groupby(
        by=[
            'column1',
            'column2',
            pd.Grouper(key='column_date', freq='M')
        ],
        as_index=False
    ).sum()

The above gives

    +---------+---------+------------+
    | column1 | column2 | column_sum |
    +---------+---------+------------+
    | A       | M       |          8 |
    | B       | M       |          3 |
    | B       | M       |          3 |
    | C       | N       |          4 |
    +---------+---------+------------+

whereas

    df = df.groupby(
        by=[
            'column1',
            'column2',
            pd.Grouper(key='column_date', freq='M')
        ]
    ).sum().reset_index()

gives

    +---------+---------+-------------+------------+
    | column1 | column2 | column_date | column_sum |
    +---------+---------+-------------+------------+
    | A       | M       | 31-10-2018  |          8 |
    | B       | M       | 31-10-2018  |          3 |
    | B       | M       | 30-11-2018  |          3 |
    | C       | N       | 30-11-2018  |          4 |
    +---------+---------+-------------+------------+
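The comparison above can be tried with the following sketch. Two things here are assumptions rather than part of the answer: the helper `month_end()` is a hypothetical convenience, and it passes `pd.offsets.MonthEnd()` instead of the string `'M'`, because the accepted spelling of the month-end alias differs between pandas releases. Whether the `as_index=False` variant keeps `column_date` may also depend on the installed pandas version, so the sketch only prints its columns rather than asserting them.

```python
import pandas as pd

# The example frame from the answer, with dates parsed as day-month-year.
df = pd.DataFrame({
    "column1": ["A", "B", "A", "B", "C"],
    "column2": ["M", "M", "M", "M", "N"],
    "column_date": pd.to_datetime(
        ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
        format="%d-%m-%Y",
    ),
    "column_sum": [2, 3, 6, 3, 4],
})


def month_end():
    """Hypothetical helper: build a fresh month-end Grouper for each call."""
    return pd.Grouper(key="column_date", freq=pd.offsets.MonthEnd())


# Variant 1: group first, then move every index level into columns.
with_reset = df.groupby(["column1", "column2", month_end()]).sum().reset_index()
print(with_reset)

# Variant 2: as_index=False. Inspect the columns on your own pandas
# version to see whether column_date survives.
no_index = df.groupby(["column1", "column2", month_end()], as_index=False).sum()
print(no_index.columns.tolist())
```

If the month-end date is needed in the result, the `reset_index()` variant is the one the answer shows retaining it.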

user1809802 · Answer

You need to add drop=True:

    df.reset_index(drop=True)

    df = df.groupby(
        by=[
            'column1',
            'column2',
            pd.Grouper(key='column_date', freq='M')
        ]
    ).sum().reset_index(drop=True)
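To see what `drop=True` changes, here is a small sketch on hand-built data (all names are illustrative assumptions). `reset_index(drop=True)` discards the index levels instead of turning them into columns, so it fits only when the grouping keys are not needed afterward:

```python
import pandas as pd

df = pd.DataFrame({
    "token": ["xanthos", "xanthos", "xanthos"],
    "year": [1830, 1840, 1840],
    "uses": [3, 1, 2],
})

grouped = df.groupby(["token", "year"]).sum()

# Default: the index levels token and year become columns.
kept = grouped.reset_index()

# drop=True: the index levels are thrown away; only the
# aggregated values remain, under a fresh integer index.
dropped = grouped.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```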