Pivot table to a pandas DataFrame with a single header

I have a dataframe (df) that looks like this:

+---------+-------+------------+----------+
| subject | pills |    date    | strength |
+---------+-------+------------+----------+
|       1 |     4 | 10/10/2012 |      250 |
|       1 |     4 | 10/11/2012 |      250 |
|       1 |     2 | 10/12/2012 |      500 |
|       2 |     1 | 1/6/2014   |     1000 |
|       2 |     1 | 1/7/2014   |      250 |
|       2 |     1 | 1/7/2014   |      500 |
|       2 |     3 | 1/8/2014   |      250 |
+---------+-------+------------+----------+
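
To reproduce this, the table above can be built roughly as follows. This is only a sketch: it assumes the dates are stored as plain strings rather than datetimes, and that the variable is named df as in the question.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: dates are kept as plain strings, not parsed to datetimes.
df = pd.DataFrame({
    "subject":  [1, 1, 1, 2, 2, 2, 2],
    "pills":    [4, 4, 2, 1, 1, 1, 3],
    "date":     ["10/10/2012", "10/11/2012", "10/12/2012",
                 "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})
print(df)
```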

Using reshape in R, I get exactly what I need:

reshape(df, idvar = c("subject","date"), timevar = 'strength', direction = "wide")

+---------+------------+--------------+--------------+---------------+
| subject |    date    | strength.250 | strength.500 | strength.1000 |
+---------+------------+--------------+--------------+---------------+
|       1 | 10/10/2012 | 4            | NA           | NA            |
|       1 | 10/11/2012 | 4            | NA           | NA            |
|       1 | 10/12/2012 | NA           | 2            | NA            |
|       2 | 1/6/2014   | NA           | NA           | 1             |
|       2 | 1/7/2014   | 1            | 1            | NA            |
|       2 | 1/8/2014   | 3            | NA           | NA            |
+---------+------------+--------------+--------------+---------------+

Using pandas:

df.pivot_table(index=['subject', 'date'], columns='strength')

+---------+------------+-------+----+-----+
|         |            | pills            |
+---------+------------+-------+----+-----+
|         | strength   | 250   | 500| 1000|
+---------+------------+-------+----+-----+
| subject | date       |       |    |     |
+---------+------------+-------+----+-----+
| 1       | 10/10/2012 | 4     | NA | NA  |
|         | 10/11/2012 | 4     | NA | NA  |
|         | 10/12/2012 | NA    | 2  | NA  |
+---------+------------+-------+----+-----+
| 2       | 1/6/2014   | NA    | NA | 1   |
|         | 1/7/2014   | 1     | 1  | NA  |
|         | 1/8/2014   | 3     | NA | NA  |
+---------+------------+-------+----+-----+

How can I get exactly the same output as R in pandas? I need only one header row.

11
alma123

After pivoting, convert the dataframe to records and then back to a dataframe:

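import pandas as pd

# Pivot as in the question; the columns come back as a two-level MultiIndex
pivoted = df.pivot_table(index=['subject', 'date'], columns='strength')
# Round-tripping through records turns the index into ordinary columns
# and the column tuples into flat names
flattened = pd.DataFrame(pivoted.to_records())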
#   subject        date  ('pills', 250)  ('pills', 500)  ('pills', 1000)
#0        1  10/10/2012             4.0             NaN              NaN
#1        1  10/11/2012             4.0             NaN              NaN
#2        1  10/12/2012             NaN             2.0              NaN
#3        2    1/6/2014             NaN             NaN              1.0
#4        2    1/7/2014             1.0             1.0              NaN
#5        2    1/8/2014             3.0             NaN              NaN

If you like, you can now "repair" the column names:

flattened.columns = [hdr.replace("('pills', ", "strength.").replace(")", "") \
                     for hdr in flattened.columns]
flattened
#   subject        date  strength.250  strength.500  strength.1000
#0        1  10/10/2012           4.0           NaN            NaN
#1        1  10/11/2012           4.0           NaN            NaN
#2        1  10/12/2012           NaN           2.0            NaN
#3        2    1/6/2014           NaN           NaN            1.0
#4        2    1/7/2014           1.0           1.0            NaN
#5        2    1/8/2014           3.0           NaN            NaN

It's clunky, but it works.
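
As an alternative sketch (not the method used in this answer), the string replacement can be avoided by passing values="pills" as a single column name, which keeps the pivoted columns at one level, and then renaming them directly. The sample df is rebuilt here so the snippet runs on its own; the dates-as-strings layout and the "strength." prefix are assumptions chosen to mimic the R output.

```python
import pandas as pd

# Sample data from the question (assumption: dates kept as strings).
df = pd.DataFrame({
    "subject":  [1, 1, 1, 2, 2, 2, 2],
    "pills":    [4, 4, 2, 1, 1, 1, 3],
    "date":     ["10/10/2012", "10/11/2012", "10/12/2012",
                 "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})

# A scalar `values` gives single-level columns: 250, 500, 1000.
pivoted = df.pivot_table(index=["subject", "date"],
                         columns="strength", values="pills")

# Prefix each strength value, similar to R's reshape naming.
pivoted.columns = [f"strength.{s}" for s in pivoted.columns]

# Move subject/date from the index back into ordinary columns.
flattened = pivoted.reset_index()
print(flattened)
```

Note that pivot_table aggregates with the mean by default, so duplicate (subject, date, strength) combinations would be averaged rather than raising an error.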

38
DYZ