Remap values in a pandas column with a dict

Question

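I have a dictionary which looks like this: di = {1: "A", 2: "B"}

I would like to apply it to the "col1" column of a dataframe similar to:

      col1 col2
    0    w    a
    1    1    2
    2    2  NaN

to get:

      col1 col2
    0    w    a
    1    A    2
    2    B  NaN

How can I best do this? For some reason, googling terms related to this only shows me links about how to make columns from dicts and vice versa :-/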

DSM · Accepted Answer

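You can use .replace. For example:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({'col1': {0: 'w', 1: 1, 2: 2}, 'col2': {0: 'a', 1: 2, 2: np.nan}})
    >>> di = {1: "A", 2: "B"}
    >>> df
      col1 col2
    0    w    a
    1    1    2
    2    2  NaN
    >>> df.replace({"col1": di})
      col1 col2
    0    w    a
    1    A    2
    2    B  NaN

or directly on the Series, i.e. df["col1"].replace(di, inplace=True). In recent pandas versions it is safer to assign the result back, df["col1"] = df["col1"].replace(di), because an in-place call on a column selection may not update df.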
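A minimal sketch of the assign-back form, assuming the sample frame from the question (rebuilt here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt for a self-contained example.
df = pd.DataFrame({"col1": ["w", 1, 2], "col2": ["a", 2, np.nan]})
di = {1: "A", 2: "B"}

# Replace on the Series and assign the result back to the column,
# instead of relying on inplace=True through a column selection.
df["col1"] = df["col1"].replace(di)
print(df)
```

Values without a key in di (here 'w') are left unchanged by replace.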

JohnE · Answer

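`map` can be much faster than `replace`

If your dictionary has more than a couple of keys, using map can be much faster than replace. There are two versions of this approach, depending on whether your dictionary exhaustively maps all possible values (and also on whether you want non-matches to keep their values or be converted to NaN).

Exhaustive Mapping

In this case, the form is very simple:

    df['col1'].map(di)       # note: if the dictionary does not exhaustively map all
                             # entries then non-matched entries are changed to NaNs

Although map most commonly takes a function as its argument, it can alternatively take a dictionary or a Series; see the documentation for pandas.Series.map.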

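Non-Exhaustive Mapping

If you have a non-exhaustive mapping and wish to retain the existing values for non-matches, you can add fillna:

    df['col1'].map(di).fillna(df['col1'])

as in @jpp's answer here: Replace values in a pandas series via dictionary efficiently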
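To make the difference between the two forms concrete, here is a small sketch. It assumes the col1 values from the question, where 'w' has no key in di:

```python
import pandas as pd

# col1 from the question; 'w' is not a key of di.
s = pd.Series(["w", 1, 2], name="col1")
di = {1: "A", 2: "B"}

# Plain map: values without a matching key become NaN.
mapped = s.map(di)

# map + fillna: values without a matching key keep their original value.
kept = s.map(di).fillna(s)

print(mapped)
print(kept)
```

Only the fillna variant preserves 'w'; the plain map turns it into NaN.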

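Benchmarks

Using the following data with pandas version 0.23.1:

    di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
    df = pd.DataFrame({'col1': np.random.choice(range(1, 9), 100000)})

and testing with %timeit, it appears that map is approximately 10x faster than replace.

Note that your speedup with map will vary with your data. The largest speedup appears to be with large dictionaries and exhaustive replaces. See @jpp's answer (linked above) for more extensive benchmarks and discussion.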
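If you want to rerun the comparison outside IPython, the following is a rough sketch using the standard-library timeit module in place of %timeit. The repeat count is an arbitrary choice made here, and the timings will depend on your machine and pandas version, so no numbers are quoted:

```python
import timeit

import numpy as np
import pandas as pd

di = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E", 6: "F", 7: "G", 8: "H"}
df = pd.DataFrame({"col1": np.random.choice(range(1, 9), 100000)})

repeats = 5  # arbitrary; raise it for more stable timings

# Time each approach on the same column.
t_map = timeit.timeit(lambda: df["col1"].map(di), number=repeats)
t_replace = timeit.timeit(lambda: df["col1"].replace(di), number=repeats)

print(f"map:     {t_map / repeats:.4f} s per call")
print(f"replace: {t_replace / repeats:.4f} s per call")

# Both approaches should produce the same values for an exhaustive mapping.
same_values = df["col1"].map(di).tolist() == df["col1"].replace(di).tolist()
print("same result:", same_values)
```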

unutbu · Answer

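There is a bit of ambiguity in your question. There are at least three interpretations:

1. the keys in di refer to index values
2. the keys in di refer to df['col1'] values
3. the keys in di refer to index locations (not the OP's question, but thrown in for fun)

Below is a solution for each case.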

Case 1: If the keys of di are meant to refer to index values, then you could use the update method:

    df['col1'].update(pd.Series(di))

For example,

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'col1': ['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1, 2, 0])
    #   col1 col2
    # 1    w    a
    # 2   10   30
    # 0   20  NaN

    di = {0: "A", 2: "B"}

    # The value at the 0-index is mapped to 'A', the value at the 2-index is mapped to 'B'
    df['col1'].update(pd.Series(di))
    print(df)

yields

      col1 col2
    1    w    a
    2    B   30
    0    A  NaN

I've modified the values from your original post so it is clearer what update is doing. Note how the keys in di are associated with index values. The order of the index values (that is, the index locations) does not matter.

Case 2: If the keys in di refer to df['col1'] values, then @DanAllan and @DSM show how to achieve this with replace:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'col1': ['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1, 2, 0])
    print(df)
    #   col1 col2
    # 1    w    a
    # 2   10   30
    # 0   20  NaN

    di = {10: "A", 20: "B"}

    # The values 10 and 20 are replaced by 'A' and 'B'
    df['col1'].replace(di, inplace=True)
    print(df)

yields

      col1 col2
    1    w    a
    2    A   30
    0    B  NaN

Note how in this case the keys in di were changed to match the values in df['col1'].

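Case 3: If the keys in di refer to index locations, then you could use the following (note that Series.put is only available in older pandas releases; it was deprecated in 0.25):

    df['col1'].put(di.keys(), di.values())

since

    df = pd.DataFrame({'col1': ['w', 10, 20],
                       'col2': ['a', 30, np.nan]},
                      index=[1, 2, 0])
    di = {0: "A", 2: "B"}

    # The values at the 0 and 2 index locations are replaced by 'A' and 'B'
    df['col1'].put(di.keys(), di.values())
    print(df)

yields

      col1 col2
    1    A    a
    2   10   30
    0    B  NaN

Here, the first and third rows were altered, because the keys in di are 0 and 2, which with Python's 0-based indexing refer to the first and third locations.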
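On pandas versions where Series.put is not available, a positional assignment with iloc is one possible equivalent. This is a sketch under that assumption rather than the original author's code; the col_pos variable is introduced here purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["w", 10, 20],
                   "col2": ["a", 30, np.nan]},
                  index=[1, 2, 0])
di = {0: "A", 2: "B"}

# Integer position of the target column (hypothetical helper variable).
col_pos = df.columns.get_loc("col1")

# Assign by row position: the keys of di are positions, the values are replacements.
df.iloc[list(di.keys()), col_pos] = list(di.values())
print(df)
```

As in the put version, rows are selected by position, so the index labels [1, 2, 0] play no role.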

Nico Coallier · Answer

Adding to this question, in case you ever have more than one column to remap in a dataframe:

    def remap(data, dict_labels):
        """
        Take a dictionary of labels, dict_labels, and replace the
        (previously label-encoded) values with their strings.

        ex: dict_labels = {'col1': {1: 'A', 2: 'B'}}
        """
        for field, values in dict_labels.items():
            print("I am remapping %s" % field)
            data.replace({field: values}, inplace=True)
        print("DONE")

        return data
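Since DataFrame.replace accepts a nested dict of the form {column: {old: new}}, the same remapping can also be sketched as a single call that returns a new frame. The column names and labels below are made up for illustration:

```python
import pandas as pd

# Hypothetical label-encoded data with two columns to remap.
df = pd.DataFrame({"col1": [1, 2, 1], "col2": [0, 1, 1]})
dict_labels = {"col1": {1: "A", 2: "B"},
               "col2": {0: "no", 1: "yes"}}

# One call handles every column listed in dict_labels; df itself is not modified.
remapped = df.replace(dict_labels)
print(remapped)
```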

Hope it can be useful to someone.

Cheers

wordsforthewise · Answer

DSM has the accepted answer, but the code doesn't seem to work for everyone. Here is one that works with the current version of pandas (0.23.4 as of 8/2018):

    import pandas as pd

    df = pd.DataFrame({'col1': [1, 2, 2, 3, 1],
                       'col2': ['negative', 'positive', 'neutral', 'neutral', 'positive']})

    conversion_dict = {'negative': -1, 'neutral': 0, 'positive': 1}
    df['converted_column'] = df['col2'].replace(conversion_dict)

    print(df.head())

and you'll see it looks like:

       col1      col2  converted_column
    0     1  negative                -1
    1     2  positive                 1
    2     2   neutral                 0
    3     3   neutral                 0
    4     1  positive                 1

See the documentation for pandas.DataFrame.replace.

U9-Forward · Answer

Or use apply:

    df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))

Demo (with the dataframe from the question):

    >>> df['col1'] = df['col1'].apply(lambda x: {1: "A", 2: "B"}.get(x, x))
    >>> df
      col1 col2
    0    w    a
    1    A    2
    2    B  NaN
    >>>

Amirhos Imani · Answer

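A more native pandas approach is to apply a replace function as below:

    import re

    def multiple_replace(mapping, text):
        # Create a regular expression from the dictionary keys
        regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))

        # For each match, look up the corresponding value in the dictionary
        return regex.sub(lambda mo: mapping[mo.group(0)], text)

Once you have defined the function, you can apply it to your dataframe. Because the keys are compiled into a regular expression, both the keys and the column values must be strings, and every occurrence of a key inside a value is replaced (so "12" would become "AB"):

    di = {"1": "A", "2": "B"}
    df['col1'] = df.apply(lambda row: multiple_replace(di, str(row['col1'])), axis=1)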

dorien · Answer

A nice complete solution that keeps a map of your class labels:

    labels = features['col1'].unique()
    labels_dict = dict(zip(labels, range(len(labels))))
    features = features.replace({"col1": labels_dict})

This way, you can at any point refer to the original class label from labels_dict.
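Because labels_dict maps label to code, getting back from a code to its label needs the inverted dictionary. The sketch below assumes a small made-up features frame, and it encodes with map rather than replace (the mapping is exhaustive by construction); the inverse and decoded names are introduced here for illustration:

```python
import pandas as pd

# Hypothetical data; any column of class labels would do.
features = pd.DataFrame({"col1": ["cat", "dog", "cat", "bird"]})

labels = features["col1"].unique()
labels_dict = dict(zip(labels, range(len(labels))))

# Encode: label -> integer code (map is used here instead of replace).
encoded = features["col1"].map(labels_dict)

# Decode: invert the dictionary to go from integer code back to label.
inverse = {code: label for label, code in labels_dict.items()}
decoded = encoded.map(inverse)

print(labels_dict)
print(decoded.tolist())
```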
