ルックアップテーブルに基づいてデータフレームの値を置き換える

Question

データフレームの値を置き換えるのに問題があります。別のテーブルに基づいて値を置き換えたいと思います。以下は、私がやろうとしていることの例です。

すべての行が顧客であり、すべての列が購入した動物であるテーブルがあります。このデータフレームをtableと呼びましょう。

> table # P1 P2 P3 # 1 cat lizard parrot # 2 lizard parrot cat # 3 parrot cat lizard

lookUpと呼ばれる参照するテーブルもあります。

> lookUp # pet class # 1 cat mammal # 2 lizard reptile # 3 parrot bird

私がやりたいのは、newのすべての値をtableのclass列に置き換える関数でlookUpという新しいテーブルを作成することです。 lapply関数を使用して自分でこれを試しましたが、次の警告が表示されました。

new <- as.data.frame(lapply(table, function(x) { gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE) Warning messages: 1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used 2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used 3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) : argument 'replacement' has length > 1 and only the first element will be used

この作品を作る方法についてのアイデアはありますか？

docendo discimus · Accepted Answer

あなたはあなたの質問に悪いことではないアプローチを投稿しました。以下は、なじみのあるアプローチです。

new <- df # create a copy of df # using lapply, loop over columns and match values to the look up table. store in "new". new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

より高速な代替アプローチは次のとおりです。

new <- df new[] <- look$class[match(unlist(df), look$pet)]

空の角かっこ（[]）両方の場合で、newの構造をそのまま保持します（data.frame）。

（私はdfの代わりにtableとlookの代わりにlookupを使用しています）

Thierry · Answer

別のオプションは、tidyrとdplyrの組み合わせです

library(dplyr) library(tidyr) table %>% gather(key = "pet") %>% left_join(lookup, by = "pet") %>% spread(key = pet, value = class)

MichaelChirico · Answer

2つの個別のdata.framesと一方から他方へ情報を持ち込もうとする場合、答えはmergeです。

Rには誰もが自分の好きなマージ方法があります。私のものはdata.table。

また、これを多くの列に行いたいので、列をループするよりもmeltとdcastの方が速くなります。それを再形成されたテーブルに一度適用してから、再形成します。

library(data.table) #the row names will be our ID variable for melting setDT(table, keep.rownames = TRUE) setDT(lookUp) #now melt, merge, recast # melting (reshape wide to long) table[ , melt(.SD, id.vars = 'rn') # merging ][lookup, new_value := i.class, on = c(value = 'pet') #reform back to original shape ][ , dcast(.SD, rn ~ variable, value.var = 'new_value')] # rn P1 P2 P3 # 1: 1 mammal reptile bird # 2: 2 reptile bird mammal # 3: 3 bird mammal reptile

dcast/meltビットが少し怖い場合は、列をループするだけのアプローチがあります。 dcast/meltは、単にこの問題のループを回避しています。

setDT(table) #don't need row names this time setDT(lookUp) sapply(names(table), #(or to whichever are the relevant columns) function(cc) table[lookUp, (cc) := #merge, replace #need to pass a _named_ vector to 'on', so use setNames i.class, on = setNames("pet", cc)])

zx8754 · Answer

名前付きベクトルを作成し、すべての列をループして一致させます。以下を参照してください。

# make lookup vector with names lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet) lookUp1 # cat lizard parrot # "mammal" "reptile" "bird" # match on names get values from lookup vector res <- data.frame(lapply(df1, function(i) lookUp1[i])) # reset rownames rownames(res) <- NULL # res # P1 P2 P3 # 1 mammal reptile bird # 2 reptile bird mammal # 3 bird mammal reptile

データ

df1 <- read.table(text = " P1 P2 P3 1 cat lizard parrot 2 lizard parrot cat 3 parrot cat lizard", header = TRUE) lookUp <- read.table(text = " pet class 1 cat mammal 2 lizard reptile 3 parrot bird", header = TRUE)

dannit · Answer

答え上記 dplyrでこれを行う方法を示すと、質問に答えず、テーブルにはNAが入力されます。これはうまくいきましたが、より良い方法を示すコメントをお願いします：

# Add a customer column so that we can put things back in the right order table$customer = seq(nrow(table)) classTable <- table %>% # put in long format, naming column filled with P1, P2, P3 "petCount" gather(key="petCount", value="pet", -customer) %>% # add a new column based on the pet's class in data frame "lookup" left_join(lookup, by="pet") %>% # since you wanted to replace the values in "table" with their # "class", remove the pet column select(-pet) %>% # put data back into wide format spread(key="petCount", value="class")

顧客、ペット、ペットの種（？）およびそれらのクラスを含む長いテーブルを保持しておくと役立つ可能性が高いことに注意してください。この例では、変数に中間保存を追加するだけです。

table$customer = seq(nrow(table)) petClasses <- table %>% gather(key="petCount", value="pet", -customer) %>% left_join(lookup, by="pet") custPetClasses <- petClasses %>% select(-pet) %>% spread(key="petCount", value="class")

mikey · Answer

他のアプローチを試してみましたが、非常に大きなデータセットでは非常に長い時間がかかりました。代わりに次を使用しました。

_ # make table "new" using ifelse. See data below to avoid re-typing it new <- ifelse(table1 =="cat", "mammal", ifelse(table1 == "lizard", "reptile", ifelse(table1 =="parrot", "bird", NA))) _

この方法では、コード用により多くのテキストを記述する必要がありますが、ifelseのベクトル化により、実行速度が向上します。データに基づいて、コードの作成やコンピューターの実行の待機により多くの時間を費やすかどうかを決定する必要があります。確実に機能するようにしたい場合（ifleseコマンドにタイプミスがなかった場合）、apply(new, 2, function(x) mean(is.na(x)))を使用できます。

データ

_ # create the data table table1 <- read.table(text = " P1 P2 P3 1 cat lizard parrot 2 lizard parrot cat 3 parrot cat lizard", header = TRUE) _