How to find the difference in values between two rows of an R data frame using dplyr

Question

I have an R data frame like this:

    df <- data.frame(period = rep(1:4, 2),
                     farm = c(rep('A', 4), rep('B', 4)),
                     cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                     other = 1:8)

      period farm cumVol other
    1      1    A      1     1
    2      2    A      5     2
    3      3    A     15     3
    4      4    A     31     4
    5      1    B     10     5
    6      2    B     12     6
    7      3    B     16     7
    8      4    B     24     8

Ignoring the "other" column, how do I find the change in cumVol for each farm in each period? I need a data frame like this (optionally with the cumVol column retained):

      period farm volume other
    1      1    A      0     1
    2      2    A      4     2
    3      3    A     10     3
    4      4    A     16     4
    5      1    B      0     5
    6      2    B      2     6
    7      3    B      4     7
    8      4    B      8     8

In practice there may be many "farm"-like columns and many "other"-like (i.e. ignored) columns. I would like to be able to specify all the column names using variables.

I am using the dplyr package.

Vincent · Accepted Answer

With dplyr:

    require(dplyr)
    df %>%
      group_by(farm) %>%
      mutate(volume = cumVol - lag(cumVol, default = cumVol[1]))

    Source: local data frame [8 x 5]
    Groups: farm

      period farm cumVol other volume
    1      1    A      1     1      0
    2      2    A      5     2      4
    3      3    A     15     3     10
    4      4    A     31     4     16
    5      1    B     10     5      0
    6      2    B     12     6      2
    7      3    B     16     7      4
    8      4    B     24     8      8
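To see why the first row of each farm comes out as 0, it can help to look at lag() on a single group in isolation. This is only a small sketch on a plain vector (the cumVol values of farm A), not part of the original answer:

```r
library(dplyr)

x <- c(1, 5, 15, 31)   # cumVol for farm A

# lag() shifts the vector one position to the right;
# the first slot has no predecessor, so it is NA by default
lag(x)
# [1] NA  1  5 15

# `default` fills that first slot; using x[1] makes the first difference 0
lag(x, default = x[1])
# [1]  1  1  5 15

# current value minus previous value gives the per-period change
change <- x - lag(x, default = x[1])
change
# [1]  0  4 10 16
```

Inside group_by(farm), cumVol[1] refers to the first value within each farm, which is why farm B also starts at 0 rather than at 10 - 31.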

Or perhaps the desired output is actually the following?

    df %>%
      group_by(farm) %>%
      mutate(volume = cumVol - lag(cumVol, default = 0))

      period farm cumVol other volume
    1      1    A      1     1      1
    2      2    A      5     2      4
    3      3    A     15     3     10
    4      4    A     31     4     16
    5      1    B     10     5     10
    6      2    B     12     6      2
    7      3    B     16     7      4
    8      4    B     24     8      8
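The question also asks for the column names to be supplied through variables, which the code above hard-codes. One possible sketch, assuming dplyr 1.0 or later (for across() and all_of()) and using the .data pronoun; the variable names group_cols and value_col are made up for illustration:

```r
library(dplyr)

df <- data.frame(period = rep(1:4, 2),
                 farm = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other = 1:8)

# Hypothetical variables holding the column names
group_cols <- c("farm")    # may list several "farm"-like columns
value_col  <- "cumVol"

result <- df %>%
  # group by every column named in group_cols
  group_by(across(all_of(group_cols))) %>%
  # look the value column up by name via the .data pronoun
  mutate(volume = .data[[value_col]] -
           lag(.data[[value_col]], default = first(.data[[value_col]]))) %>%
  ungroup()
```

Columns that are not named in either variable (such as other) are simply carried along unchanged. Swap first(.data[[value_col]]) for 0 to get the second variant shown above.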

Edit: following up on your comments, I think you are looking for arrange(). If that is not the case, it may be best to start a new question.

    df1 <- data.frame(period = rep(1:4, 4),
                      farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                      crop = c(rep('Apple', 8), rep('pear', 8)),
                      cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24,
                                     11, 15, 25, 31, 20, 22, 26, 34),
                      other = rep(1:8, 2))

    df1 %>%
      arrange(desc(period), desc(farm)) %>%
      group_by(period, farm) %>%
      summarise(cumVol = sum(cumCropVol))

Edit: follow-up #2

    df1 <- data.frame(period = rep(1:4, 4),
                      farm = rep(c(rep('A', 4), rep('B', 4)), 2),
                      crop = c(rep('Apple', 8), rep('pear', 8)),
                      cumCropVol = c(1, 5, 15, 31, 10, 12, 16, 24,
                                     11, 15, 25, 31, 20, 22, 26, 34),
                      other = rep(1:8, 2))

    df <- df1 %>%
      arrange(desc(period), desc(farm)) %>%
      group_by(period, farm) %>%
      summarise(cumVol = sum(cumCropVol))

    ungroup(df) %>%
      arrange(farm) %>%
      group_by(farm) %>%
      mutate(volume = cumVol - lag(cumVol, default = 0))

    Source: local data frame [8 x 4]
    Groups: farm

      period farm cumVol volume
    1      1    A     12     12
    2      2    A     20      8
    3      3    A     40     20
    4      4    A     62     22
    5      1    B     30     30
    6      2    B     34      4
    7      3    B     42      8
    8      4    B     58     16

Tim Cameron · Answer

With dplyr, and no need to replace NAs:

    library(dplyr)
    df %>%
      group_by(farm) %>%
      mutate(volume = c(0, diff(cumVol)))

      period farm cumVol other volume
    1      1    A      1     1      0
    2      2    A      5     2      4
    3      3    A     15     3     10
    4      4    A     31     4     16
    5      1    B     10     5      0
    6      2    B     12     6      2
    7      3    B     16     7      4
    8      4    B     24     8      8
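The reason for the leading 0 is that diff() returns one element fewer than its input, while mutate() needs a result as long as the group. A quick base R illustration on one farm's values (a sketch for clarity, not from the original answer):

```r
x <- c(1, 5, 15, 31)   # cumVol for farm A

d <- diff(x)           # differences between consecutive elements
d
# [1]  4 10 16

padded <- c(0, d)      # prepend 0 so the length matches x again
padded
# [1]  0  4 10 16

# A group with a single row still works: diff() returns an empty vector,
# so the result is just the padding value
single <- c(0, diff(7))
single
# [1] 0
```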

marbel · Answer

Would creating a new column in the original dataset be an option?

Here is an option using the data.table operator :=.

    require("data.table")
    DT <- data.table(df)
    DT[, volume := c(0, diff(cumVol)), by = "farm"]

Or:

    diff_2 <- function(x) c(0, diff(x))
    DT[, volume := diff_2(cumVol), by = "farm"]

Output:

    # > DT
    #    period farm cumVol other volume
    # 1:      1    A      1     1      0
    # 2:      2    A      5     2      4
    # 3:      3    A     15     3     10
    # 4:      4    A     31     4     16
    # 5:      1    B     10     5      0
    # 6:      2    B     12     6      2
    # 7:      3    B     16     7      4
    # 8:      4    B     24     8      8
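If the column names need to come from variables, as the question requests, the data.table version can probably be adapted along these lines. This is a sketch only; group_cols, value_col and new_col are hypothetical names, and get() is used here to look the value column up by name:

```r
library(data.table)

df <- data.frame(period = rep(1:4, 2),
                 farm = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other = 1:8)

# Hypothetical variables holding the column names
group_cols <- "farm"     # character vector, may hold several columns
value_col  <- "cumVol"
new_col    <- "volume"

DT <- as.data.table(df)  # a copy, so df itself is left untouched

# (new_col) := ... assigns to the column whose name is stored in new_col;
# get(value_col) fetches the value column by name within each group
DT[, (new_col) := c(0, diff(get(value_col))), by = group_cols]
```

Because := modifies DT by reference, no reassignment is needed after the last line.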

Jilber Urbina · Answer

tapply and transform?

    > transform(df, volumen = unlist(tapply(cumVol, farm, function(x) c(0, diff(x)))))
       period farm cumVol other volumen
    A1      1    A      1     1       0
    A2      2    A      5     2       4
    A3      3    A     15     3      10
    A4      4    A     31     4      16
    B1      1    B     10     5       0
    B2      2    B     12     6       2
    B3      3    B     16     7       4
    B4      4    B     24     8       8

ave is a better option; see @thelatemail's comment.

    with(df, ave(cumVol, farm, FUN = function(x) c(0, diff(x))))
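One likely reason ave is preferable: it returns its result in the original row order, whereas unlist(tapply(...)) concatenates the results group by group and so only lines up when the rows are already sorted by farm. A small sketch with the rows deliberately interleaved (df2 is a made-up name for the reordered data):

```r
df <- data.frame(period = rep(1:4, 2),
                 farm = c(rep("A", 4), rep("B", 4)),
                 cumVol = c(1, 5, 15, 31, 10, 12, 16, 24),
                 other = 1:8)

# Interleave the farms: A, B, A, B, ...
df2 <- df[c(1, 5, 2, 6, 3, 7, 4, 8), ]

# ave() keeps each result next to the row it was computed from
df2$volume <- with(df2, ave(cumVol, farm, FUN = function(x) c(0, diff(x))))
df2$volume
# [1]  0  0  4  2 10  4 16  8

# tapply() + unlist() returns all of A first, then all of B,
# which no longer matches the row order of df2
by_group <- unname(unlist(tapply(df2$cumVol, df2$farm,
                                 function(x) c(0, diff(x)))))
by_group
# [1]  0  4 10 16  0  2  4  8
```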