Creating combinations by group in R

Question

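I want to create a list, for my classroom, of every possible way to split the students into groups of 4. If I have 20 students, how can I create this by group in R, where each row is one combination and there are 20 columns for the full list of student IDs, with columns 1-4 being "group1", columns 5-8 being "group2", and so on?

The code below gives the possible combinations for a single group of 4 students (x1, x2, x3, and x4). For each row it lists, what are the possibilities for the other 4 groups of 4 students? There should therefore be 20 columns (Group1_1:4, Group2_1:4, Group3_1:4, Group4_1:4, Group5_1:4).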

combn(c(1:20), m = 4)
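As a quick check on this starting point, the following base-R sketch shows what `combn()` alone returns: one candidate group per column, with no information about how the remaining 16 students are split. The variable name `one_group` is only illustrative.

```r
# Base R only: each column of the combn() result is one candidate group of 4.
one_group <- combn(1:20, m = 4)

dim(one_group)     # rows = group size, columns = number of candidate groups
one_group[, 1:3]   # the first three candidate groups
```

The number of columns equals `choose(20, 4)`, which covers a single group only, not a full split of the class into five groups.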

Desired output

Combination 1 = Group1[1, 2, 3, 4] Group2[5, 6, 7, 8], Group3[9, 10, 11, 12], etc. Combination 2 = Group1[1, 2, 3, 5]... etc.

There are many posts about combinations, so this may already have been answered and I simply could not find it. Any help is appreciated!

Cole · Answer

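This relies heavily on this answer:

Algorithm that can create all combinations and all groups of those combinations

One thing to note is that that answer is not dynamic: it only includes a solution for groups of 3. To make it more robust, we can generate the code from the input parameters. That is, the following recursive function is what gets created on the fly for groups of 3: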

    group <- function(input, step){
      len <- length(input)
      combination[1, step] <<- input[1]

      for (i1 in 2:(len-1)) {
        combination[2, step] <<- input[i1]

        for (i2 in (i1+1):(len-0)) {
          combination[3, step] <<- input[i2]

          if (step == m) {
            print(z); result[z, ,] <<- combination
            z <<- z+1
          } else {
            rest <- setdiff(input, input[c(i1,i2, 1)])
            group(rest, step +1) #recursive if there are still additional possibilities
          }
        }
      }
    }

Running it for N = 16 and k = 4 takes about 55 seconds. I would like to translate it to Rcpp, but unfortunately I do not have that skill set.

    group_N <- function(input, k = 2) {
      N = length(input)
      m = N/k
      combos <- factorial(N) / (factorial(k)^m * factorial(m))

      result <- array(NA_integer_, dim = c(combos, m, k))
      combination = matrix(NA_integer_, nrow = k, ncol = m)

      z = 1

      # Source text for the start of the generated group() function
      group_f_start = 'group <- function(input, step){
      len <- length(input)
      combination[1, step] <<- input[1]
      '
      i_s <- paste0('i', seq_len(k-1))

      # One nested for loop per remaining group member (k - 1 loops)
      group_f_fors = paste0('for (', i_s, ' in ',
                            c('2', if (length(i_s) != 1) {paste0('(', i_s[-length(i_s)], '+1)')}),
                            ':(len-', rev(seq_len(k)[-k])-1, ')) {
      combination[', seq_len(k)[-1], ', step] <<- input[', i_s, ']
      ', collapse = '\n')

      # Innermost body: store a finished grouping or recurse on the rest
      group_f_inner = paste0('if (step == m) {
      result[z, ,] <<- combination
      z <<- z+1
      } else {
      rest <- setdiff(input, input[c(', paste0(i_s, collapse = ','), ', 1)])
      group(rest, step +1)
      }')

      # Close the k - 1 loops and the function, then define group() here
      eval(parse(text = paste0(group_f_start, group_f_fors, group_f_inner,
                               paste0(rep('}', times = k), collapse = ' \n'))))

      group(input, 1)
      return(result)
    }
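The same enumeration can also be sketched without building code through `eval(parse())`. The following is a minimal base-R sketch, not the answer's implementation: `partition_groups` is a hypothetical helper that fixes the first remaining element, picks its k - 1 partners with `combn()`, and recurses on the rest. It assumes k is at least 2, returns a matrix with one grouping per row rather than the 3D array used above, and has not been tuned for speed or memory, so it is only meant for small inputs.

```r
# Hypothetical helper (not from the answer): enumerate every way to split
# `input` into unordered groups of size k, one grouping per row.
partition_groups <- function(input, k) {
  stopifnot(k >= 2, length(input) >= k, length(input) %% k == 0)

  # Base case: a single group is left, so there is exactly one grouping.
  if (length(input) == k) {
    return(matrix(input, nrow = 1))
  }

  first <- input[1]
  rest <- input[-1]

  # Every choice of k - 1 partners for the first element, one per column.
  partners <- combn(length(rest), k - 1)

  pieces <- lapply(seq_len(ncol(partners)), function(j) {
    idx <- partners[, j]
    # All groupings of the elements not used by this group.
    sub <- partition_groups(rest[-idx], k)
    # Repeat the current group on every row and append the sub-groupings.
    this_group <- matrix(c(first, rest[idx]),
                         nrow = nrow(sub), ncol = k, byrow = TRUE)
    cbind(this_group, sub)
  })

  do.call(rbind, pieces)
}

small <- partition_groups(1:4, 2)
small
```

For `1:4` with groups of 2 this yields three rows, which agrees with the count formula `factorial(N) / (factorial(k)^m * factorial(m))` used in `group_N`.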

Performance

    system.time({test_1 <- group_N(seq_len(4), 2)})
    #   user  system elapsed
    #   0.01    0.00    0.02

    library(data.table)

    #this funky step is just to better show the groups. the provided
    ## array is fine.
    as.data.table(t(rbindlist(as.data.table(apply(test_1, c(1,3), list)))))
    #    V1  V2
    #1: 1,2 3,4
    #2: 1,3 2,4
    #3: 1,4 2,3

    system.time({test_1 <- group_N(seq_len(16), 4)})
    #   user  system elapsed
    #  55.00    0.19   55.29

    as.data.table(t(rbindlist(as.data.table(apply(test_1, c(1,3), list)))))
    #very slow
    #                 V1          V2          V3          V4
    #      1:    1,2,3,4     5,6,7,8  9,10,11,12 13,14,15,16
    #      2:    1,2,3,4     5,6,7,8  9,10,11,13 12,14,15,16
    #      3:    1,2,3,4     5,6,7,8  9,10,11,14 12,13,15,16
    #      4:    1,2,3,4     5,6,7,8  9,10,11,15 12,13,14,16
    #      5:    1,2,3,4     5,6,7,8  9,10,11,16 12,13,14,15
    #     ---
    #2627621: 1,14,15,16  2,11,12,13  3, 6, 9,10     4,5,7,8
    #2627622: 1,14,15,16  2,11,12,13     3,7,8,9  4, 5, 6,10
    #2627623: 1,14,15,16  2,11,12,13  3, 7, 8,10     4,5,6,9
    #2627624: 1,14,15,16  2,11,12,13  3, 7, 9,10     4,5,6,8
    #2627625: 1,14,15,16  2,11,12,13  3, 8, 9,10     4,5,6,7

Joseph Wood · Answer

~~This is currently implemented in the development version of RcppAlgos and will be included in the next official release on CRAN.~~ This is now officially part of the production version of RcppAlgos^*.

    library(RcppAlgos)
    a <- comboGroups(10, numGroups = 2, retType = "3Darray")

    dim(a)
    [1] 126   5   2

    a[1,,]
         Grp1 Grp2
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10

    a[126,,]
         Grp1 Grp2
    [1,]    1    2
    [2,]    7    3
    [3,]    8    4
    [4,]    9    5
    [5,]   10    6

Or, if you prefer matrices:

    a1 <- comboGroups(10, 2, retType = "matrix")

    head(a1)
         Grp1 Grp1 Grp1 Grp1 Grp1 Grp2 Grp2 Grp2 Grp2 Grp2
    [1,]    1    2    3    4    5    6    7    8    9   10
    [2,]    1    2    3    4    6    5    7    8    9   10
    [3,]    1    2    3    4    7    5    6    8    9   10
    [4,]    1    2    3    4    8    5    6    7    9   10
    [5,]    1    2    3    4    9    5    6    7    8   10
    [6,]    1    2    3    4   10    5    6    7    8    9

It is also really fast. You can even generate the results in parallel with nThreads or Parallel = TRUE (the latter uses the system's maximum number of threads minus one) for further efficiency gains:

    comboGroupsCount(16, 4)
    [1] 2627625

    system.time(comboGroups(16, 4, "matrix"))
     user  system elapsed
    0.107   0.030   0.137

    system.time(comboGroups(16, 4, "matrix", nThreads = 4))
     user  system elapsed
    0.124   0.067   0.055

    ## 7 threads on my machine
    system.time(comboGroups(16, 4, "matrix", Parallel = TRUE))
     user  system elapsed
    0.142   0.126   0.047

A really nice feature is the ability to generate samples or specific lexicographical combination groups, especially when the number of results is large.

    comboGroupsCount(factor(state.abb), numGroups = 10)
    Big Integer ('bigz') :
    [1] 13536281554808237495608549953475109376

    mySamp <- comboGroupsSample(factor(state.abb),
                                numGroups = 10, "3Darray", n = 5, seed = 42)

    mySamp[1,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] AL   AK   AR   CA   CO   CT   DE   FL   LA   MD
    [2,] IA   AZ   ME   ID   GA   OR   IL   IN   MS   NM
    [3,] KY   ND   MO   MI   HI   PA   MN   KS   MT   OH
    [4,] TX   RI   SC   NH   NV   WI   NE   MA   NY   TN
    [5,] VA   VT   UT   OK   NJ   WY   WA   NC   SD   WV
    50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH ... WY

    firstAndLast <- comboGroupsSample(state.abb, 10, "3Darray",
                                      sampleVec = c("1",
                                                    "13536281554808237495608549953475109376"))

    firstAndLast[1,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA"
    [2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA"
    [3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV"
    [4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI"
    [5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY"

    firstAndLast[2,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA"
    [2,] "WA" "TX" "RI" "OH" "NM" "NE" "MN" "ME" "IA" "HI"
    [3,] "WV" "UT" "SC" "OK" "NY" "NV" "MS" "MD" "KS" "ID"
    [4,] "WI" "VT" "SD" "OR" "NC" "NH" "MO" "MA" "KY" "IL"
    [5,] "WY" "VA" "TN" "PA" "ND" "NJ" "MT" "MI" "LA" "IN"

And finally, generating all 2,546,168,625 combination groups of 20 people in 5 groups (what the OP asked for) can be achieved in under a minute using the lower and upper arguments:

    system.time(aPar <- parallel::mclapply(seq(1, 2546168625, 969969), function(x) {
        combs <- comboGroups(20, 5, "3Darray", lower = x, upper = x + 969968)
        ### do something
        dim(combs)
    }, mc.cores = 6))
       user  system elapsed
    217.667  22.932  48.482

    sum(sapply(aPar, "[", 1))
    [1] 2546168625
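The chunk boundaries used in that call can be checked with base R alone. This is only a sketch of the arithmetic behind the lower and upper values; the names `total`, `chunk`, `lower`, and `upper` are illustrative, and nothing here calls RcppAlgos.

```r
total <- 2546168625   # number of combination groups stated above
chunk <- 969969       # chunk size used in the call above

lower <- seq(1, total, by = chunk)   # first index of each chunk
upper <- lower + chunk - 1           # last index of each chunk

length(lower)                  # number of chunks handed to mclapply()
total %% chunk                 # 0 means the chunks tile the range exactly
upper[length(upper)] == total  # the final chunk ends at the last result
```

Because the chunk size divides the total evenly, every chunk has the same length and no bounds check is needed for the last one.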

I started working on this problem over a year ago, and this question was a huge inspiration for formalizing it in a package.

^* I am the author of RcppAlgos

Jon Spring · Answer

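This is a computationally challenging problem, since I believe there are about 2.5 billion possibilities to enumerate. (If that is mistaken, I would welcome any insight into where this approach goes wrong.)

Depending on how it is stored, a table containing all of those groupings might require more RAM than most computers can handle. I would be impressed to see an efficient way to create it. With a "create one combination at a time" approach, generating every possibility would take about 42 minutes at 1,000,000 per second, or about a month at only 1,000 per second.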
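A rough check of those estimates in base R is sketched below. The storage figure assumes one 4-byte integer per student ID and 20 columns per row, which is an assumption about the layout rather than something fixed by the question.

```r
n_groupings <- 2546168625   # count of groupings used in this answer

minutes_fast <- n_groupings / 1e6 / 60      # at 1,000,000 groupings per second
days_slow    <- n_groupings / 1e3 / 86400   # at 1,000 groupings per second

# Assumed layout: integer matrix, 20 columns, 4 bytes per entry.
gib_needed <- n_groupings * 20 * 4 / 2^30

c(minutes = minutes_fast, days = days_slow, GiB = gib_needed)
```

Under that assumed layout the full table is far larger than the memory of a typical desktop machine, which is why generating groupings on demand or in chunks is attractive.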

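EDIT: added a partial implementation at the bottom that creates any desired grouping from #1 to #2,546,168,625. For some purposes, this may be almost as good as actually storing the whole sequence, which is very large.

Let's say we are going to make 5 groups of four students each: Groups A, B, C, D, and E.

Let's define Group A as the group that Student #1 is in. That student can be paired with any three of the other 19 students. I believe there are 969 such combinations of other students: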

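    > nrow(t(combn(1:19, 3)))
    [1] 969

There are now 16 students left for the other groups. Let's assign the first student not already in Group A to Group B. That might be student 2, 3, 4, or 5; it doesn't matter. All we need to know is that there are only 15 students who can be paired with that student. There are 455 such combinations:

    > nrow(t(combn(1:15, 3)))
    [1] 455

Now there are 12 students left. Again, let's assign the first ungrouped student to Group C, which leaves 165 combinations with the other 11 students:

    > nrow(t(combn(1:11, 3)))
    [1] 165

And we have 8 students left, 7 of whom can be paired with the first ungrouped student into Group D in 35 ways:

    > nrow(t(combn(1:7, 3)))
    [1] 35

And then, once the other groups are determined, there is only one group of four students left: the remaining three students are paired with the first ungrouped student.

    > nrow(t(combn(1:3, 3)))
    [1] 1

That implies 2.546 billion combinations: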

    > 969*455*165*35*1
    [1] 2546168625
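The same counting argument generalizes to other class and group sizes. A minimal sketch, assuming N is a multiple of k: `count_groupings` is a hypothetical helper that multiplies, for each group in turn, the number of ways to pick k - 1 partners for the first ungrouped student.

```r
# Hypothetical helper: number of ways to split N students into groups of k,
# following the "first ungrouped student picks k - 1 partners" argument.
count_groupings <- function(N, k) {
  stopifnot(N >= k, N %% k == 0)
  # Students available as partners at each step: N - 1, N - 1 - k, ..., k - 1
  available <- seq(N - 1, k - 1, by = -k)
  prod(choose(available, k - 1))
}

count_groupings(20, 4)   # the case discussed above
count_groupings(16, 4)
count_groupings(4, 2)
```

Working with `choose()` keeps every intermediate value small, so the product stays exact in double precision for these sizes, whereas `factorial(20)` on its own is already too large to be represented exactly.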

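Here is a work-in-progress function that produces a grouping from any arbitrary sequence number.

1) [in progress] Convert the sequence number to a vector describing which combination number should be used for Groups A, B, C, D, and E. For instance, this should convert #1 to c(1, 1, 1, 1, 1) and #2,546,168,625 to c(969, 455, 165, 35, 1).

2) Convert the combinations to a specific output describing the students in each group.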
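Step 1 amounts to a mixed-radix decomposition of the sequence number. The sketch below is one possible way to do it, not the author's implementation: `seq_to_indices` is a hypothetical helper, it assumes the last group's index varies fastest (an ordering the answer does not specify), and it returns only the four indices for Groups A to D, since the fifth is always 1.

```r
# Hypothetical helper: map a 1-based sequence number to the combination
# index for each of Groups A-D. Assumed ordering: Group D varies fastest.
seq_to_indices <- function(n, radices = c(969, 455, 165, 35)) {
  stopifnot(n >= 1, n <= prod(radices))
  n0 <- n - 1                      # switch to 0-based for the arithmetic
  idx <- numeric(length(radices))
  for (g in rev(seq_along(radices))) {
    idx[g] <- n0 %% radices[g] + 1 # digit for this group, back to 1-based
    n0 <- n0 %/% radices[g]        # carry the remainder to the next group
  }
  idx
}

seq_to_indices(1)
seq_to_indices(2)
seq_to_indices(2546168625)
```

The first and last sequence numbers map to c(1, 1, 1, 1) and c(969, 455, 165, 35), matching the endpoints described in step 1 once the trailing 1 for Group E is appended.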

    groupings <- function(seq_nums) {
      students <- 20
      group_size = 4
      grouped <- NULL
      remaining <- 1:20
      seq_nums_pad <- c(seq_nums, 1) # Last group always uses the only possible combination
      for (g in 1:5) {
        group_relative <-
          c(1, 1 + t(combn(1:(length(remaining) - 1), group_size - 1))[seq_nums_pad[g], ])
        group <- remaining[group_relative]
        print(group)
        grouped = c(grouped, group)
        remaining <- setdiff(remaining, grouped)
      }
    }

    > groupings(c(1,1,1,1))
    #[1] 1 2 3 4
    #[1] 5 6 7 8
    #[1]  9 10 11 12
    #[1] 13 14 15 16
    #[1] 17 18 19 20
    > groupings(c(1,1,1,2))
    #[1] 1 2 3 4
    #[1] 5 6 7 8
    #[1]  9 10 11 12
    #[1] 13 14 15 17
    #[1] 16 18 19 20
    > groupings(c(969, 455, 165, 35))   # This one uses the last possibility for
    #[1]  1 18 19 20                    #   each grouping.
    #[1]  2 15 16 17
    #[1]  3 12 13 14
    #[1]  4  9 10 11
    #[1] 5 6 7 8

d.b · Answer

Here is an example with smaller numbers. I don't think this will scale well to 20 students.

    total_students = 4
    each_group = 2
    total_groups = total_students/each_group

    if (total_students %% each_group == 0) {
        library(arrangements)

        group_id = rep(1:total_groups, each = each_group)

        #There is room to increase efficiency here by generating only relevant permutations
        temp = permutations(1:total_students, total_students)
        temp = unique(t(apply(temp, 1, function(i) {
            x = group_id[i]
            match(x, unique(x))
        })))

        dimnames(temp) = list(COMBO = paste0("C", 1:NROW(temp)),
                              Student = paste0("S", 1:NCOL(temp)))
    } else {
        cat("Total students not multiple of each_group")
        temp = NA
    }
    #> Warning: package 'arrangements' was built under R version 3.5.3
    temp
    #>      Student
    #> COMBO S1 S2 S3 S4
    #>    C1  1  1  2  2
    #>    C2  1  2  1  2
    #>    C3  1  2  2  1

^{Created on 2019-09-02 by the reprex package (v0.3.0)}

The total number of possible ways is given by the following function (from here):

    foo = function(N, k) {
        #N is total number of people, k is number of people in each group
        if (N %% k == 0) {
            m = N/k
            factorial(N)/(factorial(k)^m * factorial(m))
        } else {
            stop("N is not a multiple of k")
        }
    }
    foo(4, 2)
    #[1] 3
    foo(20, 4)
    #[1] 2546168625

For groups of 4 drawn from 20 people in total, the number of possible arrangements is massive.

Monk · Answer

The code below works.

    # Create list of the 20 records
    list <- c(1:20)

    # Generate all combinations including repetitions
    c <- data.frame(expand.grid(rep(list(list), 4))); rm(list)
    c$combo <- paste(c$Var1, c$Var2, c$Var3, c$Var4)

    # Remove repetitions
    c <- subset(c, c$Var1 != c$Var2 & c$Var1 != c$Var3 & c$Var1 != c$Var4 &
                   c$Var2 != c$Var3 & c$Var2 != c$Var4 & c$Var3 != c$Var4)

    # Create common group labels (ex. abc, acb, bac, bca, cab, cba would all have "abc" as their group label).
    key <- data.frame(paste(c$Var1, c$Var2, c$Var3, c$Var4))
    key$group <- apply(key, 1, function(x) paste(sort(unlist(strsplit(x, " "))), collapse = " "))
    c$group <- key$group; rm(key)

    # Sort by common group label and id combos by group
    c <- c[order(c$group),]
    c$Var1 <- NULL; c$Var2 <- NULL; c$Var3 <- NULL; c$Var4 <- NULL;
    c$rank <- rep(1:24)

    # Pivot
    c <- reshape(data=c,idvar="group", v.names = "combo", timevar = "rank", direction="wide")

Alejandro Andrade · Answer

You can get all the combinations with the expand.grid function by simply passing the data vector four times. The result will include rows such as c(1,1,1,1), so I remove every row that contains duplicated values, and the last part builds the combinations. It uses two nested loops and is quite slow, but it will get you what you want. It could be sped up with the Rcpp package. The code is:

    ids = 1:20
    d2 = expand.grid(ids,ids,ids,ids)
    ## Remove rows with duplicated values
    pos_use = apply(apply(d2,1,duplicated),2,function(x) all(x == F))
    d2_temp = t(apply(d2[pos_use,],1,sort))
    list_temp = list()
    pos_quitar = NULL
    for(i in 1:nrow(d2_temp)){
      pos_quitar = c(pos_quitar,i)
      ini_comb = d2_temp[i,]
      d2_temp_use = d2_temp[-pos_quitar,]
      temp_comb = ini_comb
      for(j in 2:5){
        pos_quitar_new = which(apply(d2_temp_use,1,function(x) !any(temp_comb%in%x)))[1]
        temp_comb = c(temp_comb,d2_temp_use[pos_quitar_new,])
      }
      pos_quitar = c(pos_quitar,pos_quitar_new)
      list_temp[[i]] = temp_comb
    }
    list_temp
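To see how much work the expand.grid step discards, the filtering can be tried on a smaller class. This is only an illustrative sketch with 6 IDs instead of 20; the names `grid`, `distinct`, and `groups` are not from the answer.

```r
ids <- 1:6
grid <- expand.grid(ids, ids, ids, ids)   # every ordered 4-tuple, repeats allowed

# Keep rows whose four values are all distinct.
distinct <- grid[apply(grid, 1, function(r) !anyDuplicated(r)), ]

# Sort within each row, then drop rows that describe the same group.
groups <- unique(t(apply(distinct, 1, sort)))

nrow(grid)       # 6^4 rows generated
nrow(distinct)   # 6 * 5 * 4 * 3 rows without repeated IDs
nrow(groups)     # choose(6, 4) distinct groups remain
```

Only a small fraction of the generated rows survive, and the surviving set is the same one `combn(ids, 4)` enumerates directly, so `combn()` may be a cheaper starting point for the first step.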