Error in confusionMatrix: the data and reference factors must have the same number of levels

Question

I trained a tree model with caret in R. I am now trying to generate a confusion matrix and get the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5  # Fraction of rows assigned to the training set
singleSplit <- createDataPartition(modellingData2$category, p = prob, times = 1, list = FALSE)
cvControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
traindata <- modellingData2[singleSplit, ]
testdata <- modellingData2[-singleSplit, ]
treeFit <- train(traindata$category ~ ., data = traindata, trControl = cvControl, method = "rpart", tuneLength = 10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

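The error occurs when generating the confusion matrix. The levels are the same in both objects, so I can't see what the problem is. Their structure and levels are shown below, and they should be identical. Any help is greatly appreciated, because this is driving me crazy!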

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...
> levels(predictionsTree)
 [1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
 [6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
> levels(testdata$category)
 [1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
 [6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
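One detail worth checking: the str() and levels() calls above inspect testdata$category, but the confusionMatrix() call passes testdata$catgeory. In R, `$` on a data frame returns NULL for a column name that does not exist, and a NULL reference has zero levels. A minimal base-R sketch, using a small hypothetical data frame in place of the real testdata:

```r
# Hypothetical stand-in for testdata, with a correctly spelled "category" column
testdata <- data.frame(category = factor(c("Misc", "68-Payroll", "Misc")))

# Misspelled name: `$` finds no column called "catgeory" and returns NULL
ref_typo <- testdata$catgeory
is.null(ref_typo)           # TRUE
nlevels(factor(ref_typo))   # 0 levels

# Correct name: the real factor column
nlevels(testdata$category)  # 2 levels
```

If that is the cause here, spelling the column as testdata$category in the confusionMatrix() call would hand caret a reference with the same 30 levels shown above.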

Mayk Tulio · Answer

Try using:

confusionMatrix(table(argument1, argument2))
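Why wrapping the inputs in table() can help, or at least make the problem visible: table() on two factors keeps every level of each factor as a row or column, even levels with zero counts, so a level mismatch shows up as a non-square table. A base-R sketch with hypothetical toy vectors:

```r
# Hypothetical predictions and true labels of equal length
pred <- factor(c("a", "b", "a", "b"))
ref  <- factor(c("a", "b", "c", "b"))   # "c" never appears in pred

tab <- table(Prediction = pred, Reference = ref)
dim(tab)   # 2 rows x 3 columns: not square
tab
```

When both factors share the same level set the table comes out square, and caret's confusionMatrix() has a method that accepts such a table directly.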

It worked for me.

Red · Answer

Your model may never be predicting certain factor levels. Use the table() function instead of confusionMatrix() to check whether that is the problem.
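A sketch of that check and one possible fix, assuming the predictions and reference have the same length but the predictions lack some levels: compare the level counts, then re-declare the predictions with the reference's level set via factor(..., levels = ). The vectors are hypothetical.

```r
# Hypothetical labels: level "b" exists in the reference but is never predicted
reference <- factor(c("a", "b", "c", "a", "c"))
predicted <- factor(c("a", "a", "c", "a", "c"))

nlevels(predicted)   # 2
nlevels(reference)   # 3

# Re-declare the predictions using the reference's full level set
predicted_aligned <- factor(predicted, levels = levels(reference))
identical(levels(predicted_aligned), levels(reference))   # TRUE

# Counts per predicted level, now including the never-predicted "b"
table(predicted_aligned)
```

After this step, confusionMatrix(predicted_aligned, reference) should no longer complain about the number of levels; the never-predicted class simply gets an empty row.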

aristotll · Answer

Try setting the na.action argument to na.pass:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
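As far as I know, caret's predict() method for train objects defaults to na.action = na.omit, so test rows with missing predictors are dropped and the predictions can end up shorter than testdata$category. The base-R sketch below only mimics that row-dropping step on a hypothetical data frame; it does not call caret:

```r
# Hypothetical test set with a missing predictor value in row 2
testdata <- data.frame(x = c(1, NA, 3), category = factor(c("a", "b", "a")))

# Mimic na.action = na.omit: rows containing any NA are dropped
kept <- na.omit(testdata)
nrow(kept)                  # 2 rows would receive predictions
length(testdata$category)   # 3 reference labels
```

With na.pass the rows are kept, which keeps the lengths aligned; whether the model then returns a sensible prediction for such rows depends on the model.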

S. Think · Answer

Put them into a data frame and use that with the confusionMatrix function:

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
# rbind() merges the two factors' level sets, so both subsets below share the union of levels
my_data3 <- rbind(my_data1, my_data2)
# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))
confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
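The same idea, reduced to a small hypothetical helper (check_confusion_inputs() is not part of caret) that lists the usual reasons two vectors fail to form a confusion matrix: a NULL reference, a length mismatch, or different level sets. as.factor() is used rather than factor() so that unused levels of an existing factor are kept:

```r
# Hypothetical helper, not a caret function: returns a character vector of problems
check_confusion_inputs <- function(pred, ref) {
  problems <- character(0)
  if (is.null(ref)) {
    problems <- c(problems, "reference is NULL (misspelled column name?)")
  }
  if (length(pred) != length(ref)) {
    problems <- c(problems, sprintf("length mismatch: %d vs %d", length(pred), length(ref)))
  }
  # as.factor() leaves an existing factor untouched, so unused levels still count
  if (!identical(levels(as.factor(pred)), levels(as.factor(ref)))) {
    problems <- c(problems, "factor levels differ")
  }
  problems
}

p <- factor(c("a", "b"))
check_confusion_inputs(p, factor(c("a", "b")))   # character(0): no problems found
check_confusion_inputs(p, NULL)                  # all three problems reported
```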

tino_ladino · Answer

Make sure you installed the package together with all of its dependencies:

install.packages('caret', dependencies = TRUE)
library(caret)
confusionMatrix(table(prediction, true_value))

EaswerC · Answer

There may be missing values in testdata. Add the following line before `predictionsTree <- predict(treeFit, testdata)` to remove the rows containing NAs. I had the same error, and now it works.

testdata <- testdata[complete.cases(testdata),]

Sanjay Nandakumar · Answer

If your data contains NAs, they may be treated as a factor level, so omit those NAs first:

DF = na.omit(DF)
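Whether NA counts as a level depends on how the factor was built: by default factor() leaves NA out of the levels, but exclude = NULL or addNA() turns it into a level of its own, which would make the level counts on the two sides differ. A base-R sketch with a hypothetical vector:

```r
x <- c("a", "b", NA)

nlevels(factor(x))                  # 2: NA is not a level by default
nlevels(factor(x, exclude = NULL))  # 3: NA becomes a level
nlevels(addNA(factor(x)))           # 3: same effect via addNA()
```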

Then, if the fitted model predicts a set of levels that does not match the reference, it is better to pass a table:

confusionMatrix(table(Arg1, Arg2))

orange1 · Answer

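The length problem you are running into is probably caused by NAs in the training set. Either remove the incomplete cases or impute the data so that there are no missing values.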
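For the imputation route, one simple strategy (median imputation, chosen here only for illustration; the answer does not name a method) is to fill each numeric column's NAs with that column's median. caret's preProcess() also offers a "medianImpute" method. A base-R sketch on a hypothetical data frame:

```r
# Hypothetical training data with missing numeric predictor values
traindata <- data.frame(
  amount   = c(10, NA, 30, 50),
  count    = c(1L, 2L, NA, 4L),
  category = factor(c("a", "b", "a", "b"))
)

# Replace NAs in each numeric column with that column's median
for (col in names(traindata)) {
  if (is.numeric(traindata[[col]])) {
    med <- median(traindata[[col]], na.rm = TRUE)
    traindata[[col]][is.na(traindata[[col]])] <- med
  }
}

anyNA(traindata)   # FALSE
```

The factor column is left alone here; rows with a missing label would still need to be dropped.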

Alicia · Answer

I had the same problem, so right after reading in the data file I changed it like this:

data = na.omit(data)

Thanks for the pointers!