
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels

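I am using the R caret package to build models. I want to use PCA as a dimensionality-reduction pre-processing step and then fit a logistic regression model.

I get this error:

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels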

    credit <- read.csv('~Loans Question/RequiredAttributesWithLoanStatus.csv')

    credit$LoanStatus <- as.factor(credit$LoanStatus)

    str(credit)
    'data.frame':   8580 obs. of  45 variables:
     $ ListingCategory            : int  1 7 3 1 1 7 1 1 1 1 ...
     $ IncomeRange                : int  3 4 6 4 4 3 3 4 3 3 ...
     $ StatedMonthlyIncome        : num  2583 4326 10500 4167 5667 ...
     $ IncomeVerifiable           : logi  TRUE TRUE TRUE FALSE TRUE TRUE ...
     $ DTIwProsperLoan            : num  1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
     $ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
     $ Occupation                 : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
     $ MonthsEmployed             : int  4 44 159 67 26 16 209 147 24 9 ...
     $ BorrowerState              : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
     $ BorrowerCity               : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
     $ BorrowerMetropolitanArea   : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
     $ LenderIndicator            : int  0 0 0 1 0 0 0 0 1 0 ...
     $ GroupIndicator             : logi  FALSE FALSE FALSE TRUE FALSE FALSE ...
     $ GroupName                  : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
     $ ChannelCode                : int  90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
     $ AmountParticipation        : int  0 0 0 0 0 0 0 0 0 0 ...
     $ MonthlyDebt                : int  247 785 1631 817 644 1524 427 817 654 749 ...
     $ CurrentDelinquencies       : int  0 0 0 0 0 0 0 1 0 1 ...
     $ DelinquenciesLast7Years    : int  0 10 0 0 0 0 0 0 0 0 ...
     $ PublicRecordsLast10Years   : int  0 1 0 0 0 0 1 0 1 0 ...
     $ PublicRecordsLast12Months  : int  0 0 0 0 0 0 0 0 0 0 ...
     $ FirstRecordedCreditLine    : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
     $ CreditLinesLast7Years      : int  53 30 36 26 7 22 15 20 34 32 ...
     $ InquiriesLast6Months       : int  2 8 5 0 0 0 0 3 0 0 ...
     $ AmountDelinquent           : int  0 0 0 0 0 0 0 63 0 15 ...
     $ CurrentCreditLines         : int  10 10 18 10 4 11 6 10 7 8 ...
     $ OpenCreditLines            : int  9 10 15 8 3 8 5 7 7 8 ...
     $ BankcardUtilization        : num  0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
     $ TotalOpenRevolvingAccounts : int  9 7 12 10 3 5 4 5 4 6 ...
     $ InstallmentBalance         : int  48648 14827 0 0 0 30916 0 21619 41340 15447 ...
     $ RealEstateBalance          : int  0 0 577745 0 0 0 191296 0 0 126039 ...
     $ RevolvingBalance           : int  5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
     $ RealEstatePayment          : int  0 0 4159 0 0 0 1303 0 0 1279 ...
     $ RevolvingAvailablePercent  : int  78 52 36 45 18 61 44 74 96 76 ...
     $ TotalInquiries             : int  8 11 15 2 0 0 1 7 1 1 ...
     $ TotalTradeItems            : int  53 30 36 26 7 22 15 20 34 32 ...
     $ SatisfactoryAccounts       : int  52 23 36 26 7 19 15 18 34 29 ...
     $ NowDelinquentDerog         : int  0 0 0 0 0 0 0 1 0 1 ...
     $ WasDelinquentDerog         : int  1 7 0 0 0 3 0 1 0 2 ...
     $ OldestTradeOpenDate        : int  5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
     $ DelinquenciesOver30Days    : int  0 6 0 0 0 13 0 2 0 2 ...
     $ DelinquenciesOver60Days    : int  0 4 0 0 0 0 0 0 0 1 ...
     $ DelinquenciesOver90Days    : int  0 10 0 0 0 0 0 0 0 0 ...
     $ IsHomeowner                : logi  FALSE FALSE TRUE FALSE FALSE FALSE ...
     $ LoanStatus                 : Factor w/ 4 levels "1","2","3","4": 4 2 2 4 4 4 4 4 4 3 ...

    summary(credit)
    ListingCategory   IncomeRange    StatedMonthlyIncome IncomeVerifiable
     Min.   : 0.000   Min.   :1.000   Min.   :     0      Mode :logical   
     1st Qu.: 1.000   1st Qu.:3.000   1st Qu.:  3167      FALSE:784       
     Median : 2.000   Median :4.000   Median :  4750      TRUE :7796      
     Mean   : 4.997   Mean   :4.089   Mean   :  5755      NA's :0         
     3rd Qu.: 7.000   3rd Qu.:5.000   3rd Qu.:  7083                      
     Max.   :20.000   Max.   :7.000   Max.   :250000                      

     DTIwProsperLoan     EmploymentStatusDescription
     Min.   :      0.0   Employed     :7182         
     1st Qu.:      0.1   Full-time    : 416         
     Median :      0.2   Not employed : 122         
     Mean   :  91609.4   Other        : 475         
     3rd Qu.:      0.3   Part-time    :   7         
     Max.   :1000000.0   Retired      :  32         
                         Self-employed: 346         
                        Occupation   MonthsEmployed   BorrowerState 
     Other                   :2421   Min.   :-23.00   CA     :1056  
     Professional            :1040   1st Qu.: 26.00   FL     : 608  
     Computer Programmer     : 345   Median : 68.00   NY     : 574  
     Executive               : 334   Mean   : 97.44   TX     : 532  
     Administrative Assistant: 325   3rd Qu.:139.00   IL     : 443  
     Teacher                 : 301   Max.   :755.00   GA     : 343  
     (Other)                 :3814   NA's   :5        (Other):5024  
        BorrowerCity       BorrowerMetropolitanArea LenderIndicator  
     CHICAGO  : 121   (Not Implemented):8580        Min.   :0.00000  
     NEW YORK :  91                                 1st Qu.:0.00000  
     BROOKLYN :  88                                 Median :0.00000  
     HOUSTON  :  64                                 Mean   :0.09196  
     LAS VEGAS:  53                                 3rd Qu.:0.00000  
     ATLANTA  :  51                                 Max.   :1.00000  
     (Other)  :8112                                                  
     GroupIndicator                                     GroupName   
     Mode :logical                                           :8326  
     FALSE:8325      We do not accept new membership requests:  39  
     TRUE :255       BORROWERS - LARGEST GROUP               :  29  
     NA's :0         LendersClub                             :  17  
                     Debt Consolidators                      :  12  
                     Have Money - Will Bid                   :  10  
                     (Other)                                 : 147  
      ChannelCode    AmountParticipation  MonthlyDebt      CurrentDelinquencies
     Min.   :40000   Min.   :0           Min.   :    0.0   Min.   : 0.0000     
     1st Qu.:80000   1st Qu.:0           1st Qu.:  364.0   1st Qu.: 0.0000     
     Median :80000   Median :0           Median :  708.0   Median : 0.0000     
     Mean   :77196   Mean   :0           Mean   :  885.5   Mean   : 0.4119     
     3rd Qu.:90000   3rd Qu.:0           3rd Qu.: 1205.2   3rd Qu.: 0.0000     
     Max.   :90000   Max.   :0           Max.   :30213.0   Max.   :21.0000     

     DelinquenciesLast7Years PublicRecordsLast10Years PublicRecordsLast12Months
     Min.   : 0.000          Min.   : 0.0000          Min.   :0.00000          
     1st Qu.: 0.000          1st Qu.: 0.0000          1st Qu.:0.00000          
     Median : 0.000          Median : 0.0000          Median :0.00000          
     Mean   : 4.009          Mean   : 0.2809          Mean   :0.01364          
     3rd Qu.: 3.000          3rd Qu.: 0.0000          3rd Qu.:0.00000          
     Max.   :99.000          Max.   :11.0000          Max.   :4.00000          

     FirstRecordedCreditLine CreditLinesLast7Years InquiriesLast6Months
     12/1/93 0:00:  20       Min.   :  2.0         Min.   : 0.0000     
     3/1/95 0:00 :  19       1st Qu.: 16.0         1st Qu.: 0.0000     
     6/1/90 0:00 :  17       Median : 24.0         Median : 1.0000     
     6/1/89 0:00 :  16       Mean   : 26.1         Mean   : 0.9994     
     12/1/90 0:00:  15       3rd Qu.: 34.0         3rd Qu.: 1.0000     
     2/1/94 0:00 :  14       Max.   :115.0         Max.   :15.0000     
     (Other)     :8479                                                 
     AmountDelinquent CurrentCreditLines OpenCreditLines  BankcardUtilization
     Min.   :     0   Min.   : 0.000     Min.   : 0.000   Min.   :0.0000     
     1st Qu.:     0   1st Qu.: 5.000     1st Qu.: 5.000   1st Qu.:0.2500     
     Median :     0   Median : 9.000     Median : 8.000   Median :0.5400     
     Mean   :  1195   Mean   : 9.345     Mean   : 8.306   Mean   :0.5182     
     3rd Qu.:     0   3rd Qu.:12.000     3rd Qu.:11.000   3rd Qu.:0.7900     
     Max.   :179158   Max.   :54.000     Max.   :42.000   Max.   :2.2300     

     TotalOpenRevolvingAccounts InstallmentBalance RealEstateBalance
     Min.   : 0.000             Min.   :     0     Min.   :      0  
     1st Qu.: 3.000             1st Qu.:  3338     1st Qu.:      0  
     Median : 6.000             Median : 14453     Median :  26154  
     Mean   : 6.441             Mean   : 24900     Mean   : 109306  
     3rd Qu.: 9.000             3rd Qu.: 32238     3rd Qu.: 176542  
     Max.   :44.000             Max.   :739371     Max.   :1938421  
                                NA's   :328                         
     RevolvingBalance RealEstatePayment RevolvingAvailablePercent TotalInquiries 
     Min.   :     0   Min.   :    0.0   Min.   :  0.00            Min.   : 0.00  
     1st Qu.:  2799   1st Qu.:    0.0   1st Qu.: 29.00            1st Qu.: 2.00  
     Median :  8784   Median :  346.5   Median : 52.00            Median : 3.00  
     Mean   : 19555   Mean   :  830.5   Mean   : 51.46            Mean   : 3.91  
     3rd Qu.: 21110   3rd Qu.: 1382.2   3rd Qu.: 75.00            3rd Qu.: 5.00  
     Max.   :695648   Max.   :13651.0   Max.   :100.00            Max.   :36.00  

     TotalTradeItems SatisfactoryAccounts NowDelinquentDerog WasDelinquentDerog
     Min.   :  2.0   Min.   :  1.00       Min.   : 0.0000    Min.   : 0.000    
     1st Qu.: 16.0   1st Qu.: 14.00       1st Qu.: 0.0000    1st Qu.: 0.000    
     Median : 24.0   Median : 21.00       Median : 0.0000    Median : 1.000    
     Mean   : 26.1   Mean   : 23.34       Mean   : 0.4119    Mean   : 2.343    
     3rd Qu.: 34.0   3rd Qu.: 30.25       3rd Qu.: 0.0000    3rd Qu.: 3.000    
     Max.   :115.0   Max.   :113.00       Max.   :21.0000    Max.   :32.000    

     OldestTradeOpenDate DelinquenciesOver30Days DelinquenciesOver60Days
     Min.   : 1011957    Min.   : 0.000          Min.   : 0.000         
     1st Qu.: 4101996    1st Qu.: 0.000          1st Qu.: 0.000         
     Median : 7191993    Median : 1.000          Median : 0.000         
     Mean   : 6934230    Mean   : 4.332          Mean   : 1.908         
     3rd Qu.:10011990    3rd Qu.: 5.000          3rd Qu.: 2.000         
     Max.   :12312004    Max.   :99.000          Max.   :73.000         

     DelinquenciesOver90Days IsHomeowner     LoanStatus
     Min.   : 0.000          Mode :logical   1:1847    
     1st Qu.: 0.000          FALSE:4264      2:1262    
     Median : 0.000          TRUE :4316      3: 256    
     Mean   : 4.009          NA's :0         4:5215    
     3rd Qu.: 3.000                                    
     Max.   :99.000                                    

    try(na.fail(credit))

    glmFit <- train(LoanStatus~., credit, method = "glm", family=binomial, preProcess=c("pca"), 
        trControl = trainControl(method = "cv"))

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

    logregFit <- train(LoanStatus~., credit, method = "logreg", family=binomial, preProcess=c("pca"), 
        trControl = trainControl(method = "cv"))

    Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

6
dbl001

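Judging from the error message and the variables in your data set, the variable BorrowerMetropolitanArea has only one level. (In fact, a variable that takes the same value for every sample has no predictive value at all.) I think this is what causes the problem in the contrasts function when the data set is pre-processed with PCA.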
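To illustrate the diagnosis, here is a minimal sketch on a made-up data frame (the `toy` object and its columns are hypothetical and only mimic the structure of `credit`). It assumes the failure comes from building the design matrix for the formula, which is where the `contrasts<-` call in the error message lives, and then counts levels per factor column to find the culprit.

```r
# Toy data frame (hypothetical): `area` mimics BorrowerMetropolitanArea,
# a factor with a single level.
toy <- data.frame(
  y    = factor(c("a", "b", "a", "b")),
  x    = c(1.0, 2.5, 3.1, 4.7),
  area = factor(rep("(Not Implemented)", 4))
)

# Building a design matrix from a formula assigns contrasts to every
# factor predictor; a one-level factor cannot be given contrasts.
result <- tryCatch(
  model.matrix(y ~ ., data = toy),
  error = function(e) conditionMessage(e)
)
print(result)

# Count the levels of each factor column to spot the offending variable.
n_levels <- sapply(Filter(is.factor, toy), nlevels)
print(n_levels[n_levels < 2])
```

The same level count applied to `credit` should point at BorrowerMetropolitanArea, which `str(credit)` already reports as a factor with 1 level.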

Try calling the train function on the data set without the variable BorrowerMetropolitanArea.
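One way to do this, sketched on a made-up data frame rather than the real `credit` data: drop every column that takes only a single distinct value before modelling. The `toy` data frame and the `has_variation` helper vector are hypothetical names chosen for illustration. The `zero` column stands in for a constant numeric column such as AmountParticipation, which is 0 for every row in the summary above.

```r
# Toy data frame (hypothetical) with one single-level factor and one
# constant numeric column.
toy <- data.frame(
  y    = factor(c("a", "b", "a", "b")),
  x    = c(1.0, 2.5, 3.1, 4.7),
  area = factor(rep("(Not Implemented)", 4)),
  zero = rep(0L, 4)
)

# TRUE for columns with at least two distinct non-missing values.
has_variation <- sapply(toy, function(col) length(unique(na.omit(col))) > 1)

# Keep only the informative columns.
toy_clean <- toy[, has_variation, drop = FALSE]
print(names(toy_clean))

# The design matrix can now be built without the contrasts error.
mm <- model.matrix(y ~ ., data = toy_clean)
print(colnames(mm))
```

Applied to the question's data, the same filter would be `credit[, has_variation, drop = FALSE]` with `has_variation` computed on `credit`, and the result passed to `train`. This has not been run against the original file, so treat it as a starting point.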

9
howaj