
I am trying to predict some data in R, using svm to build the model and predict to apply it to new data.

Below is the code I am using; I hope it makes the problem clear.

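library(e1071)  # provides svm()

# Training data: a random sample of 4400 rows from Seguros.csv
datos <- read.csv("Seguros.csv", sep = ";", dec = ".", header = TRUE)
muestra <- sample(nrow(datos), 4400)
aprendizaje <- datos[muestra, ]

# New data to predict on
datosPrec <- read.csv("SegurosNuevosVE150.csv", sep = ";", dec = ".", header = TRUE)

modeloSig <- svm(Fraude ~ ., data = aprendizaje, kernel = "sigmoid")
modeloSig
predictFinal <- predict(modeloSig, datosPrec[, 16])  # this call raises the error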

and I get this error:

Error in `colnames<-`(`*tmp*`, value = c("MontoPagado", "Interes", "Plazo", : length of 'dimnames' [2] not equal to array extent

Both files have the same columns. The only differences are the number of rows and the last column, "Fraude": in SegurosNuevosVE150.csv it contains NA instead of "Si" or "No". The summaries of both files are below.

summary (Seguros.csv)

MontoPagado        Interes           Plazo           Tipo                 Mensualidad
Min.   :   -3453   Min.   :-0.6448   Min.   : 64.0   Internacional: 343   Min.   :  5003
1st Qu.: 2284315   1st Qu.:17.0000   1st Qu.:404.0   Nacional     :6070   1st Qu.: 12164
Median : 3831087   Median :17.2500   Median :444.3                        Median : 17299
Mean   : 4585558   Mean   :15.8877   Mean   :438.4                        Mean   : 23496
3rd Qu.: 5792869   3rd Qu.:17.7500   3rd Qu.:478.7                        3rd Qu.: 28939
Max.   :49019276   Max.   :18.7500   Max.   :515.3                        Max.   :276296

MontoAsegurado    TipoPago   ModoPago          Fiador    Fiador2   OtroSeguro   Record
Min.   :   7803   A:4158     Cajas     : 607   No:3278   No:5527   No:5310      R0     :5310
1st Qu.: 401439   B:1817     NoDefinido: 450   Si:3135   Si: 886   Si:1103      R2     : 530
Median : 613561   C: 438     Tarjeta   :5356                                    R1     : 495
Mean   : 764561                                                                 R5     :  40
3rd Qu.: 916591                                                                 R6     :  26
Max.   :7734773                                                                 R7     :   5
                                                                                (Other):   7

Edad            EstadoCivil       Salario          Fraude
Min.   :21.00   Divorciado: 229   Min.   :     0   No:6146
1st Qu.:33.50   NoAplica  :  25   1st Qu.:     0   Si: 267
Median :35.50   Soltero   :5895   Median :     0
Mean   :36.58   Union     : 256   Mean   :  4126
3rd Qu.:38.50   Viudo     :   8   3rd Qu.:  6404
Max.   :57.00                     Max.   :104185

summary(SegurosNuevosVE150.csv)

MontoPagado        Interes         Plazo           Tipo                Mensualidad      MontoAsegurado
Min.   :     613   Min.   :-0.50   Min.   :302.7   Internacional:  6   Min.   :  5029   Min.   :   8470
1st Qu.: 2678695   1st Qu.:17.25   1st Qu.:431.3   Nacional     :144   1st Qu.: 12122   1st Qu.: 462045
Median : 3987711   Median :17.25   Median :434.0                       Median : 17533   Median : 639318
Mean   : 4915943   Mean   :16.40   Mean   :439.9                       Mean   : 24432   Mean   : 806379
3rd Qu.: 6780419   3rd Qu.:17.62   3rd Qu.:474.7                       3rd Qu.: 29269   3rd Qu.:1091095
Max.   :28647806   Max.   :17.75   Max.   :492.7                       Max.   :148886   Max.   :4552955

TipoPago   ModoPago        Fiador   Fiador2   OtroSeguro   Record   Edad            EstadoCivil
A:100      Cajas     : 9   No:82    No:127    No:130       R0:130   Min.   :31.00   Divorciado:  5
B: 36      NoDefinido:44   Si:68    Si: 23    Si: 20       R1:  9   1st Qu.:33.50   Soltero   :140
C: 14      Tarjeta   :97                                   R2: 11   Median :35.50   Union     :  4
                                                                    Mean   :36.78   Viudo     :  1
                                                                    3rd Qu.:39.00
                                                                    Max.   :57.00

Salario         Fraude
Min.   :    0   Mode:logical
1st Qu.:    0   NA's:150
Median : 3806
Mean   : 5198
3rd Qu.: 7432
Max.   :82010

Maybe you should try this: stackoverflow.com/questions/15084803/… - andresram1
Thanks @andresram1, I tried that but it didn't work. I am trying to make a prediction with the data, and the recommended approach converts everything to character, which can't be used for prediction. I think the problem is that SegurosNuevosVE150.csv has the "Fraude" column as logical with every row NA, while the first file has it as a factor with 2 levels, "Si" and "No". But I need to replace the NA in SegurosNuevosVE150.csv with "Si" or "No", because that is what I am predicting. - Jorge Madrigal

1 Answer


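Your issue is simply in your last line. The argument you pass for 'newdata', datosPrec[, 16], selects only one column of your data frame (the 16th, which is "Fraude" itself), and R hands that to predict as a plain vector with no column names. predict needs the predictor columns the model was trained on, so pass the data frame rather than a single column.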
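To make the point above concrete, here is a minimal base R sketch of what single-column indexing does. The three-column data frame `nuevos` is a made-up stand-in for datosPrec, and the names `una_columna` and `predictores` are only illustrative.

```r
# Hypothetical three-column stand-in for datosPrec (values are made up)
nuevos <- data.frame(
  MontoPagado = c(613, 2678695, 3987711),
  Interes     = c(-0.5, 17.25, 17.62),
  Fraude      = NA
)

# Selecting a single column drops the result to a plain vector
una_columna <- nuevos[, 3]
is.data.frame(una_columna)   # FALSE
colnames(una_columna)        # NULL, so there are no names to match against the model

# Dropping only the response column keeps a data frame of predictors
predictores <- nuevos[, -3]
is.data.frame(predictores)   # TRUE
colnames(predictores)        # "MontoPagado" "Interes"
```

If a one-column selection ever has to stay a data frame, `nuevos[, 3, drop = FALSE]` keeps the structure and the column name.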

modeloSig <- svm(Fraude ~ ., data = aprendizaje, kernel = "sigmoid")
predictFinal <- predict(modeloSig, newdata = datosPrec)  # pass the data frame, not a single column
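One caveat with the call above: the "Fraude" column in the new file is all NA, and predict for svm models takes an na.action argument that defaults to na.omit, so rows containing NA may be dropped. A cautious variant is to pass only the predictor columns. The sketch below assumes the e1071 package and uses a small made-up data set (`train`, `nuevos`, `Monto`, `Tipo` are hypothetical stand-ins, not the real files); it also assumes the factor levels in the new data match those used for training.

```r
library(e1071)

set.seed(1)

# Toy stand-in for the training file: a factor response with levels "No"/"Si"
train <- data.frame(
  Monto  = rnorm(100),
  Tipo   = factor(sample(c("Nacional", "Internacional"), 100, replace = TRUE)),
  Fraude = factor(sample(c("No", "Si"), 100, replace = TRUE))
)

# Toy stand-in for the new file: same columns, response entirely NA
nuevos <- data.frame(
  Monto  = rnorm(10),
  Tipo   = factor(sample(c("Nacional", "Internacional"), 10, replace = TRUE),
                  levels = levels(train$Tipo)),  # reuse the training levels
  Fraude = NA
)

modelo <- svm(Fraude ~ ., data = train, kernel = "sigmoid")

# Keep every column except the response, so no NA reaches predict()
predictores <- nuevos[, names(nuevos) != "Fraude"]
pred <- predict(modelo, newdata = predictores)

length(pred)   # one prediction per row of the new data
```

With the real data the equivalent would be something like `predict(modeloSig, newdata = datosPrec[, names(datosPrec) != "Fraude"])`. The summaries show that some factor levels present in training (for example in Record and EstadoCivil) do not appear in the new file, so those columns may need their levels set to the training levels first.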