
The data was downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/00497/divorce.rar

When I run the code below to prepare the data for logistic regression, it throws an error, but the same code runs perfectly in other R programs. Is there anything I have missed?

set.seed(123)

divorce = read.csv("C://Users//User//Documents//Y2S3//Predictive Modelling//divorce//divorce.csv")

dim(divorce)

Outcome: [1] 170 1

summary(divorce)

Outcome:
Atr1.Atr2.Atr3.Atr4.Atr5.Atr6.Atr7.Atr8.Atr9.Atr10.Atr11.Atr12.Atr13.Atr14.Atr15.Atr16.Atr17.Atr18.Atr19.Atr20.Atr21.Atr22.Atr23.Atr24.Atr25.Atr26.Atr27.Atr28.Atr29.Atr30.Atr31.Atr32.Atr33.Atr34.Atr35.Atr36.Atr37.Atr38.Atr39.Atr40.Atr41.Atr42.Atr43.Atr44.Atr45.Atr46.Atr47.Atr48.Atr49.Atr50.Atr51.Atr52.Atr53.Atr54.Class
Length:170
Class :character
Mode  :character

colnames(divorce)

Outcome: [1] "Atr1.Atr2.Atr3.Atr4.Atr5.Atr6.Atr7.Atr8.Atr9.Atr10.Atr11.Atr12.Atr13.Atr14.Atr15.Atr16.Atr17.Atr18.Atr19.Atr20.Atr21.Atr22.Atr23.Atr24.Atr25.Atr26.Atr27.Atr28.Atr29.Atr30.Atr31.Atr32.Atr33.Atr34.Atr35.Atr36.Atr37.Atr38.Atr39.Atr40.Atr41.Atr42.Atr43.Atr44.Atr45.Atr46.Atr47.Atr48.Atr49.Atr50.Atr51.Atr52.Atr53.Atr54.Class"

sapply(divorce,class)

Outcome: Atr1.Atr2.Atr3.Atr4.Atr5.Atr6.Atr7.Atr8.Atr9.Atr10.Atr11.Atr12.Atr13.Atr14.Atr15.Atr16.Atr17.Atr18.Atr19.Atr20.Atr21.Atr22.Atr23.Atr24.Atr25.Atr26.Atr27.Atr28.Atr29.Atr30.Atr31.Atr32.Atr33.Atr34.Atr35.Atr36.Atr37.Atr38.Atr39.Atr40.Atr41.Atr42.Atr43.Atr44.Atr45.Atr46.Atr47.Atr48.Atr49.Atr50.Atr51.Atr52.Atr53.Atr54.Class "character"

col_fac = c("Atr1","Atr2","Atr3","Atr4","Atr5","Atr6","Atr7","Atr8","Atr9","Atr10",
            "Atr11","Atr12","Atr13","Atr14","Atr15","Atr16","Atr17","Atr18","Atr19","Atr20",
            "Atr21","Atr22","Atr23","Atr24","Atr25","Atr26","Atr27","Atr28","Atr29","Atr30",
            "Atr31","Atr32","Atr33","Atr34","Atr35","Atr36","Atr37","Atr38","Atr39","Atr40",
            "Atr41","Atr42","Atr43","Atr44","Atr45","Atr46","Atr47","Atr48","Atr49","Atr50",
            "Atr51","Atr52","Atr53","Atr54","Class")

divorce[col_fac] = lapply(divorce[col_fac],factor)

Outcome: Error in `[.data.frame`(divorce, col_fac) : undefined columns selected
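The error means that at least one name in col_fac is not a column of divorce. A quick way to see which ones is to compare the requested names with names() before subsetting. The sketch below uses a small hypothetical data frame (divorce_demo, col_demo and missing_cols are illustrative names, not from the original code) that mimics a semicolon-separated file read with read.csv()'s default comma separator:

```r
# Hypothetical stand-in: semicolon-separated text read with the default sep = ",",
# so each whole line lands in a single character column
divorce_demo <- read.csv(text = "Atr1;Atr2;Class\n2;2;1\n4;4;1")

col_demo <- c("Atr1", "Atr2", "Class")

# Requested names that are not actual columns of divorce_demo
missing_cols <- setdiff(col_demo, names(divorce_demo))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# The single column's name is the whole header line, with ";" turned into "."
print(names(divorce_demo))
```

Here every requested name is reported as missing, which matches the single long column name shown by colnames(divorce) above.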

Welcome to SO! You maximise your chance of getting a useful answer if you provide a minimum reproducible example. This post may help. Specifically, please do not provide either your code or data as screenshots. Please use dput() for data and paste your code. - Limey

Answer


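The only issue is that the file is separated by ";", not ",". read.csv() splits fields on commas by default, so each line is read as a single character column whose name is the whole header line (hence dim() returning 170 1), and none of the names in col_fac exist as columns. Adding sep = ";" solves the issue.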
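To see the effect of sep in isolation, here is a small sketch that parses the same few lines twice. The inline text is a hypothetical excerpt in the same semicolon-separated layout, not the real divorce.csv:

```r
# Hypothetical excerpt laid out like divorce.csv (semicolon-separated)
txt <- "Atr1;Atr2;Class\n2;2;1\n4;4;1\n0;1;0"

wrong <- read.csv(text = txt)             # default sep = ","
right <- read.csv(text = txt, sep = ";")  # correct separator

dim(wrong)    # 3 rows, 1 column
dim(right)    # 3 rows, 3 columns
names(right)  # "Atr1" "Atr2" "Class"
```

With the right separator the columns also come back as integers instead of one character column.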

# downloaded and extracted from https://archive.ics.uci.edu/ml/machine-learning-databases/00497/divorce.rar
divorce <- read.csv("./divorce.csv", sep = ";")
dim(divorce)
summary(divorce)
colnames(divorce)
sapply(divorce,class)
col_fac = c("Atr1","Atr2","Atr3","Atr4","Atr5","Atr6","Atr7","Atr8","Atr9","Atr10",
            "Atr11","Atr12","Atr13","Atr14","Atr15","Atr16","Atr17","Atr18","Atr19","Atr20",
            "Atr21","Atr22","Atr23","Atr24","Atr25","Atr26","Atr27","Atr28","Atr29","Atr30", 
            "Atr31","Atr32","Atr33","Atr34","Atr35","Atr36","Atr37","Atr38","Atr39","Atr40", 
            "Atr41","Atr42","Atr43","Atr44","Atr45","Atr46","Atr47","Atr48","Atr49","Atr50", 
            "Atr51","Atr52","Atr53","Atr54","Class")

divorce[col_fac] = lapply(divorce[col_fac],factor)
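Typing all 55 names by hand is easy to get wrong. A minimal sketch that builds the same vector with paste0(), assuming the columns are named Atr1 to Atr54 followed by Class, as in the output above:

```r
# Build "Atr1", ..., "Atr54" and append the outcome column
col_fac <- c(paste0("Atr", 1:54), "Class")

length(col_fac)   # 55
head(col_fac, 3)  # "Atr1" "Atr2" "Atr3"
tail(col_fac, 2)  # "Atr54" "Class"
```

Once divorce has been read with sep = ";", the lapply(divorce[col_fac], factor) line works with this vector unchanged.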

Less error-prone version using dplyr

The following mutates your dataset by applying as.factor to every column for which is.numeric returns TRUE. Note that the functions passed to across() and where() are written without the usual parentheses, because you pass the function itself rather than calling it.
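For comparison, a base-R sketch of the same idea (this is an equivalent in base R, not how dplyr implements it), run on a small hypothetical data frame rather than the real file:

```r
# Hypothetical data in the shape of the correctly read file
df <- data.frame(Atr1 = c(2L, 4L, 0L), Atr2 = c(2L, 4L, 1L), Class = c(1L, 1L, 0L))

# Which columns are numeric? (the role played by where(is.numeric))
is_num <- vapply(df, is.numeric, logical(1))

# Convert only those columns (the role played by across(..., as.factor));
# is.numeric and as.factor are passed as functions, without parentheses
df[is_num] <- lapply(df[is_num], as.factor)

str(df)
```

Because every column of the correctly read file is numeric, this converts all of them, including Class.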

library(dplyr)
divorce <- read.csv("./divorce.csv", sep = ";") %>% 
  mutate(across(where(is.numeric), as.factor))
glimpse(divorce)

For details on using across() inside mutate(), type ?across in the R console.