I'm using C50 library to try and predict the attendance for next years' graduation however my tree shows the ID as part of the prediction! when i take it out my tree becomes one node (level) only .. any suggestions will be highly appreciated
part of the Dataset (JSON):
{"id":"50","name":"James Charlie","faculty":"Science","degree":"Bachelor degree","course":"Sport Science","attend":"No","year":"2016"},
full dataset/Student object: git repo
R script:
con=dbConnect(MySQL(), user = 'root', password = '', dbname='students', host = 'localhost') dbListTables(con) Student <- dbReadTable(con, 'students') rows <- nrow(Student)
Student$attend <- as.factor(Student$attend) Student$year <- as.factor(Student$year)
Student$faculty <- as.factor(Student$faculty)
Student$course <- as.factor(Student$course)
Student
dim(Student)
summary(Student)
str(Student)
Student <- Student[-2]
dim(Student)
str(Student)
set.seed(1234)
Student_rand <- Student[order(runif(719)), ] #randomize the data
Student_train <- Student_rand[1:400, ] #split data/train data to predect the test
Student_test <- Student_rand[401:719, ] #validation for train prediction
summary(Student_train)
prop.table(table(Student_train$attend))#propability for prediction
prop.table(table(Student_test$attend))
Student_model <- C5.0(Student_train[,-5],Student_train$attend)
summary(Student_model)
Student_model
summary(Student_model)
jpeg("tree.jpg")
plot(Student_model)
dev.off()
Student_model$predictors
Student_model$trials
Student_model$tree
summary(Student_model)
Student_pred <- predict(Student_model, Student_test,type="class")
table(Student_test$attend ,Student_pred)
CrossTable(Student_pred, Student_test$attend,
prop.chisq = FALSE, prop.c = FALSE, prop.r = FALSE,
dnn = c('predicted default', 'actual default'))
**the first thing I tried was removing the id and i got the following error:
Error in partysplit(varid = as.integer(i), index = index, info = k, prob = NULL) *
: minimum of ‘index’ is not equal to 1 In addition: Warning message: In min(index, na.rm = TRUE) : no non-missing arguments to min; returning Inf
*
then i tried and added a random column which cause the prediction to use that random column as an inference..**
dput
on Student orhead(Student, 30)
if you have many observations). The 4 lines of json are insufficient to allow us to replicate your problem. – lmoStudent
. Instead of posting SQL/JSON code in your Github version of the data, why not use dput or dump to create an ASCII version of the actual R structure. Or you could post the R code that was used to read from that URL. – IRTFM