I'm using the R package C50 to train a C5.0 decision tree with a relatively large data set, which contains around 7 million observations and 25 variables (int, num, factor, ordered factor):

C5Tree <- C5.0(Fraud ~ ., data = training, costs = costs)

Training works fine, but when I try to plot the tree I get the following error message:

plot(C5Tree)
Error in partysplit(varid = as.integer(i), index = index, info = k, prob = NULL) : 
  minimum of ‘index’ is not equal to 1

When I use a subsample of the data (around 3.5 million observations) I get a different error message:

Error in 1:dim(a17)[1] : argument of length 0

I didn't experience any problems like that while using rpart and RWeka.


1 Answer


I recently had this problem too. In my case it was caused by a categorical variable that had very few observations in one of its categories.

I suggest checking the counts for each category of your categorical explanatory variables. My guess is that one of them has a category with only 1 or 2 observations in it.
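As a rough sketch of that check, something like the following lists every factor level at or below a chosen count. The small `training` data frame here is a made-up stand-in for the one in the question, the column names `Channel` and `Amount` are invented for illustration, and the `min_count` threshold of 2 is just an assumption based on my guess above.

```r
# Hypothetical stand-in for the `training` data frame from the question.
training <- data.frame(
  Fraud   = factor(c("no", "no", "yes", "no", "yes", "no")),
  Channel = factor(c("web", "web", "web", "phone", "web", "web")),
  Amount  = c(10.5, 20.0, 7.25, 300.0, 42.0, 15.0)
)

# Pick out the factor columns (ordered factors count as factors too).
is_fac <- vapply(training, is.factor, logical(1))

# Count observations per level; table() also reports unused levels as 0.
level_counts <- lapply(training[is_fac], table)

# Assumed threshold: flag levels with this many observations or fewer.
min_count <- 2
rare_levels <- lapply(level_counts, function(tab) tab[tab <= min_count])

# Keep only the variables that actually have rare levels.
rare_levels <- Filter(length, rare_levels)
print(rare_levels)
```

On the real data you would drop the made-up `training` definition and run the rest as is. Whether merging or removing the flagged levels makes `plot()` work is something you would have to try; this only tells you where to look.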