I want to train an XGBoost model, but when I create a data partition, I get an error:

library(data.table)
library(caret)
library(xgboost)
digit <- fread("train.csv",header=T)
index <- createDataPartition(digit$label,0.75,list=F)

Error in matrix(unlist(out), ncol = times) : data is too long

I used another method to create the data partition, but then I ran into another problem:

dim(digit)
[1] 42000   785
n <- 42000*0.7
index <- sample(1:42000,n)
train <- digit[index]
test <- digit[-index]
xgmat <- xgb.DMatrix(train[,-1],label=train[,1])

Error in xgb.DMatrix(train[, -1], label = train[, 1]) :
  xgb.DMatrix: does not support to construct from list
In addition: Warning message:
In if (class(data) == "dgCMatrix") { :
  the condition has length > 1 and only the first element will be used

OK, so I first converted the data.frame to a matrix:

train <- as.matrix(train)
xgmat <- xgb.DMatrix(train[,-1],label=train[,1])

Error in xgb.DMatrix(train[, -1], label = train[, 1]) : REAL() can only be applied to a 'numeric', not a 'integer'

What on earth is wrong? Can anyone help?

Answer


You have to name the p argument: index <- createDataPartition(digit$label, p = 0.75, list = FALSE). Otherwise 0.75 is matched by position to the second parameter, times, so it is treated as times = 0.75. For example, matrix(1:16, ncol = 0.75) produces the same 'data is too long' error.
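To see why the unnamed 0.75 ends up in times, here is a minimal base-R sketch. The function fake_partition is hypothetical: it only mimics the leading formals of caret's createDataPartition (y, times, p, list) so that R's positional argument matching can be shown without loading caret. The last line reproduces the matrix() failure mentioned above.

```r
# Hypothetical stand-in that mimics only the leading formals of
# caret::createDataPartition(y, times = 1, p = 0.5, list = TRUE);
# it just reports which value landed in which parameter.
fake_partition <- function(y, times = 1, p = 0.5, list = TRUE) {
  c(times = times, p = p)
}

y <- 1:10

# Unnamed second argument: matched by position to `times`,
# while `p` silently keeps its default of 0.5.
positional <- fake_partition(y, 0.75)
print(positional)

# Named argument: matched by name to `p`; `times` keeps its default of 1.
named <- fake_partition(y, p = 0.75)
print(named)

# A non-integer column count (0.75 is truncated to 0) makes matrix()
# fail, which is what happens inside createDataPartition() when
# times = 0.75 reaches matrix(unlist(out), ncol = times).
msg <- tryCatch(matrix(1:16, ncol = 0.75),
                error = function(e) conditionMessage(e))
print(msg)
```

Naming arguments after the first one or two is a good habit with functions like this, because a value passed by position lands in whatever parameter happens to be next in the signature.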

As for the "REAL() can only be applied to a 'numeric', not a 'integer'" error: it looks like your matrix is an integer matrix, while xgb.DMatrix expects a numeric (double) matrix. We'll add a fix in the next release so that it accepts integer matrices as well. Meanwhile, as a workaround, you can simply force the matrix input to be numeric:

# Multiplying by 1 (a double) coerces the integer matrix to double
train <- as.matrix(train) * 1
xgmat <- xgb.DMatrix(train[,-1], label = train[,1])
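The type issue can be checked without xgboost at all. This base-R sketch uses a small made-up data.frame as a hypothetical stand-in for train.csv (the real data has 785 integer columns). It shows that a data.frame (and therefore a data.table) is a list underneath, which is presumably what the first "construct from list" message refers to; that as.matrix() keeps the integer type; and that multiplying by 1 promotes it to double. Setting storage.mode is shown as an equivalent, more explicit alternative to the * 1 trick.

```r
# Tiny hypothetical stand-in for the digit data: a label column plus two
# integer pixel columns (the real file has 785 columns).
df <- data.frame(label  = c(1L, 0L, 1L),
                 pixel0 = c(0L, 255L, 12L),
                 pixel1 = c(3L, 0L, 0L))

# A data.frame is a list underneath, so it has to be turned into a
# matrix before being passed on as matrix data.
print(is.list(df))       # TRUE

# With all-integer columns, as.matrix() returns an integer matrix.
m_int <- as.matrix(df)
print(typeof(m_int))     # "integer"

# Workaround from the answer: arithmetic with the double 1 promotes to double.
m_dbl <- as.matrix(df) * 1
print(typeof(m_dbl))     # "double"

# Equivalent alternative: change the storage mode explicitly.
m_alt <- as.matrix(df)
storage.mode(m_alt) <- "double"

# Values, dimensions and column names are unchanged; only the type differs.
print(identical(m_dbl, m_alt))   # TRUE
```

Both approaches keep the dim and dimnames attributes, so column indexing such as m_dbl[, -1] and m_dbl[, 1] works exactly as before.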