I cannot see how XGBoost's predict method makes predictions using more than one feature.

library(xgboost)
library(MASS)

sp500=data.frame(SP500)
label=sp500[,1]
lag1=sp500[-1,]
lag2=lag1[-1]
lag3=lag2[-1]
train=cbind(lag1,lag2,lag3)

model=xgboost(data=train[50:1000,],label=label[50:1000],
objective="reg:linear",booster="gbtree",nround=50)

predict(model,train[1,]) #returns an error, because it will not accept multiple columns



predict(model,t(train[1,]))

Transposing my test set does not return an error; however, it uses the predictors incorrectly, because

predict(model,t(train[1:5,]))

only predicts three values instead of the expected five.

So my question is: how can I make predictions with XGBoost using the same features that were used to build the model? In this example I built a model with three features (lag1, lag2, and lag3) to predict the response, the return. But when I call predict, it behaves as if it will only accept one feature, and when I pass multiple values, as when I transposed the test set, it is unclear how it uses them.

1 Answer

you are really close... stay with me here...

> dim(train)
[1] 2779    3

ok you trained with three features.. no surprises there

when you do this

> predict(model,train[1,])
Error in xgb.DMatrix(newdata) : 
  xgb.DMatrix: does not support to construct from  double

xgboost is looking for a matrix and you gave it a vector (selecting a single row with train[1,] drops the matrix dimensions), moving on...

##this works 

> predict(model,t(train[1,]))
[1] -0.09167647
> dim(t(train[1,]))
[1] 1 3

because you transposed a vector, which made a 1 * 3 matrix

but THIS is messed up

> predict(model, t(train[1:5,]))
[1] -0.09167647  0.31090808 -0.10482860
> dim(t(train[1:5,]))
[1] 3 5
## xgboost treated this as 3 rows (observations) of 5 columns and predicted from the first three columns only
## the transpose didn't do the same thing here

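the mistake is that transposing a vector and transposing a matrix are different things: t() treats a plain vector as a column and returns a 1 * 3 matrix, but transposing the 5 * 3 matrix flips it to 3 * 5, so the rows are no longer observations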

what you really want is this

> predict(model,train[1:5,]) 
[1] -0.09167647  0.31090808 -0.10482860 -0.02773660  0.33554882
> dim(train[1:5,]) ## five rows of three columns
[1] 5 3
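
The shapes at play here can be checked in plain R without xgboost. This is a minimal sketch using a made-up 5 * 3 matrix `m` as a stand-in for `train` (not the actual data). It also shows `drop = FALSE`, a base R indexing option that keeps a single row as a 1 * 3 matrix so no transpose is needed.

```r
# Stand-in for the training matrix: 5 rows (observations), 3 columns (features).
m <- matrix(1:15, nrow = 5, ncol = 3)

one_row <- m[1, ]                 # single-row indexing drops to a plain vector
is.null(dim(one_row))             # TRUE: no dimensions, so not a matrix

dim(t(one_row))                   # 1 3: t() turns the vector into a one-row matrix
dim(m[1, , drop = FALSE])         # 1 3: same shape, without the transpose

dim(m[1:5, ])                     # 5 3: already a matrix, pass it as-is
dim(t(m[1:5, ]))                  # 3 5: transposing swaps observations and features
```

So `predict(model, train[1, , drop = FALSE])` should give the same single prediction as the `t(train[1,])` call, without relying on the transpose trick.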

ALSO

you gotta be really careful, because if you don't give it enough columns xgboost won't throw an error, it will still hand back predictions, like this...

> predict(model,train[1:5,1:2])
[1] -0.07803667 -0.25330877  0.10844088 -0.04510367 -0.27979547
 ## only gave it two columns and it made a prediction :)

Just make sure you give it a matrix with the same number of columns, in the same order, as the one you trained on, or all hell will break loose :)
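
One way to guard against this is to check the shape of the new data before calling predict. Here is a minimal sketch, assuming you keep track of the number of training columns yourself. `check_features` is a hypothetical helper written for this answer, not part of xgboost, and `m` is a made-up stand-in for `train`.

```r
# Hypothetical helper (not an xgboost function): stop early if newdata
# is not a matrix with the expected number of feature columns.
check_features <- function(newdata, n_features) {
  if (!is.matrix(newdata)) {
    stop("newdata must be a matrix; use drop = FALSE when selecting one row")
  }
  if (ncol(newdata) != n_features) {
    stop(sprintf("expected %d columns, got %d", n_features, ncol(newdata)))
  }
  invisible(newdata)  # return the data unchanged so the call can be nested
}

# Stand-in for the training matrix from the question.
m <- matrix(rnorm(15), nrow = 5, ncol = 3)
n_features <- ncol(m)

check_features(m[1:5, ], n_features)   # passes silently

# Too few columns: capture the error message instead of stopping the script.
msg <- tryCatch(check_features(m[1:5, 1:2], n_features),
                error = function(e) conditionMessage(e))
print(msg)                             # "expected 3 columns, got 2"
```

With the model from the question this could be used as `predict(model, check_features(train[1:5,], ncol(train)))`, so a wrong shape fails loudly instead of quietly producing numbers.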