2
votes

I am trying to use the H2O library in both Python and R to fit a GLM without an intercept. Unfortunately, it does not appear to be working. The results are completely off: the intercept coefficient is non-zero (only the standardized coefficient for the intercept is zero), and the model does not give me correct predictions.

With the intercept excluded from the model, I expect the prediction to be 0 when all inputs equal 0. This is not the case. The intercept coefficient offsets the prediction quite significantly. In fact, if I set intercept=True on simulated data that I know has no intercept, the intercept coefficient is much closer to 0 than when I run the same data with intercept=False.

The same occurs in both R and Python, and I am unsure if I am doing something incorrectly in setting up the model.

Here is the R code I wrote to test the problem:

library(h2o)
h2o.init()

x1 = runif(500)
x2 = runif(500)
x3 = runif(500)
y = 2.67*x1 + 1.23*x2 -7.2*x3
h2odata<-data.frame(x1,x2,x3,y)
head(h2odata)

h2odata <- as.h2o(h2odata)

predictors <- c('x1','x2','x3')
response <- 'y'

h2o.splits = h2o.splitFrame(data=h2odata,ratios=.8)
train <- h2o.splits[[1]]
valid <- h2o.splits[[2]]

glm <- h2o.glm(x=predictors,y=response,family='gaussian',link='identity',
               intercept = FALSE,training_frame = train,
               validation_frame = valid)
glm

x1=0
x2=0
x3=0
newdata = data.frame(x1,x2,x3)
colnames(newdata)<-c('x1','x2','x3')

newdata<-as.h2o(newdata)
h2o::h2o.predict(glm,newdata)

Am I missing something obvious here?


1 Answer

3
votes

Given the way you generate the data, you should set standardize = FALSE in h2o.glm to avoid the problem.

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html

It is an issue with the coefficients versus the standardized coefficients. Note that you should get the best results with intercept = TRUE and standardize = TRUE.

You should drop the intercept only when the model must predict 0 at zero inputs, and in a few other cases.
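
To illustrate how standardization and a no-intercept fit can interact, here is a minimal pure-Python sketch with a single predictor. It assumes that standardizing means subtracting the mean and dividing by the standard deviation; whether H2O does exactly this internally when intercept = FALSE is not confirmed here. The helper `fit_no_intercept` and the toy data are hypothetical and only for illustration.

```python
from statistics import mean, pstdev

def fit_no_intercept(xs, ys):
    """Least-squares slope for the model y = b * x (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Toy data whose true intercept is exactly zero: y = 2.67 * x
xs = [0.1, 0.3, 0.5, 0.7, 0.9]
ys = [2.67 * x for x in xs]

# Case 1: fit on the raw predictor. The prediction at x = 0 is 0.
b_raw = fit_no_intercept(xs, ys)
pred_at_zero_raw = b_raw * 0.0

# Case 2 (assumed mechanism): centre and scale x, fit without an
# intercept on the standardized values, then map the fit back to the
# original scale.
mu, sd = mean(xs), pstdev(xs)
zs = [(x - mu) / sd for x in xs]
b_std = fit_no_intercept(zs, ys)

# y_hat = b_std * (x - mu) / sd = (b_std / sd) * x - b_std * mu / sd
slope_back = b_std / sd
implied_intercept = -b_std * mu / sd
pred_at_zero_std = slope_back * 0.0 + implied_intercept

print("raw fit:          slope =", b_raw, " prediction at 0 =", pred_at_zero_raw)
print("standardized fit: slope =", slope_back, " prediction at 0 =", pred_at_zero_std)
```

Under this assumption, the standardized intercept is zero, yet the back-transformed model carries a non-zero offset on the original scale, which would match the symptom described in the question. Fitting on the raw predictors (the equivalent of standardize = FALSE) avoids the offset.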