0
votes

I first fitted a Poisson glm in R as follows:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)

Next I tried to get predictions of the response (on the log scale) for a new dataset using predict function. Note that I set the offset term as zero.

> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 

Next I tried package segmented (version 0.3-0.0) in R and fitted a segmented glm as follows. (The latest version of segmented package (i.e. 0.3-1.0) does not seem to support the offset term when using the predict function.)

> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),
+ offs=off.set.term,data=newdat)

I then used the predict function with fit.2 to get the predicted values:

> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),offs=off.set.term,data=newdat)
> 
> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-26.62968 -26.08611 -25.95997 -20.76125 -23.32456 

These predicted values are significantly different from the one I obtained using fit.1.

The problem seems to be in the offset term because when we fit the models without an offset term, then the results are reasonable and close to each other as follow:

> fit.3=glm(Y~age,data=dat,family=poisson)
> newdat.2=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453))
> predict(fit.3,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.406016 2.395531 2.393098 2.292816 2.342261 
> fit.4=segmented(fit.3,seg.Z=~age,psi=list(age=mean(age)),data=newdat.2)
> predict(fit.4,newdata =newdat.2,type="link")
       1        2        3        4        5 
2.577669 2.524503 2.512165 2.003679 2.254396 
1
You'd expect different predictions from different models. It's not clear why you'd treat numberOfDrugs as an offset, unless the healthValue necessarily changes with the number of drugs - and then setting the offset to zero implies all new patients have one drug each? It'll be the role of a person with domain knowledge to determine which is the best model. If they know there's a sudden change in response (healthvalue, or healthvalue per number of drugs, depending on offset excluded or not) due to age, then the segmented could be used - otherwise I'd prefer a quadratic term in there - Gavin Kelly
This is just a numerical example with a fake data and the name of the variable does not mean anything except, response, independent variable and the offset term. I didn't define the variables so it is really strange that you are making conclusions based on your interpretations! Anyhow I changed the names to make it more clear. We set the offset equal to zero to get the rates in Poisson glm. Here instead of rates I am interested in the link, so this is perfectly fine. And did you see the last codes I wrote? The predicted values should be close to each other. - Stat
I am not sure how your comment can help ... so maybe you want to remove it. - Stat

1 Answers

1
votes

Since I got the answer from the segmented package maintainer, I decided to share it here. First, up-date the package to version 0.3-1.0 by

install.packages("segmented",type="source")

After updating, running the same commands leads to:

> Y<-c(13,21,12,11,16,9,7,5,8,8)
> X<-c(74,81,80,79,89,96,69,88,53,72)
> age<-c(50.45194,54.89382,46.52569,44.84934,53.25541,60.16029,50.33870,
+ 51.44643,38.20279,59.76469)
> dat=data.frame(Y=Y,off.set.term=log(X),age=age)
> fit.1=glm(Y~age+offset(off.set.term),data=dat,family=poisson)
> 
> newdat=data.frame(age=c(52.09374,50.89329,50.61472,39.13358,44.79453),off.set.term=rep(0,5))
> predict(fit.1,newdata =newdat,type="link")
        1         2         3         4         5 
-1.964381 -1.956234 -1.954343 -1.876416 -1.914839 
> 
> library(segmented)
> fit.2=segmented(fit.1,seg.Z=~age,psi=list(age=mean(age)),offs=off.set.term,data=newdat)
> predict(fit.2,newdata =newdat,type="link")
Error in offset(off.set.term) : object 'off.set.term' not found

So the offset term cannot be found. Now the trick (for now) is to first attach newdat and then predict as follows:

> attach(newdat)
The following object is masked _by_ .GlobalEnv:

    age
> predict(fit.2,newdata =newdat,type="link")
        1         2         3         4         5 
-1.825831 -1.853842 -1.860342 -2.128237 -1.996147 

The results do make sense now. Cheers!