
I'm trying to fit a GAM in R and I'm getting a strange error message.

Generally, I have counts from water samples of differing volumes, and I want to correct the counts for the volume sampled. I'm trying to fit a smooth function of depth to the counts while accounting for the differences in volume sampled.

test <- structure(list(depth = c(2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 
37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 
92.5, 97.5), count = c(53323, 665, 1090, 491, 540, 514, 612, 
775, 601, 497, 295, 348, 357, 294, 292, 968, 455, 148, 155, 101
), vol = c(2119.92, 111.76, 156.64, 71.28, 77.44, 73.92, 62.48, 
78.32, 74.8, 81.84, 53.68, 80.96, 80.08, 79.2, 79.2, 77.44, 77.44, 
84.48, 73.04, 59.84)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), .Names = c("depth", "count", "vol"
))

gam(count ~ s(depth) + offset(vol), data = test, family = "poisson")
Error in if (pdev - old.pdev > div.thresh) { : missing value where TRUE/FALSE needed

Any idea why this is not working? If I drop the offset, or set family = "gaussian", the model fits as expected.

Edit: I find that

gam(count ~ s(depth) + offset(log(vol)), data = test, family = "poisson")

does run. I think I read somewhere that the offset variable should be log-transformed for these models, so perhaps this is actually working correctly.


1 Answer


You definitely need to put vol on the log scale for this model, because the Poisson family uses the log link by default.
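A minimal sketch of the corrected fit follows. It assumes the mgcv package (the `pdev`/`div.thresh` check in the error message appears to come from its fitting code) and rebuilds the question's data as a plain data frame. The names `m`, `new_depths` and `rate` are illustrative only. Because the offset is written inside the formula, `predict()` evaluates it from `newdata`, so setting `vol = 1` should give the expected count per unit volume sampled.

```r
# Sketch assuming mgcv and the question's data; object names are illustrative.
library(mgcv)

test <- data.frame(
  depth = seq(2.5, 97.5, by = 5),
  count = c(53323, 665, 1090, 491, 540, 514, 612, 775, 601, 497,
            295, 348, 357, 294, 292, 968, 455, 148, 155, 101),
  vol   = c(2119.92, 111.76, 156.64, 71.28, 77.44, 73.92, 62.48, 78.32,
            74.8, 81.84, 53.68, 80.96, 80.08, 79.2, 79.2, 77.44, 77.44,
            84.48, 73.04, 59.84)
)

# Poisson's default link is log, so the offset goes in as log(vol)
m <- gam(count ~ s(depth) + offset(log(vol)), data = test, family = poisson)

# The formula offset is re-evaluated from newdata; vol = 1 gives the
# expected count per unit volume at each depth
new_depths <- data.frame(depth = test$depth, vol = 1)
rate <- predict(m, newdata = new_depths, type = "response")
```

Multiplying `rate` by the actual volumes should recover the fitted counts, which is a quick check that the offset is doing what you intend. An offset passed through gam's separate `offset` argument would instead be ignored by `predict()`, which is one reason to keep it in the formula.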

More generally, an offset enters the model on the scale of the link function. Hence if your model used family = poisson(link = 'sqrt'), then you'd want to include the offset as offset(sqrt(vol)).
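Written out, with $o_i$ the offset, $g$ the link and $\mu_i = E[\text{count}_i]$, the model is $g(\mu_i) = o_i + f(\text{depth}_i)$. For the log link this shows why `log(vol)` gives a rate per unit volume, and what `offset(vol)` actually asks for:

```latex
\begin{aligned}
\text{offset(log(vol))}: &\quad \log \mu_i = \log \text{vol}_i + f(\text{depth}_i)
  \;\Longleftrightarrow\; \frac{\mu_i}{\text{vol}_i} = \exp\{f(\text{depth}_i)\} \\
\text{offset(vol)}: &\quad \log \mu_i = \text{vol}_i + f(\text{depth}_i)
  \;\Longleftrightarrow\; \mu_i = e^{\text{vol}_i}\, \exp\{f(\text{depth}_i)\}
\end{aligned}
```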

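I suspect the error comes from an overflow or invalid value in the likelihood/deviance while the initial model is being fitted: with offset(vol) the raw vol values are treated as if they were already on the log scale, so the fitted means are of order exp(vol), and exp(2119.92) overflows to Inf in double precision.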
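The suspected mechanism can be reproduced with base R alone. This sketch assumes that early in fitting the smooth contributes little, so the linear predictor is roughly the offset itself; `pdev`, `old_pdev` and `div_thresh` are hypothetical stand-ins for the variables named in the error message, not mgcv's actual internal values.

```r
# Base R only: compare Poisson deviance with vol vs log(vol) as the
# linear predictor (assumes the smooth term is ~0 at the start of fitting).
vol   <- c(2119.92, 111.76, 156.64, 71.28, 77.44, 73.92, 62.48, 78.32,
           74.8, 81.84, 53.68, 80.96, 80.08, 79.2, 79.2, 77.44, 77.44,
           84.48, 73.04, 59.84)
count <- c(53323, 665, 1090, 491, 540, 514, 612, 775, 601, 497,
           295, 348, 357, 294, 292, 968, 455, 148, 155, 101)
fam <- poisson()
wt  <- rep(1, length(count))

# offset(vol): inverse log link of 2119.92 overflows to Inf
mu_raw  <- fam$linkinv(vol)
dev_raw <- sum(fam$dev.resids(count, mu_raw, wt))  # Inf - Inf gives NaN

# offset(log(vol)): the means are just the volumes, deviance is finite
mu_log  <- fam$linkinv(log(vol))
dev_log <- sum(fam$dev.resids(count, mu_log, wt))

# Hypothetical convergence check shaped like the one in the error message:
# NaN > threshold is NA, which `if` cannot use
pdev <- dev_raw; old_pdev <- 0; div_thresh <- 1e-6
check <- tryCatch(
  if (pdev - old_pdev > div_thresh) "diverging" else "ok",
  error = function(e) conditionMessage(e)
)
```

Here `check` holds the same "missing value where TRUE/FALSE needed" message as the question, which is consistent with (though not proof of) the overflow explanation.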