1
votes

Possibly this is a naive question but did not find a solution. I have a dataframe with count data from field survey and I want to predict species richness using poisson regression. The survey is allocated to grids of equal size but variable number of survey were done in each grid. So I wanted to include 'number of surveys per grid' as offset. The problem is when I want to predict the glm output using raster stack it wants a raster layer for the offset variable (number of surveys per grid). My question is how to incorporate that offset variable into raster stack so that I can produce a spatial prediction (i.e., prediction should be a raster file). Below is my reproducible effort (using fewer variable):

Create the dataframe:

bio2 <- c(12.74220, 14.10092, 13.82644, 14.30550, 15.02780, 14.88224, 13.98853, 14.89524, 15.59887, 13.98664, 14.75405,
15.38178, 14.50719, 15.00427, 12.77741, 13.25432, 12.91208, 15.75312, 15.36683, 13.33202, 12.55190, 14.94755,
13.52424, 14.75273, 14.42298, 15.37897, 12.02472, 15.49786, 14.28823, 13.01982, 13.60521, 15.07687, 14.17427,
13.24491, 14.84833, 13.52594, 13.92113, 11.39738, 14.31446, 12.10239)

bio9 <- c(26.30980, 26.52826, 27.03376, 23.93621, 26.48416, 26.05859, 25.37550, 25.34595, 25.34056, 23.37793, 25.74681,
22.72016, 22.00458, 24.37140, 22.95169, 24.52542, 24.63087, 22.86291, 23.10240, 23.79215, 24.86875, 21.40718,
23.84258, 21.91964, 25.97682, 24.97625, 22.31471, 19.64094, 23.93386, 25.87234, 25.99514, 17.17149, 20.72802,
18.22862, 24.51112, 24.33626, 23.90822, 23.43660, 23.07425, 20.71244)

count <- c(37, 144,  91,  69,  36,  32,  14,  34,  48, 168,  15,  21,  36,  29,  24,  16,  14,  11,  18,  64,  37,  31,  18,   9,   4,
16,  14,  10,  14,  43,  18,  88,  69,  26,  20,   5,   9,  75,   8,  26)

sitesPerGrid <- c(3, 16, 8,  5,  3,  3,  1,  3,  3, 29,  2,  4,  5,  2,  3,  4,  2,  1,  2,  9,  6,  3,  3,  2,  1,  2,  2,  1,  2,  5,  7, 15,  9,  4,
1,  1,  2, 22,  6,  5)

testdf <- data.frame(bio2, bio9, count, sitesPerGrid)

pois1 <- glm(count ~ bio2 + bio9, offset = log(sitesPerGrid), family = poisson (link = "log"), data = testdf)

Spatial prediction:

library(raster)
bio_2 <- bio_9 <- raster(nrow=5,ncol=8, xmn=0, xmx=1,ymn=0,ymx=1)
values(bio_2) <- bio2
values(bio_9) <- bio9
predRas <- stack(bio_2, bio_9)
names(predRas) <- c("bio2", "bio9")

pdPois <- raster::predict(predRas, pois1, type = "response")
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = #object$xlevels) : 
#  variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 16 rows but variables found have 40 rows 


 

I get error because it expect a raster layer for sitesPerGrid. But I don't want to use sitesPerGrid as a predictor.

Update

Based on the comment and answer given by @robertHijmans I have tried using the following code:

pdPois <- raster::predict(predRas, pois1, const = testdf[, "sitesPerGrid"], type = "response")

Again I get the following error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 143811, 40

2
please include a minimal, self-contained, reproducible example. It is very difficult to help without that. using the constant= argument may do the trick. See e.g. stackoverflow.com/q/65329734/635245Robert Hijmans
@RobertHijmans Thank you very much. I have now reduced the main reproducible effort and separated one part at the bottom (to show how I created a raster for the offset layer).Tiny_hopper
My expectation is that I can copy and paste your code into R to see the error for myself.Robert Hijmans
@RobertHijmans Thank you very much. I have now updated so that the problem is self-contained and reproducible. I have tried your answer here (stackoverflow.com/a/65330365/6254041) still get the error. Kind regards.Tiny_hopper

2 Answers

0
votes

I see that this works, because the number of data points is the same as what was used to fit the model

p <- predict(pois1, as.data.frame(predRas), type = "response")

However, this (taking two data points) does not work:

p <- predict(pois1, as.data.frame(predRas)[1:2,], type = "response")

#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
#  variable lengths differ (found for 'bio9')
#In addition: Warning message:
#'newdata' had 2 rows but variables found have 40 rows 
 

So, irrespective of the raster data, can you (and if so how?) use a model like this to make predictions to (any number of) new data points?

0
votes

The problem is solved using a raster for the offset variable. The raster is created based on a hypothesis. For example, I want to see the prediction if there is one site per grid, or mean(sitesPerGrid) or max(sitesPerGrid). If my hypothesis is mean(sitesPerGrid) then the raster for prediction would be:

# make new raster for sitesPerGrid
rasGrid <- bio2
rasGrid[,] <- mean(testdf$sitesPerGrid)
names(rasGrid) <- "sitesPerGrid"

predRas <- stack(bio_2, bio_9, rasGrid)

p <- raster::predict(predRas, pois1, type = "response")