2
votes

I have a set of points and I want to extract values from several large rasters for a buffer around those points. The rasters are too large to be held in memory (> 1e10 cells). I illustrate my current approach below but would be interested if there is a faster approach.

library(maps)
library(sf)
library(raster)
library(dplyr)
library(parallel)

# sf object with polygones for which we want values
crs <- "+proj=utm +zone=33 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
map <- sf::st_as_sf( maps::map(regions = "Sweden", plot = FALSE, fill = TRUE))
map <- st_transform(map, crs = crs)
sf_points <- st_sfc(st_sample(map, 100))

sf_points <- 
  data.frame(A = 1:length(sf_points)) %>% 
  st_set_geometry(sf_points) 

# raster too large to fit in memory
# the raster(s) I am working on has 10m resolution

r <- raster(extent(map), nrow = 15000, ncol = 7000,
       crs = crs)

values(r) <- rep(sample(1:10, 77, replace = T),  length.out = ncell(r))

#use the parallel package for parallel processing
cluster <- makeCluster(4)
clusterExport(cluster, c("r","sf_points", "as_Spatial"))

List_points <- 
  sf_points %>% 
  mutate(split = rep(1:ceiling(n()/4), each=4, length.out=n())) %>% # 4 cores
  split(f = .$split) %>% 
  parLapply(cl = cluster, X = ., function(x) raster::extract(r, y = as_Spatial(x), buffer = 5000)) %>% 
  unlist(recursive = F)

I repeat the extraction for each raster. As the values are ordered I can then summarize the pixel values across lists. I cannot (easily) create a raster stack as the raster have different extent.

using the velox package has sugested here doesn't seem to work as it tries to load the raster into memory which fails. I could try to load it chunk-wise but then I would need to figure out which points are on which chunk...

1

1 Answers

3
votes

Maybe you can speed this up a bit using polygons without aggregating (dissolving) the buffers

library(raster)
swe <- getData("GADM", country="SWE", level=0)
set.seed(0)
pts <- spsample(swe, 100, "regular")
r <- raster(swe, nrow = 15000, ncol=7000)
values(r) <- rep(sample(1:10, 77, replace = T),  length.out = ncell(r))

b1 <- buffer(pts, 5000, dissolve=FALSE)
b2 <- buffer(pts, 5000, dissolve=TRUE)

system.time(e1 <- extract(r, pts, buffer=5000))
#   user  system elapsed 
#   1.39    0.02    1.40 
system.time(e2 <- extract(r, b1))
#   user  system elapsed 
#   0.88    0.00    0.88 
system.time(e3 <- extract(r, b2))
#   user  system elapsed 
#  26.34   25.02   51.52 

Clearly b1 performs much better than b2; but not that much faster than the first approach.

You say you cannot make a RasterStack because the rasters have different extents. If (and only if!), however, they have the same origin and resolution, you could first translate all areas to xy coordinates and then use these.

Something like this:

z <- rasterize(b, r)
pts <- rasterToPoints(z, xy=TRUE)

The above takes time, but after that

system.time(a <- extract(r, zz[,1:2]))
   user  system elapsed 
   0.04    0.00    0.04 

It could be faster to do this is parallel for each point, and use crop(raster(r), polygon) before rasterize.