Parallelization for changing raster values

Question

Dear stackoverflow community,

I read a bunch of answers on how to parallelize raster calculation. However, I did not find an answer fitting my questions. My question is about the parallelization to replace values in a raster with a data frame. It seems easy and I found a way to do this. However, I have a raster of more than 5 millions cells and a data frame of more than 340 000 rows. Thus, my for loop take a very loooong time. Thus, I was wondering about foreach loop to do my calculation. My main problem is to combine my results at the end.

I provide an easy example (from:http://j.p.rossi.free.fr/rpackages/ecpaysage/TD_Analyse_quanti_paysage.html#(29)) in order for you to understand what I want to do:

library(ECPaysage)
library(raster)
r <- raster(system.file("extdata/r.tif", package="ECPaysage")) 
plot(r)

This is the landuse raster on which we want some patch information. For this, we use the SDMTools package:

library(SDMTools)
res(r)

matbi <- r
matbi[] <- 0 # let's create the same raster with 0 values
w <- which(r[]==1) 
matbi[w] <- 1 # this creates a binary raster for the specific landuse == 1 

plot(matbi, axes=F, box=F, main="landuse 1 : build")

From this binary raster we want to create the patch information. Thus, we create a raster of patches:

matpatch <- ConnCompLabel(matbi)
plot(matpatch, axes=F,box=F, main="patch ID landuse 1 :build")

Then, we extract information at the patch level from this patch raster:

patch <- PatchStat(mat=matpatch, cellsize = res(r)[1], latlon = FALSE)
names(patch)

dim(patch)

patch$patchID
dim(patch)

Now, we have the number of patches contained in the raster (each cell possesses a patch ID). And we want to extract at the raster, the patch value for each cell of the specified patch ID.

shapeindex <- matpatch

w <- which(shapeindex[]==0) # cells that do not belong to any patch
shapeindex[w] <- NA

Thus, to extract for each patch ID cell the corresponding patch index, a for loop is done:

system.time(for(p in 2:(dim(patch)[1])) {
  w <- which(shapeindex[]==patch$patchID[p])
  shapeindex[w] <- patch$shape.index[p]
})
plot(shape, axes=F,main="shape index - build", box=F, col=rev(terrain.colors(dim(patch)[1]-1)))

This works perfectly but with my data (raster of more than 5 millions cells and patch data frame of more than 340 000 rows) it takes too much time (and this is not good for reproducibility). One of my idea was to parallelized this loop with a foreach. However, my main problem is to combine my rasters at the end. If I parallelized on the rows of the data frame, I will have 340 000 matrix to combine. And it might take the same time as the for loop. Thus, I was wondering if there was a way (or multiple ways like a function) to speed the process and/or combine the rasters in an easy way. Thanks for all the help you can provide. Best, Adrienne

Have you tried making a function that does one iteration of your process, then using lapply to iterate that function, then combining with something like dplyr::bind_rows at the end? Looks like that should work here, and usually much faster than a for loop. — ulfelder

Adrienne Gastineau Adrienne Gastineau · Accepted Answer · 2019-03-07T16:40:26

Thanks for your reply @ulfelder I tried something as you said. I created a function, use lapply and combine the rasters between them:

If I take the previous code before the loop, it gives:

shape <- matpatch
w <- which(shape[]==0)
shape[w] <- NA
class(shape)
shape
plot(shape)

# create my function for one iteration of the for loop
RastVal <- function(monr,vec,i){
  require(raster)
  w <- which(monr[]==vec$patchID[i])
  x <- which(monr[]!=vec$patchID[i])
  monr[w] <- vec$shape.index[i]
  monr[x] <- 0
  return(monr)
}

# create a list of rasters containing the patch information
RastL <- lapply(X=c(2:(dim(patch)[1])),
                   FUN=function(x)RastVal(shape,patch,x))

# Combining the rasters
new.rast <- RastL[[1]]
for(i in 2:length(RastL)){
  x <- RastL[[i]][] != 0
  new.rast[x] <- RastL[[i]][x]
}
plot(new.rast) # should be the same as the raster shape in the previous code

However, I am not sure that is more efficient... Maybe, if the raster gets bigger the time difference between the two ways tends to decrease...

system.time(for(p in 2:(dim(patch)[1])) {
  w <- which(shape[]==patch$patchID[p])
  shape[w] <- patch$shape.index[p]
})

system.time(RastL <- lapply(X=c(2:(dim(patch)[1])),
                   FUN=function(x)RastVal(shape,patch,x)))

new.rast <- RastL[[1]]
system.time(for(i in 2:length(RastL)){
  x <- RastL[[i]][] != 0
  new.rast[x] <- RastL[[i]][x]
})

Moreover, creating a list of roughly 340 000 rasters might lead to memory issues... Thanks again!

Best, Adrienne

Parallelization for changing raster values

1 Answers