Dear stackoverflow community,
I read a bunch of answers on how to parallelize raster calculation. However, I did not find an answer fitting my questions. My question is about the parallelization to replace values in a raster with a data frame. It seems easy and I found a way to do this. However, I have a raster of more than 5 millions cells and a data frame of more than 340 000 rows. Thus, my for loop take a very loooong time. Thus, I was wondering about foreach loop to do my calculation. My main problem is to combine my results at the end.
I provide an easy example (from:http://j.p.rossi.free.fr/rpackages/ecpaysage/TD_Analyse_quanti_paysage.html#(29)) in order for you to understand what I want to do:
library(ECPaysage)
library(raster)
r <- raster(system.file("extdata/r.tif", package="ECPaysage"))
plot(r)
This is the landuse raster on which we want some patch information. For this, we use the SDMTools package:
library(SDMTools)
res(r)
matbi <- r
matbi[] <- 0 # let's create the same raster with 0 values
w <- which(r[]==1)
matbi[w] <- 1 # this creates a binary raster for the specific landuse == 1
plot(matbi, axes=F, box=F, main="landuse 1 : build")
From this binary raster we want to create the patch information. Thus, we create a raster of patches:
matpatch <- ConnCompLabel(matbi)
plot(matpatch, axes=F,box=F, main="patch ID landuse 1 :build")
Then, we extract information at the patch level from this patch raster:
patch <- PatchStat(mat=matpatch, cellsize = res(r)[1], latlon = FALSE)
names(patch)
dim(patch)
patch$patchID
dim(patch)
Now, we have the number of patches contained in the raster (each cell possesses a patch ID). And we want to extract at the raster, the patch value for each cell of the specified patch ID.
shapeindex <- matpatch
w <- which(shapeindex[]==0) # cells that do not belong to any patch
shapeindex[w] <- NA
Thus, to extract for each patch ID cell the corresponding patch index, a for loop is done:
system.time(for(p in 2:(dim(patch)[1])) {
w <- which(shapeindex[]==patch$patchID[p])
shapeindex[w] <- patch$shape.index[p]
})
plot(shape, axes=F,main="shape index - build", box=F, col=rev(terrain.colors(dim(patch)[1]-1)))
This works perfectly but with my data (raster of more than 5 millions cells and patch data frame of more than 340 000 rows) it takes too much time (and this is not good for reproducibility). One of my idea was to parallelized this loop with a foreach. However, my main problem is to combine my rasters at the end. If I parallelized on the rows of the data frame, I will have 340 000 matrix to combine. And it might take the same time as the for loop. Thus, I was wondering if there was a way (or multiple ways like a function) to speed the process and/or combine the rasters in an easy way. Thanks for all the help you can provide. Best, Adrienne
lapply
to iterate that function, then combining with something likedplyr::bind_rows
at the end? Looks like that should work here, and usually much faster than afor
loop. – ulfelder