1
votes

I am trying to develop a function to "synchronise" NAs among layers of a raster stack, i.e. to make sure that for any given pixel of the stack, if one layer has a NA, then all layers should be set to NA for that pixel.

This is particularly useful when combining rasters coming from varying sources for species distribution modelling, because some models do not handle properly NAs.

I have found two ways to do this, but I find neither of them satisfactory. One of them requires to use the function getValues and thus is not usable for very large stacks or computers with low RAM. The other one is more memory-safe but is much slower. I am therefore here to ask if anyone has an idea to improve my attempts?

Here are the two possibilities:

  1. Using getValues()

    syncNA1 <- function (x) 
    {
      val <- getValues(x)
      NA.pos <- unique(which(is.na(val), arr.ind = T)[, 1])
      val[NA.pos, ] <- NA
      x <- setValues(x, val)
      return(x)
    }
    
  2. Using calc()

    syncNA2 <- function(y)
    {
      calc(y, na.rm = T, fun = function(x, na.rm = na.rm)
      {
        if(any(is.na(x)))
        {
          rep(NA, length(x))
        } else
        {
          x
        }
      })
    }
    

Now a demonstration of their respective computing times for the same stack:

> system.time(
+ b1 <- syncNA1(a1)
+ )
   user  system elapsed 
   3.04    0.15    3.20 
> system.time(
+ b2 <- syncNA2(a1)
+ )
   user  system elapsed 
   5.89    0.19    6.08 

Many thanks for your help,

Boris

3

3 Answers

3
votes

With a stack named "s", I would first use calc(s, fun = sum) to compute a mask layer that records the location of all cells with an NA value in at least one of the stack's layers. mask() will then allow you to apply that mask to every layer in the stack.

Here's an example:

library(raster)

## Construct reproducible data! (Here a raster stack with NA values in each layer) 
m <- raster(ncol=10, nrow=10)
n <- raster(ncol=10, nrow=10)
m[] <- runif(ncell(m))
n[] <- runif(ncell(n)) * 10
m[m < 0.5] <- NA
n[n < 5] <- NA
s <- stack(m,n)

## Synchronize the NA values
s2 <- mask(s, calc(s,fun = sum))

## Check that it worked
plot(s2)

enter image description here

3
votes

I don't know about speed, but you might try converting to an array, loading up the NA, and converting back. pseudocode:

xarray<-as.array(xstack)
ind.na<-which(is.na(xarray),array.ind=TRUE)
for(j in nrow(ind.na) ) {
    xarray[ind.na[j,1],ind.na[j,2],]<-NA
   }
nastack<-raster(xarray)

I haven't verified the correct choice of indices there, nor have I verified I converted back to raster stack correctly, but I hope you get the idea.

EDIT: I ran a time test, with rasters 1000x1000 but otherwise as Josh created.

 microbenchmark(josh(s),syncNA1(s),syncNA2(s),times=5)
Unit: milliseconds
       expr       min        lq    median        uq        max
    josh(s)  774.2363  789.1653  800.2511  806.5364   809.9087
 syncNA1(s)  652.3928  659.8327  692.3578  695.8057   743.9123
 syncNA2(s) 7951.3918 8291.7917 8604.2226 8606.3432 10254.4739
 neval
     5
     5
     5
1
votes

I ended up building an hybrid function between syncNA1 and Josh's solution. This function is memory-safe if the computer does not have enough RAM, but can process faster if the computer has enough RAM:

synchroniseNA <- function(x)
{
  if(canProcessInMemory(x, n = 2))
  {
    val <- getValues(x)
    NA.pos <- unique(which(is.na(val), arr.ind = T)[, 1])
    val[NA.pos, ] <- NA
    x <- setValues(x, val)
    return(x)
  } else
  {
    x <- mask(x, calc(x, fun = sum))
    return(x)
  }
}

However, I empirically determined that the amount of ram used by a data.frame is twice the size of a raster file (for the n argument of canProcessInMemory()), but I am not exactly sure I am right here.