
I have a stack of rasters (one per species) and then I have a data frame with lat/long columns along with a species name.

library(raster)
fls <- list.files(pattern = "median")  # one raster file per species
s <- stack(fls)
# df is a data.frame with columns "x", "y" and "species" (the species name)

I want to select one raster at a time to use with an extract function, choosing it by a partial match on the species name column. I need a partial match because the raster names might not exactly match the names in the species list: there might be an upper/lower case mismatch, the raster layer name might be longer (for example "species_name_median"), or there might be a "_" instead of a blank.

for (i in seq_len(nrow(df))) {
  # Pseudocode: pick the layer whose name partially matches df$species[i]
  result <- extract(s[[ <partial match to df$species[i]> ]], df[i, c("x", "y")])
}

To be clear, I just want to use one raster at a time for the extraction. I can easily select a single raster with s[[i]], but there is no guarantee that every species in the list has a corresponding raster.

This question is basically impossible to answer in any meaningful way without some examples of the sort of fuzzy matching you want to do. – Simon O'Hanlon
@SimonO101 One example would be a raster named "Lion_median" where the species name column contains "lion". In this case I need to match "lion" with "Lion". Does that help? – Herman Toothrot
Yes, it does. I have added an answer that will work provided the species names are spelt correctly (i.e. the match ignores case, and any punctuation or extra text around the species name within the layer name). HTH. – Simon O'Hanlon
If you need more help, please do post any follow-up problems you run into... :-) – Simon O'Hanlon
@SimonO101 I am not familiar with some of the functions you used, so it will take me some time to understand what your answer actually does. But thanks. – Herman Toothrot

2 Answers


If your points to query are in a data.frame of x and y coordinates, together with the species name identifying the layer to query, you can do everything with these two commands:

#  Find the layer to match on using 'grepl' and 'which' converting all names to lowercase for consistency
df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )


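# Extract each value from the appropriate layer in the stack
# (NA when a species matched no layer, or more than one)
df$Value <- sapply( seq_len(nrow(df)) , function(x) {
  idx <- df$layer[[x]]
  if ( length(idx) != 1 ) return( NA )
  extract( s[[ idx ]] , df[ x , 1:2 ] )
} )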

How it works

Starting from the first line:

  • First we define a new column, df$layer, which holds the index of the RasterLayer in the stack to use for each row.
  • lapply iterates along the elements of df$species, passing each one in turn to an anonymous function as the input variable x. lapply is a loop construct, even though it doesn't look like one.
  • On the first iteration, x is the first element of df$species. We pass it to grepl (the logical version of grep, whose name comes from 'global regular expression print') to find which of the layer names of our stack s contain the species name as a pattern. We apply tolower() to both the pattern (x) and the strings searched (names(s)) so that the match is case-insensitive; otherwise "Tiger" would not find "tiger".
  • grepl returns a logical vector saying which elements contain the pattern, e.g. grepl( "abc" , c("xyz", "wxy" , "acb" , "zxabcty" ) ) returns FALSE, FALSE, FALSE, TRUE ("acb" does not contain "abc"). We use which to get the indices of the TRUE elements, here 4.
  • The idea is that each row matches one, and only one, layer in the stack, so the single TRUE index is the index of the layer we want. If a species has no raster, which returns integer(0) instead.
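The matching step can be tried on its own, without any rasters. Below is a minimal base-R sketch; it assumes a plain character vector `layer_names` (a hypothetical stand-in for `names(s)`, filled with the layer names from the worked example further down) and checks the corrected grepl example as well.

```r
# Hypothetical stand-in for names(s): mixed case and extra suffixes
layer_names <- c("LIon_medIan", "PANTHeR_MEAN_AVG", "tiger.Mean.JULY_2012")
species <- c("lion", "panther", "Tiger")

# grepl() gives one TRUE/FALSE per element; which() turns that into indices
grepl("abc", c("xyz", "wxy", "acb", "zxabcty"))         # FALSE FALSE FALSE TRUE
which(grepl("abc", c("xyz", "wxy", "acb", "zxabcty")))  # 4

# Same matching as in the answer: lower-case both sides, then find the layer index
layer <- lapply(species, function(x) which(grepl(tolower(x), tolower(layer_names))))
str(layer)  # a list with one index per species: 1, 2, 3
```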

On the second line, sapply:

  • sapply is an iterator much like lapply, but it simplifies the result to a vector where it can instead of returning a list. TBH you could use either here, though sapply gives a plain numeric column.
  • This time we iterate over the sequence of row numbers from 1 to nrow(df).
  • The anonymous function receives the row number as its input variable x.
  • For the current row (given by x) we extract the raster value at its "x" and "y" coordinates (columns 1 and 2 respectively), using the layer index found in the first line.
  • We assign the result to another column of our data.frame, which holds the extracted value at each x/y coordinate from the appropriate layer.
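Since there is no guarantee that every species has a raster, it can help to turn the list of matches into a plain index column first and drop unmatched rows before extracting. A small base-R sketch of that idea, assuming the hypothetical `layer_names` vector again stands in for `names(s)` and that "zebra" is a species with no raster:

```r
layer_names <- c("LIon_medIan", "PANTHeR_MEAN_AVG", "tiger.Mean.JULY_2012")
df <- data.frame(species = c("lion", "zebra", "Tiger"), stringsAsFactors = FALSE)

# lapply() returns a list: integer(0) where nothing matched
matches <- lapply(df$species, function(x) which(grepl(tolower(x), tolower(layer_names))))

# sapply() simplifies to a plain integer vector; keep an index only when exactly one layer matched
df$layer <- sapply(matches, function(m) if (length(m) == 1) m else NA_integer_)

# Rows that can safely be passed on to extract()
df[!is.na(df$layer), ]
```

Keeping `layer` as an ordinary integer column (rather than a list column) also means it can be used directly as `s[[ df$layer[i] ]]`.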

I hope that helps!!

And a worked example with some data:

library( raster )
#  Sample rasters - note the scale of values in each layer
# Tens
r1 <- raster( matrix( sample(1:10,100,replace=TRUE) , ncol = 10 ) )
# Hundreds
r2 <- raster( matrix( sample(1e2:1.1e2,100,replace=TRUE) , ncol = 10 ) )
# Thousands
r3 <- raster( matrix( sample(1e3:1.1e3,100,replace=TRUE) , ncol = 10 ) )

#  Stack the rasters
s <- stack( r1,r2,r3 )
#  Name the layers in the stack
names(s) <- c("LIon_medIan" , "PANTHeR_MEAN_AVG" , "tiger.Mean.JULY_2012")


#  Data of points to query on
df <- data.frame( x = runif(10) , y = runif(10) , species = sample( c("lion" , "panther" , "Tiger" ) , 10 , replace = TRUE ) )

#  Run the previous code
df$layer <- lapply( df$species , function(x) which( grepl( tolower(x) , tolower(names(s)) ) ) )
df$Value <- sapply( seq_len(nrow(df)) , function(x) {
  idx <- df$layer[[x]]
  if ( length(idx) != 1 ) return( NA )
  extract( s[[ idx ]] , df[ x , 1:2 ] )
} )

#  And the result (note the scale of Values is consistent with the scale of values in each rasterLayer in the stack)
df
#          x         y species layer Value
#1  0.4827577 0.7517476    lion     1     1
#2  0.8590993 0.9929104    lion     1     3
#3  0.8987446 0.4465397   tiger     3  1084
#4  0.5935572 0.6591223 panther     2   107
#5  0.6382287 0.1579990 panther     2   103
#6  0.7957626 0.7931233    lion     1     4
#7  0.2836228 0.3689158   tiger     3  1076
#8  0.5213569 0.7156062    lion     1     3
#9  0.6828245 0.1352709 panther     2   103
#10 0.7030304 0.8049597 panther     2   105

Have you tried subsetting your RasterStack?

Something like this:

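for (i in seq_along(df$species)) # df$species holds the (partial) species names
{
  # grep() needs the names to search in; value = TRUE returns the matching layer names
  result <- subset(s, grep(df$species[i], names(s), ignore.case = TRUE, value = TRUE))
  # use result here (e.g. with extract) before it is overwritten on the next iteration
}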

It would be interesting to know how different the raster and species names can be. That would allow better approaches, tuning the regular expression if necessary. You'll find many references to grep here; try ?grep too.
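One way to tune the matching for the mismatches described in the question ("_" instead of a blank, different case, extra suffixes) is to normalize both sides before matching. A minimal sketch, assuming a hypothetical helper `normalize_name()` that lower-cases and strips everything except letters and digits, and a hypothetical `match_layer()` that returns NA for no match or an ambiguous match; the example names are taken from the question and answers:

```r
# Hypothetical helper: lower-case and drop anything that is not a letter or digit,
# so "species name", "species_name" and "Species.Name" all become "speciesname"
normalize_name <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Hypothetical helper: index of the single layer whose normalized name contains
# the normalized species name; NA if there is no match or more than one
match_layer <- function(sp, layer_names) {
  hits <- which(grepl(normalize_name(sp), normalize_name(layer_names), fixed = TRUE))
  if (length(hits) == 1) hits else NA_integer_
}

layer_names <- c("species_name_median", "Lion_median", "tiger.Mean.JULY_2012")
species <- c("species name", "lion", "zebra")

res <- sapply(species, match_layer, layer_names = layer_names)
res  # named integer vector: 1, 2, NA
```

Using `fixed = TRUE` treats the species name literally, so characters such as "." are not interpreted as regular expression metacharacters. A very short species name could still match inside a longer one; the length check only catches that when it produces more than one hit.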