
If we have a raster, say of integer elevation data for a country, and one polygon shapefile, say discretizing 300 river basins in that country, with a unique name for each, how would we most easily get output like this for them all?

basinID, gridcellelev
a, 320
a, 321
a, 320
b, 17
b, 18
b, 19

The most burdensome way seems to be filtering/converting the single shapefile into 300 shapefiles, clipping the raster 300 times into 300 unique-ID rasters, reading them back in, generating an individual table for each basin, and then rbind()ing them all together.

On the other hand, the ideal way seems to be skipping the file generation, not saving the xy data, and building the same table from just one raster and one shapefile - somehow selecting the cells within a basin, stamping them with the basinID, creating a table, dropping the coordinates, and iterating and appending that table until the 300th basin.
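In base-R terms, the loop I have in mind would be something like the sketch below (with a made-up list standing in for the per-basin cell values, since I don't know how to reproduce a raster minimally):

```r
# Fake per-basin cell values, standing in for whatever the clip step returns
cells_by_basin <- list(a = c(320, 321, 320), b = c(17, 18, 19))

out <- NULL
for (id in names(cells_by_basin)) {
  # stamp each cell value with its basinID and append to the running table
  out <- rbind(out, data.frame(basinID = id, gridcellelev = cells_by_basin[[id]]))
}
```

That would give the long basinID/gridcellelev table directly, without any intermediate files.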

I'm not looking for any statistics, just the raw data listing of the grid cell elevations that would have been part of some standard clip method. I believe the standard raster clip attribute table output that comes out of ArcMap is not the raw data but the counts/frequency of the cells. That would work too.

I don't know how to minimally reproduce a raster and a polygon shapefile, so I'd just be grateful for any tips/libraries/functions/examples. As a starting point:

library(tidyverse)
library(raster)
library(rgdal)
library(sf)

elev_raster <- raster("spain_elev_meters.tif") #integer raster
basins <- readOGR("spainbasins.shp", "spainbasins") %>% st_as_sf() #unique basin ID column: `basinID` 

unless other starting formats are suggested.

Any tips greatly appreciated!


1 Answer


I think you would find the answer if you did a search for "extract raster shapefile".

Example data:

library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
r <- raster(p)
values(r) <- 1:ncell(r)

p has 12 polygons

p
#class       : SpatialPolygonsDataFrame 
#features    : 12 
#extent      : 5.74414, 6.528252, 49.44781, 50.18162  (xmin, xmax, ymin, ymax)
#crs         : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
#variables   : 5
#names       : ID_1,     NAME_1, ID_2,   NAME_2, AREA 
#min values  :    1,   Diekirch,    1, Capellen,   76 
#max values  :    3, Luxembourg,   12,    Wiltz,  312

Solution 1. Extract the values and apply a function, for example mean:

x <- extract(r, p, fun=mean, na.rm=TRUE)

Or like this

x <- extract(r, p)
v <- sapply(x, mean)

One value per polygon

v
#[1] 14.00000 43.40000 49.00000 36.00000 29.50000 59.00000 91.00000 71.83333
#[9] 73.50000 87.16667 78.50000 59.57143

Solution 2. To get the structure you asked for

x <- extract(r, p)
z <- do.call(rbind, lapply(1:length(x), function(i) cbind(i, x[[i]])))

Or like this

i <- sapply(x, length)
j <- rep(1:length(i), i)
z <- cbind(j, unlist(x))

colnames(z) = c("ID", "value")
head(z)
#     ID value
#[1,]  1     4
#[2,]  1     5
#[3,]  1    13
#[4,]  1    14
#[5,]  1    15
#[6,]  1    22
tail(z)
#      ID value
#[51,] 12    55
#[52,] 12    56
#[53,] 12    57
#[54,] 12    64
#[55,] 12    65
#[56,] 12    66
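If you want the basin name rather than the numeric polygon index in the first column, you can index into the polygon attributes - with the lux example that would be something like p$NAME_2, and for your data your basinID column. A base-R sketch, with a simulated extract() result so it runs standalone:

```r
# Simulated extract(r, p) output: one numeric vector of cell values per polygon
x <- list(c(4, 5, 13), c(17, 18))
basin_names <- c("Clervaux", "Diekirch")   # stand-in for p$NAME_2 (or your basinID)

i <- rep(seq_along(x), lengths(x))         # polygon index, repeated once per cell
z <- data.frame(basinID = basin_names[i], value = unlist(x))
```

This is the same rep/unlist trick as above, just mapping the index through the attribute vector.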

Later (in response to a question in the comments)

To get frequencies, you can use the list of extracted values and the table function.

With new example values to get some variation in the frequencies

values(r) <- rep(1:4, 25)

Now do

f <- extract(r, p, fun=function(i,...) table(i)) 

or in two steps:

x <- extract(r, p) 
ff <- lapply(x, table)

You get a table for each polygon

ff[1:2]
#[[1]]
#1 2 3 4 
#3 2 1 1 

#[[2]]
#1 2 3 4 
#1 1 2 1 

Another way to get this result is to reuse the combined matrix built above

x <- extract(r, p)
z <- do.call(rbind, lapply(1:length(x), function(i) cbind(i, x[[i]])))

Followed by

fff <- tapply(z[,2], z[,1], table)

If you want to manipulate these further, write a function for that and use it with lapply or a for loop.
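For instance, a small per-polygon summary function could look like this (base R, with a fake extract() result as input so it runs standalone):

```r
# Fake extract(r, p) result: a list of cell values, one element per polygon
x <- list(c(1, 2, 2, 3), c(4, 4, 4))

# Any function of a numeric vector works here
summarize_cells <- function(v) c(n = length(v), min = min(v), max = max(v))

res <- t(sapply(x, summarize_cells))   # one row of summaries per polygon
```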