Raster data to polygons to compute averages in R

Question

I am interested in calculating precipitation averages globally. My data is contained in a RasterStack object in R and is globally distributed precipitation data. However, I only want to isolate land areas to compute the mean of those separately. What I would like to do is somehow isolate those grid cells whose centers overlap with land and then compute the annual mean. I already first created a raster stack object, called "RCP1pctCO2Mean", which contains the mean values of interest but is globally distributed data. There are 138 layers, with each layer representing one year.

Thus, what I would like to do is the following:

Convert raster to polygon but retain the original 138 raster layers when converted
Specify which grid cells are land (i.e. those grid cell centers/coordinates that fall on land)
Compute annual precipitation averages across these land grid cells.

My raster stack object "RCP1pctCO2Mean" has the following attributes:

class       : RasterStack 
dimensions  : 64, 128, 8192, 138  (nrow, ncol, ncell, nlayers)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin,  
ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
names       :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,       
layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,   
layer.12,   layer.13,   layer.14,   layer.15, ... 
min values  : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730,    
0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 
0.51857087, 0.41005131, 0.45956529, 0.47497867, ... 
max values  :   96.30350,  104.08584,   88.92751,   97.49373,   89.57201,   
90.58570,   86.67651,   88.33519,   96.94720,  101.58247,   96.07792,   
93.21948,   99.59785,   94.26218,   90.62138, ...

Previously, I tried isolating a specific region by specifying a range of longitudes and latitudes to obtain the means and medians for that region, just like this:

>expansion1<-expand.grid(103:120, 3:15) #This is a range of longitudes and then latitudes
>lonlataaa<-extract(RCP1pctCO2Mean, expansion1)
>Columnaaa<-colMeans(lonlataaa)

#Packages loaded        
library(raster)
library(maps)
library(maptools)
library(rasterVis)

However, with this above approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the mean meaningfully.

Therefore, with this RasterStack, would it be possible to create a condition that tells R that if the "center point" or centroid of each grid cell (with each grid cell center representing a specific latitude/longitude coordinate) happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE, or maybe somehow use 0s or 1s)? Even if a grid cell happens to have water mixed with land, but the center point/centroid of the grid is on land, that would be considered as land. I would like to do this for specific countries, too, to compute national averages the same way.

I want the individual 138 layers/years to be retained, so that all the Year 1s can be averaged across all relevant grid cells, then all Year 2s, then all Year 3s, then all Year 4s, etc. (to create a time series later). I'm not sure if this is the correct way to do this, but what I did first was take the "RCP1pctCO2Mean" RasterStack and created a SpatialPolygonsDataframe using:

trans <- raster::rasterToPolygon(RCP1pctCO2Mean)    
trans

class      : SpatialPolygonsDataFrame 
features    : 8192 
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
variables  : 138
names      :          layer.1,          layer.2,          layer.3,          layer.4,          layer.5,          layer.6,          layer.7,          layer.8,          layer.9,          layer.10,        layer.11,          layer.12,          layer.13,          layer.14,          layer.15, ... 
min values  : 0.429645141288708, 0.433756527757047, 0.517493712584313, 0.508389826053265, 0.453667300300907, 0.530991463885754,  0.4975718597839, 0.456977516231847, 0.413821991744321, 0.460824014230889, 0.45516687008843, 0.518570869929649, 0.410051312472821, 0.459565291388527, 0.474978673081429, ... 
max values  :  96.3034965348338,  104.085840813594,  88.9275127089197,  97.4937326695693,  89.5720053000712,  90.5857030396531, 86.6765123781949,  88.3351859796546,  96.947199473011,  101.582468961459, 96.0779212204739,  93.2194802269814,  99.5978503841538,  94.2621847475029,  90.6213755054263, ...

Does this first step make sense? Does this retain the original raster data, but now in polygon form? If so, is it then possible to somehow just isolate an all-land polygon, and then somehow specify which cells are considered land, and then compute averages for each year across the grid cells?

I have looked previously for any procedure example to do this, but I have not yet come across anything too specific.

Thank you for your time, and any help would be immensely valuable!

drJones drJones · Accepted Answer · 2019-11-15T02:30:41

You do not need to convert your raster to a polygon. It is better to use a shapefile to mask your raster and leave only the points that are overland. If your precipitation data is already in the format of a rasterStack called "RCP1pctCO2Mean" then you can do this:

require(maptools)
require(raster)
data(wrld_simpl) ##this loads a coarse-resolution global shapefile from the maptools package


##GLOBAL EXAMPLE
##isolate only points on land
##wrld_simpl and RCP1pctCO2Mean have the same CRS so can be used directly
all_land_vals = mask(RCP1pctCO2Mean,wrld_simpl)

##get mean value for all points but for each layer separately 
cellStats(all_land_vals, stat='mean')


##COUNTRY EXAMPLE
##isolate only points on Brazil
country=wrld_simpl[which(wrld_simpl$NAME=="Brazil"),]
country_vals = mask(RCP1pctCO2Mean,country)

##get mean value for all points in given country but for each layer separately 
cellStats(country_vals, stat='mean')

N.B. your precipitation data is quite coarse so you may want to disaggregate it first. Also you methodology of calculating the mean directly is flawed as latlong is a geographic coordinate system and the further away from the equator a pixel is the smaller the land area it actually represent. To overcome this technically precipitation values should be weighted to reflect distance from equator.

Raster data to polygons to compute averages in R

1 Answers