Extract row data using a raster grid

Question

I have a raster grid at 0.5 degree resolution (r) and a dataframe (my_df) with 3 columns: long, lat and id. The dataframe represents species occurrence records.

I want to determine which species are present in each 0.5 degree cell of my raster grid and keep only one record per species per cell (my_df has more than 90,000,000 rows). If a cell contains only one species, the output should have a single row with the lat and long of that grid cell and the species ID from the dataframe. Other cells may contain hundreds of species, so they may have hundreds of rows.

Ultimately I would like to create a dataframe that holds the long and lat of the 0.5 degree grid cell that each species location falls into, together with the species IDs present there, with one row per species per cell.

I have created a raster grid, as per...

ext <- extent(-180.0, 180, -90.0, 90.0)
gridsize <- 0.5
r <- raster(ext, res=gridsize)
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

and a dataframe, which was originally a SpatialPolygonsDataframe...

A tibble: 6 x 3
   long   lat id   
  <dbl> <dbl> <chr>
1  16.5 -28.6 0    
2  16.5 -28.6 0    
3  16.5 -28.6 0    
4  16.5 -28.6 0    
5  16.5 -28.6 0    
6  16.5 -28.6 0 
etc
etc

...but am unsure of how to proceed with the rest of the method. I have tried rasterizing my data, extracting points etc but I am continually hitting errors and am unsure of the correct method to use to achieve my aim.

Alternatively, if anyone knows how to extract species names directly from the SpatialPolygonsDataFrame, which contains a range polygon for each species, at the 0.5 degree raster grid cell locations, that would be excellent.

Any help would be greatly appreciated.

linog linog · Accepted Answer · 2020-04-07T07:04:15

If I guessed correctly, you want to match points to the cells they fall within. I think you are looking for a spatial join based on the intersection between points and polygons.

I highly recommend using the sf package rather than sp objects, so that is what I will use here.

First, create the grid with the st_make_grid function:

library(sf)
library(dplyr)

ext <- raster::extent(-180.0, 180, -90.0, 90.0)

grid <- st_bbox(ext) %>% 
  st_make_grid(cellsize = 0.5, what = "polygons") %>%
  st_set_crs(4326)
grid <- grid %>% st_sf() %>% mutate(id_cell = seq_len(nrow(.)))
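Since the question also asks for the long and lat of each grid cell, it may help to see how id_cell relates to cell coordinates. The sketch below assumes that st_make_grid numbers cells row by row starting from the lower-left corner (-180, -90); that assumption is consistent with the id_cell of 88234 reported by the join further down. The helper names cell_id_from_lonlat and cell_centre_from_id are hypothetical and are not part of sf.

```r
cell_size <- 0.5
n_col <- 360 / cell_size   # 720 columns across the globe
n_row <- 180 / cell_size   # 360 rows

# Hypothetical helper: cell id for a long/lat pair, assuming ids run
# row by row from the lower-left corner. Points exactly on the upper
# or right edge of the grid are clamped into the last row/column.
cell_id_from_lonlat <- function(long, lat) {
  col <- pmin(floor((long + 180) / cell_size), n_col - 1)
  row <- pmin(floor((lat + 90) / cell_size), n_row - 1)
  row * n_col + col + 1
}

# Hypothetical helper: centre coordinates of a cell given its id.
cell_centre_from_id <- function(id_cell) {
  col <- (id_cell - 1) %% n_col
  row <- (id_cell - 1) %/% n_col
  data.frame(cell_long = -180 + (col + 0.5) * cell_size,
             cell_lat  = -90 + (row + 0.5) * cell_size)
}

cell_id_from_lonlat(16.51, -28.6)   # 88234
cell_centre_from_id(88234)          # cell_long 16.75, cell_lat -28.75
```

Because this is plain vectorised arithmetic, it could also be applied directly to the long and lat columns of a very large table, although it only holds for a regular global grid like this one.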

Then let's take a simple dataframe:

df <- data.frame(long = 16.51, lat = -28.6, id = 0)
df <- df %>% sf::st_as_sf(coords = c("long","lat"), crs = 4326)

df

Simple feature collection with 1 feature and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id            geometry
1  0 POINT (16.51 -28.6)

Then use the st_join function. By default, the spatial join is based on intersection:

df %>% sf::st_join(grid, left = TRUE)

although coordinates are longitude/latitude, st_intersects assumes that they are planar
Simple feature collection with 1 feature and 2 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id id_cell            geometry
1  0   88234 POINT (16.51 -28.6)

I assumed you wanted a left join (which reports all your points, including any that match no cell). You can change that by setting left = FALSE to get an inner join instead. I think using sf will be faster than a hand-coded technique.
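To get from the join to the table described in the question (one row per species per cell, with the cell's long and lat), the joined points still need to be de-duplicated and matched to cell centres. A minimal sketch of one way to do that follows. It uses a small 2 x 1 degree grid and made-up points so that it runs quickly; the names pts, cell_centres, cell_long and cell_lat are illustrative assumptions, and it relies on st_make_grid returning centres in the same order as the polygons.

```r
library(sf)
library(dplyr)

# Small example extent (assumption: stands in for the global grid)
bb <- st_as_sfc(st_bbox(c(xmin = 16, ymin = -29, xmax = 18, ymax = -28),
                        crs = st_crs(4326)))

# Polygon grid with a running cell id
grid <- st_make_grid(bb, cellsize = 0.5, what = "polygons") %>%
  st_sf() %>%
  mutate(id_cell = row_number())

# Cell centres, generated in the same order as the polygons
centres <- st_make_grid(bb, cellsize = 0.5, what = "centers")
cell_centres <- data.frame(id_cell = seq_along(centres),
                           st_coordinates(centres)) %>%
  rename(cell_long = X, cell_lat = Y)

# Made-up occurrence records: species "0" appears twice in one cell
pts <- data.frame(long = c(16.51, 16.60, 16.51, 17.20),
                  lat  = c(-28.6, -28.7, -28.6, -28.2),
                  id   = c("0", "0", "1", "0")) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)

result <- pts %>%
  st_join(grid, left = FALSE) %>%        # inner join: drop unmatched points
  st_drop_geometry() %>%                 # keep attributes only
  distinct(id_cell, id) %>%              # one row per species per cell
  left_join(cell_centres, by = "id_cell")

result
```

With 90 million records it would probably be necessary to run the join in chunks and apply distinct to each chunk before combining the results, since the de-duplicated table is much smaller than the input.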
