I have a raster grid at 0.5 degree resolution (r) and a dataframe (my_df) with 3 columns: long, lat and id. The dataframe represents species occurrence records.
What I want to do is determine which species are present in each 0.5 degree cell of my raster grid, keeping only one record per species per cell (my_df has more than 90,000,000 rows). If a 0.5 degree cell contains only one species, there will be a single row with the long and lat of that grid cell and the species ID from the dataframe. Other grid cells may contain hundreds of species, and so would have hundreds of rows.
Ultimately I would like to create a dataframe that has the long and lat of the 0.5 degree grid cell that each species location falls into, together with the IDs of the species present there, with one row per species per cell.
I have created a raster grid, as per...
library(raster)
# global extent in degrees, 0.5 degree cells
ext <- extent(-180.0, 180, -90.0, 90.0)
gridsize <- 0.5
r <- raster(ext, res = gridsize)
# WGS84 longitude/latitude
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
and a dataframe, which was originally a SpatialPolygonsDataFrame...
# A tibble: 6 x 3
   long   lat id
  <dbl> <dbl> <chr>
1  16.5 -28.6 0
2  16.5 -28.6 0
3  16.5 -28.6 0
4  16.5 -28.6 0
5  16.5 -28.6 0
6  16.5 -28.6 0
etc
...but I am unsure how to proceed with the rest of the method. I have tried rasterizing my data, extracting points, etc., but I keep hitting errors and do not know which method is the correct one for what I want to achieve.
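To make the target output concrete, here is a minimal sketch of one possible approach to the point-based version: look up the grid cell number of every record with `cellFromXY()`, keep one row per cell/species pair, and convert the cell numbers back to cell-centre coordinates with `xyFromCell()`. The small `my_df` below is a made-up stand-in for the real data, and the shortened CRS string is an assumption for illustration; whether base R `unique()` is fast enough for 90,000,000 rows is untested.

```r
library(raster)

# Same global 0.5 degree grid as defined above
r <- raster(extent(-180, 180, -90, 90), res = 0.5)
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs"

# Hypothetical stand-in for my_df (toy values, for illustration only)
my_df <- data.frame(
  long = c(16.6, 16.7, 16.7, 17.2, 17.2),
  lat  = c(-28.6, -28.7, -28.7, -28.6, -28.6),
  id   = c("0", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# Cell number of the grid cell each record falls into (NA if outside the grid)
cell <- cellFromXY(r, as.matrix(my_df[, c("long", "lat")]))

# Keep one row per cell/species combination and drop records outside the grid
cell_sp <- unique(data.frame(cell = cell, id = my_df$id, stringsAsFactors = FALSE))
cell_sp <- cell_sp[!is.na(cell_sp$cell), ]

# Replace each cell number with the coordinates of the cell centre
xy <- xyFromCell(r, cell_sp$cell)
result <- data.frame(long = xy[, "x"], lat = xy[, "y"], id = cell_sp$id,
                     stringsAsFactors = FALSE)
print(result)
```

Here `result` has one row per species per occupied cell, with `long` and `lat` giving the centre of the 0.5 degree cell rather than the original record location. For a table of the real size, doing the de-duplication step with a package such as data.table or dplyr may be worth trying instead of `unique()` on a data.frame.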
Alternatively, if anyone knows how to extract species names directly from the SpatialPolygonsDataFrame, which contains a range polygon for each species, at the 0.5 degree raster grid cell locations, that would be excellent.
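For the polygon route, a rough sketch of what this could look like is below. It assumes `cellFromPolygon()` from the raster package is suitable here, and it uses a single made-up square polygon (`polys`, with an `id` column) as a stand-in for the real SpatialPolygonsDataFrame. A cell is only counted when its centre falls inside the polygon, so very small ranges could be missed with this approach.

```r
library(raster)
library(sp)

# Same global 0.5 degree grid as defined above
r <- raster(extent(-180, 180, -90, 90), res = 0.5)
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs"

# Hypothetical range polygon for one species (toy square, illustration only)
ring <- cbind(c(16, 18, 18, 16, 16), c(-29, -29, -28, -28, -29))
sp_polys <- SpatialPolygons(list(Polygons(list(Polygon(ring)), ID = "a")),
                            proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs"))
polys <- SpatialPolygonsDataFrame(
  sp_polys,
  data = data.frame(id = "0", row.names = "a", stringsAsFactors = FALSE)
)

# One vector of cell numbers per polygon (cells whose centre lies in the polygon)
cells <- cellFromPolygon(r, polys)

# One row per cell per species, using the cell-centre coordinates
result <- do.call(rbind, lapply(seq_along(cells), function(i) {
  if (length(cells[[i]]) == 0) return(NULL)  # polygon covers no cell centre
  xy <- xyFromCell(r, cells[[i]])
  data.frame(long = xy[, "x"], lat = xy[, "y"], id = polys$id[i],
             stringsAsFactors = FALSE)
}))
result <- unique(result)
print(result)
```

With many species this loops over every polygon, so it may be slow on a full set of range maps; it is meant only to show the shape of the output.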
Any help would be greatly appreciated.