0
votes

I have a raster grid at 0.5 degree resolution (r) and a dataframe (my_df) with 3 columns: long, lat and id. The dataframe represents species occurrence records.

What I want to do is determine which species are present in each 0.5 degree cell of my raster grid, and for each cell only keep 1 record of each species (my_df has more than 90,000,000 rows), so if a 0.5 degree cell only has one species, there will be a row with the lat, long of the raster grid cell and then the species ID from the dataframe. Other raster grid cells may contain hundreds of species, so may have hundreds of rows.

Ultimately I would like to create a dataframe that has long and lat of the 0.5 degree raster grid that each species location falls into and the species ID that are present there, one row for each species.

I have created a raster grid, as per...

ext <- extent(-180.0, 180, -90.0, 90.0)
gridsize <- 0.5
r <- raster(ext, res=gridsize)
crs(r) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

and a dataframe, which was originally a SpatialPolygonsDataframe...

A tibble: 6 x 3
   long   lat id   
  <dbl> <dbl> <chr>
1  16.5 -28.6 0    
2  16.5 -28.6 0    
3  16.5 -28.6 0    
4  16.5 -28.6 0    
5  16.5 -28.6 0    
6  16.5 -28.6 0 
etc
etc

...but am unsure of how to proceed with the rest of the method. I have tried rasterizing my data, extracting points etc but I am continually hitting errors and am unsure of the correct method to use to achieve my aim.

Alternatively, if anyone knows how to extract species names directly form the SpatialPolygonsDataFrame which contains a range polygon for each species, at 0.5 degree raster grid cell locations, that would be excellent.

Any help would be greatly appreciated.

2
You can visit this link, this and this.Bappa Das

2 Answers

1
votes

If I guessed correctly, you want to match points that fall within cells. I think you are looking for a spatial join based on interesection between points and polygons.

I highly recommend you to use sf package rather than sp objects. That's what I'm going to propose you.

First, create the grid with st_make_grid function

library(sf)
library(dplyr)

ext <- raster::extent(-180.0, 180, -90.0, 90.0)

grid <- st_bbox(ext) %>% 
  st_make_grid(cellsize = 0.5, what = "polygons") %>%
  st_set_crs(4326)
grid <- grid %>% st_sf() %>% mutate(id_cell = seq_len(nrow(.)))

Then let's take a simple dataframe:

df <- data.frame(long = 16.51, lat = -28.6, id = 0)
df <- df %>% sf::st_as_sf(coords = c("long","lat"), crs = 4326)

df

Simple feature collection with 1 feature and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id            geometry
1  0 POINT (16.51 -28.6)

Then, you need to use st_join function. By default the spatial join is based on intersection:

df %>% sf::st_join(grid, left = TRUE)

although coordinates are longitude/latitude, st_intersects assumes that they are planar
Simple feature collection with 1 feature and 2 fields
geometry type:  POINT
dimension:      XY
bbox:           xmin: 16.51 ymin: -28.6 xmax: 16.51 ymax: -28.6
epsg (SRID):    4326
proj4string:    +proj=longlat +datum=WGS84 +no_defs
  id id_cell            geometry
1  0   88234 POINT (16.51 -28.6)

I assumed you wanted a left join (report all your points). You can change that option. I think using sf will be faster than a hand-coded technique.

0
votes

With point data you can do that like this

Example data

#species
set.seed(0)
n <- 20
spp <- data.frame(lon=runif(n, -180, 180), lat=runif(n,-90,90), sp=sample(5, n, replace=TRUE)) 

# raster
library(raster)
# for the example I use a resolution of 90, rather than 0.5 
r <- raster(res=90)

Now compute the cell number for each location and tabulate. The way I do it it returns the counts, rather than only presence/absence

spp$cell <- cellFromXY(r, spp[, c("lon", "lat")])
tb <- table(spp$cell, spp$sp)

To get the lon/lat for each cell

xy <- xyFromCell(r, as.integer(rownames(tb)))
result <- cbind(xy, tb)
colnames(result)[1:2] <- c("lon", "lat")
result
#   lon lat 1 2 3 4 5
#1 -135  45 0 0 1 0 0
#2  -45  45 0 2 1 0 0
#3   45  45 1 0 0 2 0
#4  135  45 0 1 0 0 1
#5 -135 -45 1 2 0 0 0
#6  -45 -45 0 1 0 1 0
#7   45 -45 1 1 0 0 0
#8  135 -45 1 0 1 2 0

With polygon data (and with point data as well) you could use raster::rasterize

Example polygon data

library(raster)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
hole <- rbind(c(-150,-20), c(-100,-10), c(-110,20), c(-150,-20))
p1 <- list(p1, hole)
p2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0))
p3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0))
spp <- data.frame(species=letters[1:3], stringsAsFactors=FALSE)
pols <- spPolygons(p1, p2, p3, attr=spp)

Rasterize each species and combine in a RasterStack. If you have many species you want to assign a filename to the rasterize argument, such as filename = paste0("sp_", i, ".tif")

usp <- unique(spp$species)
r <- raster(res=0.5)
s <- list()
for (i in 1:length(usp)) {
    p <- pols[pols$species == usp[i], ]
    s[[i]] <- rasterize(p, r, field=1, fun="count")
}       
ss <- stack(s)

(for species richness do sr <- sum(ss>0, na.rm=TRUE))

Create the output you desire

m <- as.matrix(ss)
m[is.na(m)] <- 0
# to remove rows with no species 
i <- which(rowSums(m) > 0)
xy <- xyFromCell(r, i)  
output <- cbind(xy, m[i,])
colnames(output) <- c("lon", "lat", usp)
head(output)
#        lon   lat a b c
#[1,]  -0.25 59.75 0 0 1
#[2,] 139.75 59.75 0 1 0
#[3,]  -1.25 59.25 0 0 1
#[4,]  -0.75 59.25 0 0 1
#[5,]  -0.25 59.25 0 0 1
#[6,]   0.25 59.25 0 0 1