1
votes

I am very new to raster data and the use of R for spatial data analysis, so apologies if there's an obvious solution or process for this I've missed.

I have a raster file of population data from WorldPop, and a set of latitude / longitude location points that overlay onto that. I am trying to determine what portion of the population is (according to the WorldPop estimates) within a given distance of these points of interest, and also what portion is not.

I understand that using raster::extract, I should be able to get the sum of population values from (for example) a 1-kilometer buffer around each of these points. (Although my points and raster data are both in lat/lon projection, so I gather I need to first correct for this by changing the projection to utm as done here.)

However, because some number of these points will be less than 1 km apart, I am concerned that this total sum is double-counting the population of some cells where buffers overlap. Does buffering automatically correct for this, or is there an efficient way to ensure that this is not the case, and also to get the values from the inverse of the buffered point area selection?

2

2 Answers

1
votes

Please always include some example data in a minimal self-contained reproducible example. Say,

library(raster)
r <- raster(system.file("external/rlogo.grd", package="raster"))
d <- matrix(c(48, 48, 48, 53, 50, 46, 54, 70, 84, 85, 74, 84, 95, 85, 
   66, 42, 26, 4, 19, 17, 7, 14, 26, 29, 39, 45, 51, 56, 46, 38, 31, 
   22, 34, 60, 70, 73, 63, 46, 43, 28), ncol=2)
p <- SpatialPoints(d, proj4string=crs(r))     

A simple workflow, with points p and raster r would be

b <- buffer(p, 10)
m <- mask(r, b)
ms <- cellStats(m, "sum")
rs <- cellStats(r, "sum")
ms/rs
#[1] 0.4965083

Or you can use terra, to make this go faster, like this

library(terra)
r <- rast(system.file("ex/logo.tif", package="terra")) [[1]]
p <- vect(d, crs=crs(r))

b <- buffer(p, 10)
m <- mask(r, b)
ms <- global(m, "sum", na.rm=TRUE)
rs <- global(r, "sum")
ms/rs

By the way, with the raster package your assertion about needing to transform lon/lat data is not correct for extract or buffer. In contrast, with terra you need to do that (to be fixed).

-1
votes

Well, thanks to a suggestion via Twitter and this guide to creating SpatialPolygons around points, I've been able to find an answer for this. This is probably not the most efficient means of doing so — it's proving to be very slow on large polygons - but it's workable for my purposes.

Here's sample code:

library(tabularaster)
library(raster)
library(tidyverse)
library(geos)

# -----------------------

# load point data ---

p <- read_csv("points_of_interest.csv")
p_df <- p %>% rename(x = lat, y = lon)
p_coords <- p_df[, c("y","x")]

p_spdf <- SpatialPointsDataFrame(
   coords = pc_coords,
   data = p_df,
   proj4string = CRS("+init=epsg:4326"))

# convert projection to metric units

p_mrc <- spTransform(
   p_spdf,
   CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 
       +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs")
   )

# buffer to 1000 meters

p_mrc_1k_mrc <- gBuffer(
   p_mrc, byid = TRUE, width = 1000)

# switch back to lat/lon
p_mrc_1k <- spTransform(p_mrc_1k_mrc, CRS("+init=epsg:4326"))

# load raster data -------

r <- raster("pop.tif")
r_tib <- tabularaster::as_tibble(r)

# get intersection of cells and polygons
cell_df_1k <- cellnumbers(r, p_mrc_1k)

# get list of cells where there is intersection
target_cell_1k <- cell_df_1k$cell_

# add cell values to df listing all cells covered by polys
target_cells_extract_1k <- cell_df_1k %>%
  rename(cellindex = cell_) %>%
  left_join(r_tib)

# calculate the sum of population within 1k radius for each object 
# (this includes overlapping population cells shared between polys)
cell_sum_1k <- target_cells_extract_1k %>%
  group_by(object_) %>%
  summarize(pop_1k = sum(cellvalue, na.rm = T))

# get only unique cell values for total overlapping coverage of all polys
target_cells_unique_1k <- r_tib %>% filter(cellindex %in% target_cell_1k)

total_coverage_pop <- sum(target_cells_unique_1k$cellvalue, na.rm = T)
outside_coverage_pop <- sum(r_tib$cellvalue) - total_coverage_pop