Using R intersections to create a polygons-inside-a-polygon key using two shapefile layers

Question

The data

I have two shapefiles marking the boundaries of national and provincial electoral constituencies in Pakistan.

The objective

I am attempting to use R to create a key listing which provincial-level constituencies are "contained within", or otherwise intersect, each national-level constituency, based on the coordinates in this data. For example, NA-01 corresponds to PA-01, PA-02, and PA-03; NA-02 corresponds to PA-04 and PA-05; and so on. (The key will ultimately be used to link separate dataframes containing electoral results at the national and provincial levels; that part I've figured out.)

I have only basic/intermediate R skills learned largely through trial and error and no experience working with GIS data outside of R.

The attempted solution

The closest solution I could find for this problem comes from this guide to calculating intersection areas in R. However, I have been unable to successfully replicate any of the three proposed approaches (either the questioner's use of a general TRUE/FALSE report on intersections, or the more precise calculations of area of overlap).

The code

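# import map files
library(rgdal)    # readOGR()
library(rgeos)    # gIntersects()
library(spdplyr)  # rename() on Spatial*DataFrame objects

NA_map <- readOGR(dsn = "./National_Constituency_Boundary", layer = "National_Constituency_Boundary")
PA_map <- readOGR(dsn = "./Provincial_Constituency_Boundary", layer = "Provincial_Constituency_Boundary")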

# Both are now SpatialPolygonsDataFrame objects of 273 and 577 elements, respectively.
# If relevant, I used spdplyr to tweak some of the data attribute names (for use later when joining to electoral dataframes):

NA_map <- NA_map %>%
  rename(constituency_number = NA_Cons,
         district_name = District,
         province = Province)

PA_map <- PA_map %>%
  rename(province = PROVINCE,
         district_name = DISTRICT,
         constituency_number = PA)

# calculate intersections, take one

Results <- gIntersects(NA_map, PA_map, byid = TRUE)
# this creates a large matrix of 157,521 elements

rownames(Results) <- NA_map@data$constituency_number
colnames(Results) <- PA_map@data$constituency_number

Attempting to add the rowname/colname labels, however, gives me the error message:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [1] not equal to array extent

Without the rowname/colname labels, I'm unable to read the overlay matrix, and I'm unsure how to filter it to produce a list of only the TRUE intersections, which would help make an NA-PA key.
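
One possible reading of this error, as a hedged sketch: 157,521 is 273 × 577, so the matrix has one cell per NA-PA pair, and the message says the row labels have the wrong length. Assuming gIntersects(..., byid = TRUE) puts its second argument (PA_map) on the rows and its first (NA_map) on the columns, the two label vectors may simply be swapped. The base-R toy below uses a random logical matrix and made-up IDs (na_ids and pa_ids are hypothetical) in place of the real output; it reproduces the message and then turns the TRUE cells into a two-column key.

```r
# Toy stand-in for the gIntersects() result: 577 rows (PA) x 273 columns (NA).
# Assumption: rows follow the second argument, columns the first.
na_ids <- sprintf("NA-%03d", 1:273)   # hypothetical labels
pa_ids <- sprintf("PA-%03d", 1:577)   # hypothetical labels
set.seed(1)
res <- matrix(runif(577 * 273) < 0.01, nrow = 577, ncol = 273)

# Assigning 273 names to 577 rows fails with the dimnames error
err <- tryCatch({
  rownames(res) <- na_ids
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# With the labels swapped, the lengths match the matrix extents
rownames(res) <- pa_ids
colnames(res) <- na_ids

# Long-format key: one row per TRUE cell
hits <- which(res, arr.ind = TRUE)
key <- data.frame(NA_id = colnames(res)[hits[, "col"]],
                  PA_id = rownames(res)[hits[, "row"]],
                  stringsAsFactors = FALSE)
head(key[order(key$NA_id), ])
```

Note that a plain intersects test also reports polygons that only share a boundary, so a key built this way may list neighbours as well as true overlaps.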

I also attempted to replicate the other two proposed solutions for calculating exact area of overlap:

# calculate intersections, take two

pi <- intersect(NA_map, PA_map)
# this generates a SpatialPolygons object with 273 elements

areas <- data.frame(area=sapply(pi@polygons, FUN = function(x) {slot(x, 'area')}))
# this calculates the area of intersection but has no other variables
row.names(areas) <- sapply(pi@polygons, FUN=function(x) {slot(x, 'ID')})

This generates the error message:

Error in `row.names<-.data.frame`(`*tmp*`, value = c("2", "1", "4", "5",  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘1’

So that when I attempt to attach areas to attributes info with

attArrea <- spCbind(pi, areas)

I get the error message

Error in spCbind(pi, areas) : row names not identical

Attempting the third proposed method:

# calculate intersections, take three
pi <- st_intersection(NA_map, PA_map)

Produces the error message:

Error in UseMethod("st_intersection") : 
  no applicable method for 'st_intersection' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial', 'SpatialPolygonsNULL', 'SpatialVector')"

I understand that my SPDF maps can't be used for this third approach as they are, but it wasn't clear from the description what steps would be needed to convert them and attempt this method.

The plea for help

Any suggestions on the corrections needed to use any of these approaches, or pointers towards some other method of figuring this out, would be greatly appreciated. Thanks!

Robert Hijmans · Accepted Answer · 2017-12-02T19:40:40

Here is some example data

library(raster)
p <- shapefile(system.file("external/lux.shp", package="raster"))
p1 <- aggregate(p, by="NAME_1")
p2 <- p[, 'NAME_2']

So we have p1 with regions, and p2 with lower level divisions.

Now we can do

x <- intersect(p1, p2)
# or  x <- union(p1, p2)
data.frame(x)

Which should be (and is) the same as the original

data.frame(p)[, c('NAME_1', 'NAME_2')]
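
To go from that table to the kind of key asked for, one option is to split the lower-level names by region. This is only a sketch on the same Luxembourg example data; the names key and key_list are made up here, and with the real layers the columns would be the two constituency-number fields instead of NAME_1 and NAME_2.

```r
library(raster)

# Same example data as above
p  <- shapefile(system.file("external/lux.shp", package = "raster"))
p1 <- aggregate(p, by = "NAME_1")   # higher-level regions
p2 <- p[, "NAME_2"]                 # lower-level divisions

# One row per overlapping (region, division) pair
x   <- intersect(p1, p2)
key <- unique(data.frame(x)[, c("NAME_1", "NAME_2")])

# One list entry per region, holding the divisions that overlap it
key_list <- split(key$NAME_2, key$NAME_1)
key_list
```

The long table (key) is usually the easier form to join against other data frames; the list form is mostly convenient for reading.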

To get the area of the polygons, you can do

x$area <- area(x) / 1000000  # area() returns m2 here; divide to get km2

There are likely to be many "slivers", very small polygons because of slight variations in borders. That might not matter to you.
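
If the slivers do matter, one way to drop them is to compare each intersection piece with the full area of the lower-level polygon it came from, and keep only pairings above some share. The sketch below does that on the example data; the 5% cutoff and the names pieces, full and share are assumptions for illustration, not recommended values.

```r
library(raster)

# Same example data as above
p  <- shapefile(system.file("external/lux.shp", package = "raster"))
p1 <- aggregate(p, by = "NAME_1")
p2 <- p[, "NAME_2"]

# Area of every intersection piece, in km2
x <- intersect(p1, p2)
pieces <- data.frame(x)[, c("NAME_1", "NAME_2")]
pieces$piece_km2 <- area(x) / 1e6

# Full area of each lower-level polygon, for comparison
full <- data.frame(NAME_2 = p2$NAME_2, full_km2 = area(p2) / 1e6)

# Share of the lower-level polygon covered by each piece
pieces <- merge(pieces, full, by = "NAME_2")
pieces$share <- pieces$piece_km2 / pieces$full_km2

# Keep a pairing only if it covers more than 5% of the lower-level polygon.
# The cutoff is an arbitrary assumption; tune it to the data.
key <- pieces[pieces$share > 0.05, c("NAME_1", "NAME_2", "share")]
key
```

A relative cutoff like this tends to be easier to choose than an absolute area, since constituencies can differ a lot in size.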

But another approach could be matching by centroid:

y <- p2
e <- extract(p1, coordinates(p2))
y$NAME_1 <- e$NAME_1
data.frame(y)
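
On the third attempt in the question: st_intersection() works on sf objects, not on Spatial* objects, and sf::st_as_sf() converts one to the other. Below is a minimal sf sketch of the same idea, using hand-made rectangles instead of the real shapefiles; rect_poly() and the IDs are hypothetical, and the conversion of the actual maps is shown only as a comment because it has not been run on them.

```r
library(sf)

# Hypothetical helper: axis-aligned rectangle as an sf polygon
rect_poly <- function(x0, y0, x1, y1) {
  st_polygon(list(rbind(c(x0, y0), c(x1, y0), c(x1, y1), c(x0, y1), c(x0, y0))))
}

# Two "national" polygons, side by side
upper <- st_sf(na_id = c("NA-01", "NA-02"),
               geometry = st_sfc(rect_poly(0, 0, 2, 2), rect_poly(2, 0, 4, 2)))

# Four "provincial" polygons, two inside each national polygon
lower <- st_sf(pa_id = c("PA-01", "PA-02", "PA-03", "PA-04"),
               geometry = st_sfc(rect_poly(0, 0, 1, 2), rect_poly(1, 0, 2, 2),
                                 rect_poly(2, 0, 4, 1), rect_poly(2, 1, 4, 2)))

# With the question's objects, the conversion step would be something like:
# NA_sf <- st_as_sf(NA_map); PA_sf <- st_as_sf(PA_map)

# Intersect, measure each piece, and keep only pieces with real area
pieces <- suppressWarnings(st_intersection(upper, lower))
pieces$area <- as.numeric(st_area(pieces))
key <- st_drop_geometry(pieces[pieces$area > 0, c("na_id", "pa_id")])
key
```

Filtering on area matters here because polygons that merely share an edge can also be returned, as zero-area pieces.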
