0
votes

I have the following boundary dataset for the United Kingdom, which shows all the counties:

library(raster)
library(sp)
library(ggplot)

# Download the data
GB <- getData('GADM', country="gbr", level=2)

Using the subset function it is really easy to filter the shapefile polygons by an attribute in the data. For example, if I want to exclude Northern Ireland:

GB_sub <- subset(UK, NAME_1 != "Northern Ireland")

However, there are lots of small islands which distort the scale data range, as shown in the maps below:

enter image description here

Any thoughts on how to elegantly subset the dataset on a minimum size? It would be ideal to have something in the format consistent with the subset argument. For example:

GB_sub <- subset(UK, Area > 20) # specify minimum area in km^2
2
You can crop it with another raster of Great Britain mainland. This post link could be helpful.patL

2 Answers

1
votes

Here is another potential solution. Because your data is in lat-long projection, directly calculating the area based on latitude and longitude would cause bias, it is better to calculate the area based on functions from the geosphere package.

install.packages("geosphere")
library(geosphere)

# Calculate the area
GB$poly_area <- areaPolygon(GB) / 10^6

# Filter GB based on area > 20 km2
GB_filter <- subset(GB, poly_area > 20)

poly_area contains the area in km2 for all polygons. We can filter the polygon by a threshold, such as 20 in your example. GB_filter is the final output.

1
votes

This is one potential solution:

GB_sub = GB[sapply(GB@polygons, function(x) x@area>0.04),] # select min size
map.df <- fortify(GB_sub)
ggplot(map.df, aes(x=long, y=lat, group=group)) + geom_polygon()

Check this link for specifics on the actual interpretation of km2 size: Getting a slot's value of S4 objects?

I compared both as well but they don't seem to differ:

out1 = sapply(GB@polygons, function(x) x@area)
out2 = rgeos::gArea(GB, byid=TRUE)