Might be missing something here, but didn't find anything after a bunch of digging. I am attempting to find rows of sf
objects where the geometry takes a certain value. This is for tidying purposes where the same geometry may have been stored with different associated metadata (such as id and other values) in different datasets, and I need to resolve discrepancies.
Normally it is easy to filter for certain values using dplyr
like so:
Load required packages
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag(): dplyr, stats
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
Load example dataset
nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `C:\Users\Calum You\Documents\R\win-library\3.4\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
As expected, dplyr::filter
works well on different columns. We can easily choose the row with NAME
Ashe and CNTY_ID
1825, which is the first row:
nc %>% filter(NAME == "Ashe")
#> Simple feature collection with 1 feature and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -81.74107 ymin: 36.23436 xmax: -81.23989 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
#> NWBIR74 BIR79 SID79 NWBIR79 geometry
#> 1 10 1364 0 19 MULTIPOLYGON (((-81.4727554...
nc %>% filter(CNTY_ID == 1825)
#> Simple feature collection with 1 feature and 14 fields
#> geometry type: MULTIPOLYGON
#> dimension: XY
#> bbox: xmin: -81.74107 ymin: 36.23436 xmax: -81.23989 ymax: 36.58965
#> epsg (SRID): 4267
#> proj4string: +proj=longlat +datum=NAD27 +no_defs
#> AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1
#> NWBIR74 BIR79 SID79 NWBIR79 geometry
#> 1 10 1364 0 19 MULTIPOLYGON (((-81.4727554...
Does not work well when attempting to filter on the geometry column. I would hope that this returned only the first row of nc
, since that is the row of nc
where geometry
is equal to (by definition) the first geometry `nc$geometry[1]'.
nc %>% filter(geometry == nc$geometry[1])
#> Error in filter_impl(.data, quo): Evaluation error: operation == not supported.
nc %>% filter(geometry == st_geometry(nc)[1])
#> Error in filter_impl(.data, quo): Evaluation error: operation == not supported.
The error message suggests that even the ==
operator does not work:
nc$geometry[1] == nc$geometry[1]
#> Error in Ops.sfc(nc$geometry[1], nc$geometry[1]): operation == not supported
identical()
works, but not for the purposes of constructing my own filter. Would want to see TRUE
for the first element of the output, but I don't think identical()
works like that in the map functions.
identical(nc$geom[1], nc$geom[1])
#> [1] TRUE
map_lgl(nc$geometry, function(x) identical(x, nc$geometry[1]))
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [100] FALSE
Is ==
simply not defined for geometries in sf
? If so, are there alternatives for testing equality or otherwise finding geometries that are the same in different datasets? I am near to resorting to rbind
different datasets together and use duplicated()
to do this, but that would require ensuring that all columns in both are the same and seems like unexpected/needless hassle.
Thanks for making it to the end of the question - any suggestions greatly appreciated!
nc2 <- rbind(nc, nc); map_lgl(nc2$geometry, ~ identical(., nc2$geometry[[1]])) # work; nc2 %>% filter(map_lgl(nc2$geometry, ~ identical(., nc2$geometry[[1]]))) # work; nc2 %>% filter(map_lgl(geometry, ~ identical(., geometry[[1]]))) # not work
. It's a bit strange. – cuttlefish44