2
votes

Might be missing something here, but didn't find anything after a bunch of digging. I am attempting to find rows of sf objects where the geometry takes a certain value. This is for tidying purposes where the same geometry may have been stored with different associated metadata (such as id and other values) in different datasets, and I need to resolve discrepancies.

Normally it is easy to filter for certain values using dplyr like so:

Load required packages

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3

Load example dataset

nc <- st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `C:\Users\Calum You\Documents\R\win-library\3.4\sf\shape\nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs

As expected, dplyr::filter works well on different columns. We can easily choose the row with NAME Ashe and CNTY_ID 1825, which is the first row:

nc %>% filter(NAME == "Ashe")
#> Simple feature collection with 1 feature and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -81.74107 ymin: 36.23436 xmax: -81.23989 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1 0.114     1.442  1825    1825 Ashe 37009  37009        5  1091     1
#>   NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1      10  1364     0      19 MULTIPOLYGON (((-81.4727554...
nc %>% filter(CNTY_ID == 1825)
#> Simple feature collection with 1 feature and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -81.74107 ymin: 36.23436 xmax: -81.23989 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#>    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
#> 1 0.114     1.442  1825    1825 Ashe 37009  37009        5  1091     1
#>   NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1      10  1364     0      19 MULTIPOLYGON (((-81.4727554...

Does not work well when attempting to filter on the geometry column. I would hope that this returned only the first row of nc, since that is the row of nc where geometry is equal to (by definition) the first geometry `nc$geometry[1]'.

nc %>% filter(geometry == nc$geometry[1])
#> Error in filter_impl(.data, quo): Evaluation error: operation == not supported.
nc %>% filter(geometry == st_geometry(nc)[1])
#> Error in filter_impl(.data, quo): Evaluation error: operation == not supported.

The error message suggests that even the == operator does not work:

nc$geometry[1] == nc$geometry[1]
#> Error in Ops.sfc(nc$geometry[1], nc$geometry[1]): operation == not supported

identical() works, but not for the purposes of constructing my own filter. Would want to see TRUE for the first element of the output, but I don't think identical() works like that in the map functions.

identical(nc$geom[1], nc$geom[1])
#> [1] TRUE
map_lgl(nc$geometry, function(x) identical(x, nc$geometry[1]))
#>   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#>  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [100] FALSE

Is == simply not defined for geometries in sf? If so, are there alternatives for testing equality or otherwise finding geometries that are the same in different datasets? I am near to resorting to rbind different datasets together and use duplicated() to do this, but that would require ensuring that all columns in both are the same and seems like unexpected/needless hassle.

Thanks for making it to the end of the question - any suggestions greatly appreciated!

1
nc2 <- rbind(nc, nc); map_lgl(nc2$geometry, ~ identical(., nc2$geometry[[1]])) # work; nc2 %>% filter(map_lgl(nc2$geometry, ~ identical(., nc2$geometry[[1]]))) # work; nc2 %>% filter(map_lgl(geometry, ~ identical(., geometry[[1]]))) # not work. It's a bit strange.cuttlefish44

1 Answers

2
votes

It appears that == is indeed not defined for sf objects. However, identical() works for this purpose if you remember to drill down in the subsetting, as noted by cuttlefish44. This, for example, works fine:

identical(nc$geom[[1]], nc$geom[[1]])
map_lgl(nc$geometry, function(x) identical(x, nc$geometry[[1]]))
  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [22] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [43] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [64] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I suppose this is to do with confusion on my part about how sf stores geometries: the sfg is not actually a list but contains the list of points for the geometry. Glad to have figured a workaround, regardless.

EDIT: I did not realise that st_equals is implemented in sf. It has a different syntax but is probably the best tool for this purpose, since it describes area equality rather than data structure equality.