0
votes

I have an sf dataset with two geometry columns. Here's what it looks like:

> trip_geo

   dstid sourceid                    dest_geom                    source_geom
1    1        1   MULTIPOLYGON (((-2.607985 5... MULTIPOLYGON (((-2.607985 5...
2    1        2   MULTIPOLYGON (((-2.607985 5... MULTIPOLYGON (((-2.57022 51...
3    1        3   MULTIPOLYGON (((-2.607985 5... MULTIPOLYGON (((-2.593213 5...
4    1        4   MULTIPOLYGON (((-2.607985 5... MULTIPOLYGON (((-2.608686 5...
5    1        5   MULTIPOLYGON (((-2.607985 5... MULTIPOLYGON (((-2.512852 5...

The active geometry is dest_geom.

Each row corresponds to a trip between neighbourhoods. For each trip, I want to know which neighbourhoods lie in the way of the trip. That is, if you were to draw a straight line between source_geom and dest_geom for each row, which geometries would touch this straight line? I want to get all of the touching geometries for the row and then union them.

I have another dataset with the geometry corresponding to each id:

> id_geo

   dstid                       geometry
1      1 MULTIPOLYGON (((-2.607985 5...
2      2 MULTIPOLYGON (((-2.57022 51...
3      3 MULTIPOLYGON (((-2.593213 5...
4      4 MULTIPOLYGON (((-2.608686 5...
5      5 MULTIPOLYGON (((-2.512852 5...

I imagine that the first step would be to define a line between the centroids of source_geom and dest_geom for each trip. Then, create an sf where the geometry column contains a list of the polygons which touch the line (I don't know if it's possible to have several geometries in one column in sf). Then, union the geometries contained in the same row/list.

I don't think the way Im approaching the problem is correct because, to the best of my knowledge, one cannot carry out operations on two geometries of an sf, such as defining a line. Additionally, I don't know how I would integrate a list to a dataframe/sf.

If you could suggest a more realistic way of approaching the problem, I would greatly appreciate it.

2
Can you share some sample data? Try pasting the output of dput(dplyr::sample_n(trip_geo, 15)) as well as the rows in id_geo that correspond to the IDs in that sample of 15 rows from trip_geo. This way people can paste your data into R and try to provide a solution.Eugene Chong
dput for sf objects is very long. I'd try to break your analysis into smaller pieces and make simple dummy data that people could copy and paste to help you outastrofunkswag

2 Answers

3
votes

There's nothing wrong with having two geometries in an sf data frame, but only one of them can be the geometry when its used with functions that take an implicit geometry, eg st_centroid(foo) which gets the centroids of the set geometry.

For other geometries, you can work on that named column, for example st_centroid(foo$source_geom).

For a data frame nc with two polygon geometries you can compute the line between the centroids by first unioning the points to make a MULTIPOINT and then casting them to a LINESTRING. For example, for the first row:

> st_cast(st_union(c(st_centroid(nc$source_geom[1]), st_centroid(nc$dest_geom[1]))),"LINESTRING")
Geometry set for 1 feature 
geometry type:  LINESTRING
dimension:      XY
bbox:           xmin: -81.49826 ymin: 36.42228 xmax: -77.41056 ymax: 36.4314
epsg (SRID):    NA
proj4string:    NA
LINESTRING (-81.49826 36.4314, -77.41056 36.42228)

You have to do this row-wise otherwise you end up operating on the entire geometry vector.

Full example. Do library(spdep) and example(poly2nb) to get nc.sids.

First cut it down a bit to two columns and 5 random rows:

> nc = nc.sids[,c("NAME","FIPS")]
> nc = nc[sample(1:nrow(nc.sids),5),]
> nc
Simple feature collection with 5 features and 2 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -83.73952 ymin: 34.36442 xmax: -78.16968 ymax: 36.54607
epsg (SRID):    NA
proj4string:    NA
         NAME  FIPS                       geometry
82 Cumberland 37051 MULTIPOLYGON (((-78.49929 3...
96     Bladen 37017 MULTIPOLYGON (((-78.2615 34...
13  Granville 37077 MULTIPOLYGON (((-78.74912 3...
78      Macon 37113 MULTIPOLYGON (((-83.10629 3...
14     Person 37145 MULTIPOLYGON (((-78.8068 36...

Lets say feature 1 goes to feature 2, 2 to 3, etc. Create new geometry column:

> nc$dest_geom = nc$geometry[c(2,3,4,5,1)]

Now make the line connecting centroids:

> nc$join_geom = st_sfc(sapply(1:nrow(nc),function(i)st_cast(st_union(c(st_centroid(nc$geometry[i]), st_centroid(nc$dest_geom[i]))),"LINESTRING")))

Plot:

centroids joined

> plot(nc$geom)
> plot(nc$join_geom,add=TRUE,lty=2)
0
votes

The response from @Spacedman was perfect to draw lines between the polygons of interest. Below is the code to (1) draw the lines and (2) select the polygons which intersect each line and union the selection.

#Draw a line between each source and destination
library(sf)
trip_geo$join_geom = st_sfc(sapply(1:nrow(trip_geo),function(j)
+ st_cast(st_union(c(st_centroid(trip_geo$dest_geom[j]),
+ st_centroid(trip_geo$source_geom[j]))),"LINESTRING")), crs=4326)

#For each trip, select the polygons that intersect the line, then union them together
df=data.frame()
for (i in 1:nrow(trip_geo)){
    id_geo$touch=apply(st_intersects(id_geo, trip_geo$join_geom[i]), 1, any)
    touch=subset(id_geo, touch==T)
    touch=st_union(touch)
    df[i,1]=touch #append the unioned polygons to an empty dataframe
  }

#Add the unioned polygons to the original dataset
sf=st_sf(df, crs=4326) 
bound=cbind(trip_geo, sf)

The computations take some time (about 1 min for the line drawing and 1 min for the loop, with 3,500 polygons), and I'm sure my code could be improved.