4
votes

I have trouble getting the intersection between two large SpatialPolygonsDataFrame on R. My polygons data represent buildings and administrative boundaries, and I am trying to get the intersection polygons between them.

I understand that the intersect function from the raster package and gIntersection from the rgeos package can do this job (with a few differences) but they cannot handle all my polygons at once (about 50.000 polygons/entity).

For this reason, I have to split my calculation within a loop, saving the result for each step. The problem is: these functions keep filling my physical memory, and I cannot clean it. I tried using rm() and gc(), but it does not change a thing. The memory issue crashes my R session, and I cannot do my calculation.

Is there a way to free the RAM during simulation, within loops ? Or to avoid this memory issue ?

Here comes a reproducible example, for random polygons.

library(raster)
library(sp)
library(rgeos)

#Generating 50000 points (for smaller polygons) and 150000 (for larger polygons) in a square of side 100000
size=100000

Nb_points1=50000
Nb_points2=150000
start_point=matrix(c(sample(x = 1:size,size = Nb_points1,replace = T),sample(x = 1:size,size = Nb_points1,replace = T)),ncol=2)
start_point2=matrix(c(sample(x = 1:size,size = Nb_points2,replace = T),sample(x = 1:size,size = Nb_points2,replace = T)),ncol=2)

#Defining different sides length
radius=sample(x = 1:50,size = Nb_points1,replace = T)
radius2=sample(x = 1:150,size = Nb_points2,replace = T)

#Generating list of polygons coordinates
coords=list()
for(y in 1:Nb_points1){
  xmin=max(0,start_point[y,1]-radius[y])
  xmax=min(size,start_point[y,1]+radius[y])
  ymin=max(0,start_point[y,2]-radius[y])
  ymax=min(size,start_point[y,2]+radius[y])
  coords[[y]]=matrix(c(xmin,xmin,xmax,xmax,ymin,ymax,ymax,ymin),ncol=2)
}

coords2=list()
for(y in 1:Nb_points2){
  xmin=max(0,start_point2[y,1]-radius2[y])
  xmax=min(size,start_point2[y,1]+radius2[y])
  ymin=max(0,start_point2[y,2]-radius2[y])
  ymax=min(size,start_point2[y,2]+radius2[y])
  coords2[[y]]=matrix(c(xmin,xmin,xmax,xmax,ymin,ymax,ymax,ymin),ncol=2)
}

#Generating 75000 polygons
Poly=SpatialPolygons(Srl = lapply(1:Nb_points1,function(y) Polygons(srl = list(Polygon(coords=coords[y],hole = F)),ID = y)),proj4string = CRS('+init=epsg:2154'))
Poly2=SpatialPolygons(Srl = lapply(1:Nb_points2,function(y)Polygons(srl =  list(Polygon(coords=coords2[y],hole = F)),ID = y)),proj4string = CRS('+init=epsg:2154'))

#Union of overlapping polygons
aaa=gUnionCascaded(Poly)
bbb=gUnionCascaded(Poly2)

aaa=disaggregate(aaa)
bbb=disaggregate(bbb)

intersection=gIntersects(spgeom1 = aaa,bbb,byid = T,returnDense = F)

#Loop on the intersect function
pb <- txtProgressBar(min = 0, max = ceiling(length(aaa)/1000), style = 3)

for(j in 1:ceiling(length(aaa)/1000)){
  tmp_aaa=aaa[((j-1)*1000+1):(j*1000),]
  tmp_bbb=bbb[unique(unlist(intersection[((j-1)*1000+1):(j*1000)])),]
  List_inter=intersect(tmp_aaa,tmp_bbb)
  gc()
  gc()
  gc()
  setTxtProgressBar(pb, j)
}

Thank you !

2
To avoid memory issues you could switch to gdalUtils.loki
I don't know this package. Could you help me with it ? What function can help me ? I don't see anything about memory or intersection.A.Rogeau
gdalUtils is a very good and useful package but it won't help here. It's mostly to play with rasters. You use the raster package, but not on rasters so I doubt it would help.Bastien
R isn't that efficient for big GIS stuff. I often prefer using R as a base to call other software. For that, RSAGA is my favorite follwed by RQGIS and than more complicated RGRASS7. All need you to install the appropriate software (can be done one shot with OSGEO4W). They should succeed at your task. I'm kind of busy right now, if I have chance later on I'll post an example.Bastien

2 Answers

3
votes

You can consider using the st_intersects and st_intersection functions of package sf. For example:

aaa2 <- sf::st_as_sf(aaa)
bbb2 <- sf::st_as_sf(bbb)
intersections_mat <- sf::st_intersects(aaa2, bbb2)
intersections <- list()
for (int in seq_along(intersections_mat)){
  if (length(intersections_mat[[int]]) != 0){
    intersections[[int]] <- sf::st_intersection(aaa2[int,], 
    bbb2[intersections_mat[[int]],])
  }
}

will give you an intersection_mat of length equal to aaa, and containing , for each feature of aaa, the "indexes" of bbb elements with which it intersects ("empty" if no intersection found):

> intersections_mat
Sparse geometry binary predicate list of length 48503, where the predicate was `intersects'
first 10 elements:
 1: 562
 2: (empty)
 3: 571
 4: 731
 5: (empty)
 6: (empty)
 7: (empty)
 8: 589
 9: 715
 10: (empty)

, and an intersection list containing the list of intersecting polygons:

>head(intersections)
[[1]]
Simple feature collection with 1 feature and 0 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 98873 ymin: 33 xmax: 98946 ymax: 98
epsg (SRID):    2154
proj4string:    +proj=lcc +lat_1=49 +lat_2=44 +lat_0=46.5 +lon_0=3 +x_0=700000 +y_0=6600000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
                        geometry
1 POLYGON ((98873 33, 98873 9...

[[2]]
NULL

[[3]]
Simple feature collection with 1 feature and 0 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 11792 ymin: 3 xmax: 11806 ymax: 17
epsg (SRID):    2154
proj4string:    +proj=lcc +lat_1=49 +lat_2=44 +lat_0=46.5 +lon_0=3 +x_0=700000 +y_0=6600000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
                        geometry
1 POLYGON ((11792 3, 11792 17...

(i.e., intersections[[1]] is the intersection between polygon 1 of aaa and polygon 571 of bbb)

HTH.

1
votes

The example works fine for me (8 GB RAM), after a few changes to the loop. See below. Tese changes are not related to memory use --- you were not storing the results.

List_inter <- list()

for(j in 1:ceiling(length(aaa)/1000)){
    begin <- (j-1) * 1000 + 1
    end <- min((j*1000), length(aaa))
    tmp_aaa <- aaa[begin:end,]
    tmp_bbb <- bbb[unique(unlist(intersection[begin:end])),]
    List_inter[[j]] <- intersect(tmp_aaa,tmp_bbb)
    cat(j, "\n"); flush.console()
}

x <- do.call(bind, List_inter)

Alternatively, you could write the intermediate results to disk, and deal with them later:

inters <- intersect(tmp_aaa,tmp_bbb)
saveRDS(inters, paste0(j, '.rds'))

Or

shapefile(inters, paste0(j, '.shp'))