
I need to load multiple shapefiles into my R session. Currently, I load each shapefile individually. This works, but it takes a long time and uses only 15% of my available CPU. Recently, I tried loading the shapefiles in parallel with foreach and doParallel:

library(foreach)
library(doParallel)
library(rgdal)

# Shapefile paths, relative to the working directory
files <- c(
    "AHVENANMAA/AHVENANMAA",
    "ETELA-KARJALA/ETELA-KARJALA",
    "ETELA-POHJANMAA/ETELA-POHJANMAA_1",
    "ETELA-POHJANMAA/ETELA-POHJANMAA_2",
    "ETELA-SAVO/ETELA-SAVO_1"
)

registerDoParallel(cores = 8)

# Read the shapefiles in parallel; the order of results does not matter
listOfCurrentProvinces <- foreach(
    x = files,
    .packages = "rgdal",
    .inorder = FALSE
) %dopar% {
    readOGR(x, layer = "DR_LINKKI")
}

This method works and is very fast (it uses 100% of my CPU). However, it consumes too much memory, especially when I repeat the process many times. Is there any way to use foreach and doParallel without incurring such a large memory hit? My machine has 8 processors (4 physical, 4 logical) and 16 GB of RAM.
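One common way to bound the memory cost, sketched below under the assumption that the same files vector and readOGR() setup are used: create an explicit cluster and shut it down with stopCluster() when done, so worker processes (and the data they hold) do not linger between repetitions. The choice of 4 workers is illustrative, not from the original post; fewer workers means fewer shapefiles held in RAM at once, at the cost of some speed.

```r
library(foreach)
library(doParallel)
library(rgdal)

# An explicit cluster can be torn down deterministically, unlike
# registerDoParallel(cores = ...), which leaves workers implicit.
# 4 workers (the number of physical cores) is an illustrative trade-off:
# fewer simultaneous readOGR() calls means less peak memory.
cl <- makeCluster(4)
registerDoParallel(cl)

listOfCurrentProvinces <- foreach(
    x = files,
    .packages = "rgdal",
    .inorder = FALSE
) %dopar% {
    readOGR(x, layer = "DR_LINKKI")
}

# Releasing the workers frees their memory immediately.
stopCluster(cl)
```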


1 Answer


I've done a couple of things that seem to help.

1) Register the cores only ONCE, with a single registerDoParallel() call, rather than re-registering on every repetition.

2) Call gc() after each iteration to force garbage collection.

Before I implemented these changes, my memory usage blew up after only 4 of 21 iterations. Now I've completed 6 iterations and am sitting comfortably at 50% RAM use; the amount of available RAM has held steady for about 15 minutes.
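Put together, the two changes look roughly like this. The outer loop over 21 repetitions is hypothetical, added only to show where each call belongs; whether gc() is most useful inside the worker expression, in the master loop, or both is a judgment call, so this sketch does both.

```r
library(foreach)
library(doParallel)
library(rgdal)

# (1) Register the backend ONCE, outside any repeated loop.
registerDoParallel(cores = 8)

for (i in seq_len(21)) {  # 21 repetitions, as in the answer
    result <- foreach(
        x = files,
        .packages = "rgdal",
        .inorder = FALSE
    ) %dopar% {
        out <- readOGR(x, layer = "DR_LINKKI")
        gc()   # (2) garbage-collect inside each worker after the read
        out
    }

    # ... process `result`, then drop it before the next repetition ...
    rm(result)
    gc()       # collect in the master process as well
}
```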