I'm working with a large data set (41,000 observations and 22 predictor variables) and trying to fit a Random Forest model using this code:

model <- randomForest(as.factor(data$usvsa) ~ ., ntree = 1000, importance = TRUE, proximity = TRUE, data = data)
I am running into the following error:
Error: cannot allocate vector of size 12.7 Gb
In addition: Warning messages:
1: In matrix(0, n, n) :
Reached total allocation of 6019Mb: see help(memory.size)
2: In matrix(0, n, n) :
Reached total allocation of 6019Mb: see help(memory.size)
3: In matrix(0, n, n) :
Reached total allocation of 6019Mb: see help(memory.size)
4: In matrix(0, n, n) :
Reached total allocation of 6019Mb: see help(memory.size)
I have done some reading in the R help on memory limits and on this site, and I'm thinking that I need to buy 12+ GB of RAM, since my memory limit is already set to about 6 GB (my computer only has 6 GB of RAM). But first I wanted to double-check that this is the only solution. I am running Windows 7 with a 64-bit processor and 6 GB of RAM. Here is the R sessionInfo:
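For context, the 12.7 Gb allocation in the error is consistent with the n x n proximity matrix the warning messages point to (matrix(0, n, n)): with roughly 41,000 observations, a dense double-precision matrix of that size needs about 12.5 GiB on its own, before any other data. A quick back-of-the-envelope check (assuming 8 bytes per double):

```r
n <- 41000
gib <- n^2 * 8 / 1024^3  # bytes for an n x n double matrix, in GiB
print(gib)               # about 12.5 GiB, close to the 12.7 Gb in the error
```

So the failure is dominated by the proximity matrix rather than by the training data or the forest itself.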
sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.6-7
loaded via a namespace (and not attached):
[1] tools_2.15.3
Any tips?
You should be able to run randomForest on data that size with relatively little trouble, I would think. But the utility of the proximity matrix might require it being built on the whole data. So, do you really need the proximities...? - joran
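Following that suggestion, a minimal sketch of the same fit with the proximity matrix left out (proximity defaults to FALSE in randomForest, so simply dropping the argument avoids the 12+ GiB n x n allocation; `data` and `usvsa` here are the asker's objects, not something defined in this sketch):

```r
library(randomForest)

# Same call as in the question, minus proximity = TRUE.
# Variable importance is still computed; only the n x n
# proximity matrix (the source of the allocation error) is skipped.
model <- randomForest(as.factor(usvsa) ~ ., data = data,
                      ntree = 1000, importance = TRUE)
```

If proximities are genuinely needed, they can later be computed on a subsample small enough for the n x n matrix to fit in memory.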