I am generating species distribution models using Random Forest. These models attempt to predict the probability of occurrence of a species, conditioned on various environmental attributes. For most species, our initial set contains between 10 and 25 potential predictors, and each predictor is represented by a GIS raster file with 460,000,000 cells. Because of the nature of the training data, which I won't go into here, I am actually building multiple RF models (approximately 10 to 100+) on subsets of the data and then combining them to create my overall model for each species. Building the models takes relatively little time (generally a few minutes or less), but using the predict function to produce a raster layer of predicted probability from the model can take 20+ hours. I suspect that much of this time is spent reading and writing the large raster files, and that hard drive read/write speed might be the bottleneck.
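To put rough numbers on that suspicion, here is a back-of-envelope estimate in base R. The cell count, predictor count, and RAM size come from my setup; the 4-byte cell size and the 100 MB/s sustained read speed are assumptions I have not measured, so treat the result as an order-of-magnitude sketch only.

```r
# Figures from my setup
n_cells      <- 460e6   # cells per predictor raster
n_predictors <- 25      # upper end of the 10 to 25 range
ram_gb       <- 32      # installed RAM

# Assumptions (not measured): 4-byte cell values, ~100 MB/s sustained disk read
bytes_per_cell <- 4
disk_mb_per_s  <- 100

# Uncompressed size of the full predictor stack and of one output layer
stack_gb  <- n_cells * n_predictors * bytes_per_cell / 1e9
output_gb <- n_cells * bytes_per_cell / 1e9

# Time to read the whole stack from disk once at the assumed speed
read_min_per_pass <- stack_gb * 1e9 / (disk_mb_per_s * 1e6) / 60

cat(sprintf("Predictor stack:   %.1f GB (RAM: %.0f GB)\n", stack_gb, ram_gb))
cat(sprintf("One output layer:  %.2f GB\n", output_gb))
cat(sprintf("One full read:     %.1f min\n", read_min_per_pass))
```

Under these assumptions the full stack does not fit in RAM, so it has to be processed in chunks. Total read time then depends on how many passes over the stack the prediction makes, which I have not confirmed.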
To provide a little more detail: once I have my trained model, I create a raster stack of the predictor layers with the raster package and then predict to that stack using that package's predict() function. I have a reasonably powerful desktop (Core i7, 3.5 GHz, 32 GB of RAM), and the input and output raster files are on the local hard drive, so nothing moves over a network. I saw mbq's answer here with helpful suggestions on speeding up model generation with randomForest, and am looking for similar suggestions for speeding up the predict operation. I can think of a number of things that might help (e.g., growing fewer trees, using one of the libraries for parallel processing), and I plan to test these as time permits, but it's unclear to me whether any of them will have a significant impact if the problem is mostly a read/write bottleneck. I would be grateful for any suggestions.
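For reference, the prediction step looks roughly like the sketch below. It is a minimal, self-contained stand-in rather than my actual code: the layer names (`temp`, `precip`), the tiny 50 x 50 synthetic rasters, the made-up training data, and the output file name are all hypothetical, and it shows a single model rather than the full set of sub-models.

```r
library(raster)
library(randomForest)

set.seed(1)

# Tiny synthetic stand-ins for the real predictor rasters (hypothetical)
make_layer <- function() {
  r <- raster(nrows = 50, ncols = 50)
  values(r) <- runif(ncell(r))
  r
}
predictors <- stack(make_layer(), make_layer())
names(predictors) <- c("temp", "precip")   # must match the model's variable names

# Synthetic presence/absence training data with the same predictor names
train <- data.frame(temp = runif(200), precip = runif(200))
train$present <- factor(as.integer(train$temp + train$precip > 1))

rf <- randomForest(present ~ temp + precip, data = train, ntree = 100)

# Predict the probability of the second class ("1", presence) for every cell
# and write the result to disk in the raster package's native format
out_file <- file.path(tempdir(), "prob_presence.grd")
prob <- predict(predictors, rf, type = "prob", index = 2,
                filename = out_file, overwrite = TRUE)
print(prob)
```

In the real workflow the stack is built from the raster files on disk, and the same call runs over all 460,000,000 cells.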