I'm running scikit-learn's (version 0.15.2) Random Forest with Python 3.4 on Windows 7 64-bit. I have this very simple model:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
Data = np.genfromtxt('C:/Data/Tests/Train.txt', delimiter=',')  # load the training data
print("nrows =", Data.shape[0], "ncols =", Data.shape[1])
X = np.float32(Data[:, 1:])  # features
Y = np.int16(Data[:, 0])     # labels in the first column
RF = RandomForestClassifier(n_estimators=1000)
RF.fit(X, Y)
The X array has about 30,000 rows and 500 columns, with values in the following format:
139.2398242257808,310.7242684642465,...
Even with no parallel processing (n_jobs=1), memory usage eventually creeps up to 16 GB. The raw data is only about 60 MB as float32 (30,000 x 500 x 4 bytes), so I'm wondering why so much memory is being used.
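If it helps, here is a rough sketch of how I could estimate the per-tree footprint and cap tree growth; the max_depth / min_samples_leaf values below are arbitrary placeholders, not something I have actually tuned:

import pickle

# Rough estimate of the per-tree footprint: pickle one fitted tree and
# multiply by n_estimators (only an approximation of in-memory size).
tree_bytes = len(pickle.dumps(RF.estimators_[0]))
print("one tree ~", tree_bytes / 1e6, "MB; forest ~", tree_bytes * 1000 / 1e9, "GB")

# Capping tree depth / leaf size keeps each tree (and therefore memory) smaller,
# though it may affect accuracy -- the values here are just guesses.
RF_small = RandomForestClassifier(n_estimators=1000, max_depth=20, min_samples_leaf=5)
RF_small.fit(X, Y)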
I know this has been asked before, but that was prior to version 0.15.2...
Any suggestions?