I am trying to train (fit) a random forest classifier using Python and scikit-learn on a set of data stored as feature vectors. I can read the data, but training the classifier fails with a ValueError. The source code that I am using is the following:
from sklearn.ensemble import RandomForestClassifier
from numpy import genfromtxt
my_training_data = genfromtxt('csv-data.txt', delimiter=',')
X_train = my_training_data[:,0]
Y_train = my_training_data[:,1:my_training_data.shape[1]]
clf = RandomForestClassifier(n_estimators=50)
clf = clf.fit(X_train.tolist(), Y_train.tolist())
The error returned to me is the following:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/sklearn/ensemble/forest.py", line 260, in fit
    n_samples, self.n_features_ = X.shape
ValueError: need more than 1 value to unpack
The file csv-data.txt is a comma-separated values file containing 3996 vectors for training the classifier. I use the first dimension as the label of each vector; the remaining dimensions are float values, which are the features used in the classifier.
Did I miss some conversion here?
Should X_train and Y_train be swapped? – Matt Hancock
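To illustrate what the comment is getting at, here is a minimal sketch, assuming the layout described in the question (label in the first column, float features in the rest). The inline CSV text, the names `labels`, `features` and `predictions`, and the `random_state` value are made up for the example and are not from the original post. The point is that `fit` expects a 2-D feature array as its first argument, while a single column sliced with `[:, 0]` is 1-D, so its `shape` has only one entry and cannot be unpacked into two values.

```python
import io

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for csv-data.txt: label first, then float features.
csv_text = """0,0.1,0.2,0.3
1,0.9,0.8,0.7
0,0.2,0.1,0.4
1,0.8,0.9,0.6
"""
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

labels = data[:, 0]     # 1-D array, shape (n_samples,)
features = data[:, 1:]  # 2-D array, shape (n_samples, n_features)

# The 1-D shape has a single entry, so "a, b = X.shape" cannot unpack it.
print(labels.shape, features.shape)

# Features go first, labels second; the arrays can be passed directly.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features, labels.astype(int))
predictions = clf.predict(features)
```

With the arguments in this order there should be no need for the `.tolist()` calls, and `data[:, 1:]` selects the same columns as the longer `1:my_training_data.shape[1]` slice.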