I am new to programming and am working on a machine learning problem in Python. I tried to split my dataset into training and test sets, as the code below shows, and got the following error, which I have not been able to resolve even after searching Google and other sites:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
#Load up the training dataset
df = pd.read_excel('Trainind data_2002.xls')
df.head()
df['training'] = np.random.uniform(0, 1, len(df)) <= .70
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
colclass = ['class']
train, test = df[df['training'] == True, df['training'] == False]
trainingMatrix = train.as_matrix(colsfeatures)
classMatrix = train.as_matrix(colclass)

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2)
rfc.fit(traningMatrix, classMatrix)
testMatrix = test.as_matrix(colsfeatures)
result = rfc.predict(testMatrix)
test['predictions'] = result
test.head()

Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Can anyone help me? I would be grateful.

1 Answer

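Have you tried train_test_split? It splits the rows at random for you, so you do not need the 'training' column at all.

from sklearn.model_selection import train_test_split
# df is your DataFrame; test_size is the fraction held out for testing (0.2 = 20%)
train, test = train_test_split(df, test_size=0.2)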
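
For what it's worth, the TypeError in the question most likely comes from the line that builds train and test. Writing df[a, b] passes the tuple (a, b) as a single key, and pandas then tries to hash the two boolean Series inside it, which is what the message complains about. Indexing the frame once per mask avoids that. Two other things would probably bite next: rfc.fit is called with traningMatrix (a typo for trainingMatrix), and as_matrix has been removed from recent pandas releases, where to_numpy() is the replacement. Below is a minimal sketch of the manual split with those points addressed. The data is a synthetic stand-in I made up, since the spreadsheet is not available; only the column names are taken from the question, and random_state is added just to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the spreadsheet (hypothetical values, column names from the question)
rng = np.random.default_rng(0)
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
df = pd.DataFrame(rng.random((200, len(colsfeatures))), columns=colsfeatures)
df['class'] = rng.integers(0, 3, len(df))

# Boolean mask: True for roughly 70% of the rows
df['training'] = rng.uniform(0, 1, len(df)) <= 0.70

# Index the frame once per mask instead of passing both masks in one df[...]
train = df[df['training']]
test = df[~df['training']].copy()  # copy so a column can be added without a warning

# to_numpy() replaces the removed as_matrix(); the target is passed as a 1-D array
X_train = train[colsfeatures].to_numpy()
y_train = train['class'].to_numpy()

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0)
rfc.fit(X_train, y_train)

test['predictions'] = rfc.predict(test[colsfeatures].to_numpy())
print(test.head())
```

With real data you would swap the synthetic frame for the pd.read_excel call from the question and keep the rest as is.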