I have been trying this:
- Create X (features) and y (dependent variable) from a dataset
- Split the dataset
- Normalise the data
- Train using SVR from Scikit-learn
Here is the code, using a pandas DataFrame filled with random values:
import pandas as pd
import numpy as np
from math import floor  # floor() is used in the split below but was not imported in the original snippet
df = pd.DataFrame(np.random.rand(20,5), columns=["A","B","C","D", "E"])
a = list(df.columns.values)
a.remove("A")
X = df[a]
y = df["A"]
X_train = X.iloc[0: floor(2 * len(X) /3)]
X_test = X.iloc[floor(2 * len(X) /3):]
y_train = y.iloc[0: floor(2 * len(y) /3)]
y_test = y.iloc[floor(2 * len(y) /3):]
# normalise
from sklearn import preprocessing
X_trainS = preprocessing.scale(X_train)
X_trainN = pd.DataFrame(X_trainS, columns=a)
X_testS = preprocessing.scale(X_test)
X_testN = pd.DataFrame(X_testS, columns=a)
y_trainS = preprocessing.scale(y_train)
y_trainN = pd.DataFrame(y_trainS)
y_testS = preprocessing.scale(y_test)
y_testN = pd.DataFrame(y_testS)
import sklearn
from sklearn.svm import SVR
clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
This gives the following error:
C:\Anaconda3\lib\site-packages\pandas\core\index.py:542: FutureWarning: slice indexers when using iloc should be integers and not floating point
  "and not floating point",FutureWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
     34 clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
     35
---> 36 pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
     37

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
    174
    175         seed = rnd.randint(np.iinfo('i').max)
--> 176         fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
    177         # see comment on the other call to np.iinfo in this file
    178

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in _dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
    229                 cache_size=self.cache_size, coef0=self.coef0,
    230                 gamma=self._gamma, epsilon=self.epsilon,
--> 231                 max_iter=self.max_iter, random_seed=random_seed)
    232
    233         self._warn_from_fit_status()

C:\Anaconda3\lib\site-packages\sklearn\svm\libsvm.pyd in sklearn.svm.libsvm.fit (sklearn\svm\libsvm.c:1864)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
I am not sure why. Can anyone explain? I think it has something to do with converting back to DataFrames after preprocessing.
The problem is y_trainN: it is producing an incorrect array shape. The following works:
pred = clf.fit(X_trainN, y_trainN.squeeze().values).predict(X_testN)
If you look at the output of
y_trainN.values
you get a nested array, when what you want is just a flat array. Even though you have just a single column in your df, what you should do is pass a single column:
pred = clf.fit(X_trainN, y_trainN[0]).predict(X_testN)
– EdChum
Also, you can just do
a = list(df)
rather than
a = list(df.columns.values)
if you want a list of the columns; see related post. – EdChum
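To see the shape mismatch behind the "expected 1, got 2" message, the following minimal sketch rebuilds only the target column and compares its shape before and after the fixes suggested in the comments. The random data, the seed, and the variable names y_squeezed, y_column and y_flat are illustrative assumptions. The ravel() variant is not from the comments; it is just one more common way to flatten an (n, 1) array.

```python
import numpy as np
import pandas as pd

# Stand-in for the scaled target from the question: a 1-D array wrapped
# in a DataFrame, as in `y_trainN = pd.DataFrame(y_trainS)`.
rng = np.random.RandomState(0)  # fixed seed, assumed here only for repeatability
y_trainS = rng.rand(13)
y_trainN = pd.DataFrame(y_trainS)

# A single-column DataFrame is still 2-D, so .values is a nested (n, 1) array.
print(y_trainN.values.shape)

# Each of these produces a 1-D target instead.
y_squeezed = y_trainN.squeeze().values  # collapse the single column to a Series, then to an array
y_column = y_trainN[0]                  # select the column labelled 0 as a Series
y_flat = y_trainN.values.ravel()        # assumption: an alternative that flattens the array directly

print(y_squeezed.shape, y_column.shape, y_flat.shape)

# list(df) iterates over the column labels, like list(df.columns.values).
df = pd.DataFrame(rng.rand(20, 5), columns=["A", "B", "C", "D", "E"])
print(list(df) == list(df.columns.values))
```

Any of the three 1-D variants can be passed as y to clf.fit in place of the single-column DataFrame.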