Sklearn ValueError: X has 2 features per sample; expecting 11

Question

I am trying to visualize a multiple logistic regression, but I get the above error.

I'm practicing on the Red Wine Quality dataset from Kaggle.

Here is the full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-230199fd3a97> in <module>
      4 X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
      5                      np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
----> 6 plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
      7              alpha = 0.75, cmap = ListedColormap(('red', 'green')))
      8 plt.xlim(X1.min(), X1.max())

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
    287             Predicted class label per sample.
    288         """
--> 289         scores = self.decision_function(X)
    290         if len(scores.shape) == 1:
    291             indices = (scores > 0).astype(np.int)

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in decision_function(self, X)
    268         if X.shape[1] != n_features:
    269             raise ValueError("X has %d features per sample; expecting %d"
--> 270                              % (X.shape[1], n_features))
    271 
    272         scores = safe_sparse_dot(X, self.coef_.T,

ValueError: X has 2 features per sample; expecting 11

Below is the visualization code:

# Visualising the Training set results
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# X_train, y_train and the fitted classifier come from earlier in the notebook
X_set, y_set = X_train, y_train
# Build a grid over the first two feature columns only
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# This is the call that raises the ValueError (the grid has 2 columns)
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Overlay the training points, coloured by class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Logistic Regression (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

It looks like you trained your classifier on 11 features, then you changed the shape of the data into a meshgrid, then flattened the two meshgrids and tried to pass those into the classifier. Perhaps you should do your prediction before you change the shape of your data, then reshape your predicted values to match your plotting? — G. Anderson
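One way to act on this comment while keeping the 11-feature model is to give every grid point values for all 11 features, predict on that full-width grid, and only then reshape the predictions to the grid for plotting. Holding the nine features you are not plotting at their training means is my assumption, not something stated in the thread. The sketch uses scikit-learn's make_classification as a hypothetical stand-in for the Kaggle wine data, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the wine data: 11 features, binary target
X_train, y_train = make_classification(n_samples=200, n_features=11, random_state=0)
classifier = LogisticRegression().fit(X_train, y_train)

# Grid over the two features being plotted (columns 0 and 1)
step = 0.05
X1, X2 = np.meshgrid(
    np.arange(X_train[:, 0].min() - 1, X_train[:, 0].max() + 1, step),
    np.arange(X_train[:, 1].min() - 1, X_train[:, 1].max() + 1, step),
)

# Every grid point gets all 11 features; the other nine stay at their training means
# (assumption: any fixed "typical" value would do, the mean is just one choice)
grid = np.tile(X_train.mean(axis=0), (X1.size, 1))
grid[:, 0] = X1.ravel()
grid[:, 1] = X2.ravel()

# Predict on the full-width grid, then reshape back to the grid for contourf
Z = classifier.predict(grid).reshape(X1.shape)
print(grid.shape, Z.shape)
```

Z can then be passed as plt.contourf(X1, X2, Z, ...). Keep in mind that this draws a 2-D slice through an 11-dimensional decision boundary, so the picture changes if the fixed values of the other features change.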

seralouk · Accepted Answer · 2019-08-06T17:37:32

You could add the full code so we can be sure about the problem, but it seems that the model was trained using 11 features and now you are trying to predict using only 2.
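The mismatch can be reproduced in isolation. This is a minimal sketch assuming a binary target and synthetic data from make_classification in place of the wine CSV; the variable names are hypothetical. The exact wording of the error differs between scikit-learn versions, but both the 2 and the 11 appear in it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the wine data: 11 feature columns, binary target
X_train, y_train = make_classification(n_samples=100, n_features=11, random_state=0)
classifier = LogisticRegression().fit(X_train, y_train)
print(classifier.coef_.shape)  # (1, 11): one weight per training feature

# Only two columns, like np.array([X1.ravel(), X2.ravel()]).T in the question
two_columns = X_train[:, :2]
error_message = None
try:
    classifier.predict(two_columns)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```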

classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)

Here, the array passed to predict, np.array([X1.ravel(), X2.ravel()]).T, must have exactly the same number of columns (axis = 1) as the original array used to train (.fit) the classifier. The trailing .reshape(X1.shape) only reshapes the predictions afterwards.
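If the goal is a 2-D decision-region plot like the one in the question, a common approach (my assumption about the intent, not part of the answer) is to fit a separate classifier on just the two columns being plotted, so the grid and the model agree on the column count. The sketch below uses make_classification as a hypothetical stand-in for the Kaggle data; classifier_2d and the axis labels are illustrative names.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the wine data (11 features, binary target)
X_train, y_train = make_classification(n_samples=200, n_features=11, random_state=0)

# Separate model trained only on the two columns that will be plotted
X_set = X_train[:, :2]
classifier_2d = LogisticRegression().fit(X_set, y_train)

X1, X2 = np.meshgrid(
    np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.05),
    np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.05),
)
# The grid has 2 columns, matching what classifier_2d was fit on
Z = classifier_2d.predict(np.c_[X1.ravel(), X2.ravel()]).reshape(X1.shape)

cmap = ListedColormap(("red", "green"))
plt.contourf(X1, X2, Z, alpha=0.75, cmap=cmap)
# Overlay the training points; cmap(i) picks the i-th listed colour
for i, label in enumerate(np.unique(y_train)):
    plt.scatter(X_set[y_train == label, 0], X_set[y_train == label, 1],
                color=cmap(i), label=label)
plt.xlabel("feature 0")
plt.ylabel("feature 1")
plt.legend()
plt.show()
```

The regions shown belong to classifier_2d, not to the original 11-feature model, so the plot illustrates how well those two features alone separate the classes.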
