I am following the scikit-learn example at http://scikit-learn.org/stable/auto_examples/svm/plot_iris.html#sphx-glr-auto-examples-svm-plot-iris-py to plot the decision boundaries of a classifier. The line "Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])" raises the error "ValueError: X has 2 features per sample; expecting 908430". This is the code I am running:
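import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier

# step2 is the feature matrix and index holds the labels (both described below)
clf = SGDClassifier().fit(step2, index)
X = step2
y = index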
h = .02
colors = "bry"
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
'index' is the label array. It holds about 98579 labels, one per comment, and each label is positive ('P'), neutral ('NEU') or negative ('N'):
array(['N', 'N', 'P', ..., 'NEU', 'P', 'N'], dtype=object)
'step2' is the [98579 x 908430] sparse matrix that CountVectorizer produced from the comment data:
<98579x908430 sparse matrix of type '<type 'numpy.float64'>'
with 3168845 stored elements in Compressed Sparse Row format>
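
To make the shape mismatch concrete, here is a minimal sketch on a small random stand-in for 'step2' and 'index' (the names X_sparse, labels, codes, svd, clf_full and clf_2d are hypothetical, not from my real code). It first reproduces the error, since a classifier fitted on every column rejects the two-column grid. It then shows one possible workaround, assuming it is acceptable to project the data to two components with TruncatedSVD and fit a separate classifier on that projection just for plotting. The labels are encoded as integers because the contour plot needs numeric values.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import SGDClassifier

# Hypothetical stand-in data: a small sparse matrix and three string labels.
rng = np.random.RandomState(0)
X_sparse = sparse.random(300, 50, density=0.1, format="csr", random_state=rng)
labels = rng.choice(np.array(["N", "NEU", "P"]), size=300)

# Encode the string labels as integers so they can be used for colouring.
classes, codes = np.unique(labels, return_inverse=True)

# A model fitted on all 50 columns cannot score a 2-column grid point.
clf_full = SGDClassifier(random_state=0).fit(X_sparse, codes)
try:
    clf_full.predict(np.zeros((1, 2)))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Assumed workaround: reduce to 2 components, then fit a plotting-only model.
svd = TruncatedSVD(n_components=2, random_state=0)
X_2d = svd.fit_transform(X_sparse)
clf_2d = SGDClassifier(random_state=0).fit(X_2d, codes)

# Build the grid in the 2-D projected space and predict on it.
h = 0.02
x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = clf_2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

With this, xx, yy and Z could be passed to plt.contourf, and X_2d with c=codes to plt.scatter. Note that the boundary drawn this way belongs to the model fitted on the 2-D projection, not to the original classifier trained on all the features.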