
I am trying to plot the SVM decision boundary that separates two classes, cancerous and non-cancerous. However, the plot I get is far from what I wanted. I wanted it to look like this:

[image: the plot I wanted]

or anything that shows the points scattered. Here's my code:

import numpy as np
import pandas as pd
from sklearn import svm
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt

autism = pd.read_csv('predictions.csv')


# Fit Support Vector Machine Classifier
X = autism[['TARGET','Predictions']]
y = autism['Predictions']

clf = svm.SVC(C=1.0, kernel='rbf', gamma=0.8)
clf.fit(X.values, y.values) 

# Plot Decision Region using mlxtend's awesome plotting function
plot_decision_regions(X=X.values, 
                      y=y.values,
                      clf=clf, 
                      legend=2)

# Update plot object with X/Y axis labels and Figure Title
plt.xlabel(X.columns[0], size=14)
plt.ylabel(X.columns[1], size=14)
plt.title('SVM Decision Region Boundary', size=16)
plt.show()

But I got a weird-looking plot:

[image: the plot I got]

You can find the CSV file here: predictions.csv

Please have a look at the answer. If you have made a mistake in the input file, please do not update the question to change it; just accept the answer (see What should I do when someone answers my question?) and open a new question should you have any further issues. - desertnaut
@desertnaut No, I did not make a mistake in the input file. I purposely wanted to try that file (predictions.csv) because I couldn't work it out with the one in my repo. But I will give it a try, if you could just answer my question in the comment? - Falady
As I said, what you are trying to do here does not make any sense at all. And I have already answered your comment. - desertnaut

1 Answer


You sound a little confused...

Your predictions.csv looks like:

TARGET  Predictions
     1  0
     0  0
     0  0
     0  0

and, as I guess the column names imply, it contains the ground truth (TARGET) and the Predictions of some (?) model already run.

Given that, what you are doing in your posted code makes absolutely no sense at all: you are using both these columns as features in your X in order to predict your y, which is... exactly one of these same columns (Predictions), already contained in your X...

Your plot looks "strange" simply because what you have plotted are not your data points, and the X and y data you show here are not the data that should be used for fitting your classifier.
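To make the problem concrete, here is a minimal sketch that uses only the four sample rows shown above, hard-coded as tuples. I am assuming the rest of the file follows the same 0/1 pattern, which I have not checked. It shows that the label is literally one of the features, and that two binary columns can never give you a scattered cloud of points:

```python
# The four sample rows shown above, hard-coded as (TARGET, Predictions) pairs.
# Assumption: the rest of predictions.csv follows the same 0/1 pattern.
rows = [(1, 0), (0, 0), (0, 0), (0, 0)]

X = [list(r) for r in rows]  # features as built in the question: both columns
y = [r[1] for r in rows]     # label as built in the question: Predictions

# The label is identical to the second feature column, so there is nothing to learn.
label_is_a_feature = all(features[1] == label for features, label in zip(X, y))

# Two 0/1 columns can occupy at most 2 * 2 = 4 distinct positions in the plane,
# so all rows pile up on a handful of points.
distinct_points = sorted(set(rows))

print(label_is_a_feature)
print(distinct_points)
```

With these four rows, every sample lands on one of just two positions, which is probably why nothing looks "scattered".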

I am further puzzled because, in your linked repo, you do indeed follow the correct procedure in your script:

autism = pd.read_csv('10-features-uns.csv')

x = autism.drop(['TARGET'], axis = 1)  
y = autism['TARGET']
x_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.30, random_state=1)

i.e. reading your features and labels from 10-features-uns.csv, and certainly not from predictions.csv, as you are inexplicably trying to do here...
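For illustration, here is a rough sketch of that procedure carried through to the plot. I do not have your feature file at hand, so the data below is a synthetic stand-in generated with scikit-learn's make_classification, and the column names feat_a and feat_b are made up. Only the overall flow (features without TARGET, TARGET as the label, then fit and plot) mirrors your script:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a real feature file (assumption: two numeric features).
features, labels = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=1
)
autism = pd.DataFrame(features, columns=["feat_a", "feat_b"])  # hypothetical names
autism["TARGET"] = labels

# Features are everything except the label; the label is the ground truth.
x = autism.drop(["TARGET"], axis=1)
y = autism["TARGET"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=1
)

# Same SVC settings as in the question.
clf = SVC(C=1.0, kernel="rbf", gamma=0.8)
clf.fit(x_train.values, y_train.values)
test_accuracy = clf.score(x_test.values, y_test.values)


def plot_regions():
    """Draw the decision regions; needs matplotlib and mlxtend installed."""
    import matplotlib.pyplot as plt
    from mlxtend.plotting import plot_decision_regions

    plot_decision_regions(X=x_train.values, y=y_train.values, clf=clf, legend=2)
    plt.xlabel(x.columns[0], size=14)
    plt.ylabel(x.columns[1], size=14)
    plt.title("SVM Decision Region Boundary", size=16)
    plt.show()
```

Calling plot_regions() should then show the training points scattered over the two regions. Note that this only works directly because the stand-in data has exactly two features; with more feature columns you would first have to pick two of them (or reduce the dimensionality) before a 2-D plot like this makes sense.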