
I am using a Support Vector Machine (SVM) with the 'linear' kernel for multi-class classification, but the accuracy is very low. Is it possible to increase it?

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn.svm import SVC

#Prepare data for SVM
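Diabetes_SVM = Diabetes2[['metformin','repaglinide','nateglinide','chlorpropamide','glimepiride','acetohexamide', 'glipizide', 'glyburide',
                          'tolbutamide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide', 'examide',
                          'citoglipton', 'insulin']]  # must include every column listed in `nominal` below, or get_dummies raises a KeyError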

#Create dummy variables
nominal = ['metformin','repaglinide','nateglinide','chlorpropamide','glimepiride','acetohexamide', 'glipizide', 'glyburide', 
           'tolbutamide', 'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide', 'examide',
           'citoglipton']
Diabetes_SVM = pd.get_dummies(Diabetes_SVM,columns=nominal)

#Map data for SVM
Diabetes_SVM['insulin']=Diabetes_SVM['insulin'].map({'Down': 1,'No': 2,
                                                     'Steady': 3,'Up': 4})

#Defining features and target variable for SVM
X_SVM = Diabetes_SVM.drop('insulin', axis=1).values
y_SVM = Diabetes_SVM['insulin'].values

#Split dataset into training set and test set for SVM
X_train, X_test, y_train, y_test = train_test_split(X_SVM, y_SVM, test_size=0.30, random_state=42)

#Fit SVC Class
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

#Making Predictions
y_pred = svclassifier.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

With the linear kernel I get an accuracy of only 0.47. How can I improve it?

How big is the training set? Is it imbalanced? - Epimetheus
You are using a linear kernel: svclassifier = SVC(kernel='linear'). - Epimetheus
The training set has around 68637 records. I am checking whether it is imbalanced. - Nith

1 Answer


Try SVC(kernel='poly') and normalize your data. Compare the results against a LogisticRegression() classifier and use whichever classifier performs best on your data. Also check whether your data is non-linear; if it is, consider PyTorch, Keras, or a GLM.

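 from sklearn import metrics
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import MinMaxScaler
 from sklearn.svm import SVC

 # df, NUMERIC and 'Target' are placeholders for your DataFrame, feature columns and label
 X = df[NUMERIC]
 y = df['Target']

 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

 # Fit the scaler on the training split only, then apply it to both splits
 scaler = MinMaxScaler()
 X_train = scaler.fit_transform(X_train)
 X_test = scaler.transform(X_test)

 # A very large C means almost no regularization; training can be slow on large data
 model = SVC(kernel='poly', degree=3, C=1E10)
 model.fit(X_train, y_train)
 y_pred = model.predict(X_test)
 print("Accuracy:", metrics.accuracy_score(y_test, y_pred))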
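
To make the comparison suggested above concrete, here is a minimal sketch that cross-validates a majority-class baseline, logistic regression, and linear and polynomial SVCs on the same data. The dataset is a synthetic stand-in generated with make_classification (an assumption, not the asker's diabetes data), and the names `candidates`, `scores` and `class_share` are only illustrative. If the non-linear model does not clearly beat the linear ones, or nothing clearly beats the baseline, the features themselves are probably the limiting factor rather than the kernel.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 4 imbalanced classes,
# mirroring the four 'insulin' levels in the question.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=4, weights=[0.1, 0.5, 0.3, 0.1],
                           random_state=42)

# Class distribution: a dominant class sets the accuracy you get "for free".
classes, counts = np.unique(y, return_counts=True)
class_share = dict(zip(classes.tolist(),
                       (counts / counts.sum()).round(3).tolist()))
print("class share:", class_share)

# Each pipeline refits the scaler inside every CV fold, so no test data leaks.
candidates = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(MinMaxScaler(),
                                         LogisticRegression(max_iter=1000)),
    "SVC linear": make_pipeline(MinMaxScaler(), SVC(kernel="linear")),
    "SVC poly": make_pipeline(MinMaxScaler(), SVC(kernel="poly", degree=3)),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name:20s} {scores[name]:.3f}")
```

On real data, replace X and y with your own feature matrix and labels. Cross-validated scores are less sensitive to one lucky or unlucky split than a single train_test_split.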