1 vote

My dataset is the Cleveland heart disease database, with 300 rows and 14 attributes; the task is predicting whether a person has heart disease or not. My aim is to create a classification model based on logistic regression. I preprocessed the data, ran the model with x_train, Y_train, X_test, Y_test, and got an average accuracy of 82%.

So, to improve the accuracy, I removed features that are highly correlated with each other (as they would carry the same information),

then applied RFE (recursive feature elimination),

followed by PCA (principal component analysis) for dimensionality reduction. Roughly, what I did looks like the sketch below.
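A minimal sketch of those three steps (the file name, target column, correlation threshold, and component counts are placeholders, not my exact values):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.decomposition import PCA

# Load the Cleveland data (file name and target column are placeholders)
df = pd.read_csv("cleveland.csv")
X, y = df.drop(columns=["target"]), df["target"]

# 1. Drop one feature of each highly correlated pair (threshold is arbitrary)
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X = X.drop(columns=to_drop)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 2. Recursive feature elimination with logistic regression as the estimator
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=8)
X_train_sel = rfe.fit_transform(X_train, y_train)
X_test_sel = rfe.transform(X_test)

# 3. PCA for dimensionality reduction on the selected features
pca = PCA(n_components=5)
X_train_red = pca.fit_transform(X_train_sel)
X_test_red = pca.transform(X_test_sel)

model = LogisticRegression(max_iter=1000).fit(X_train_red, y_train)
print("accuracy:", model.score(X_test_red, y_test))
```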

Still, the accuracy did not get any better.

Why is that?

Also, why does my model show a different accuracy each time? Is it because a different x_train, Y_train, X_test, Y_test split is taken each time?

Should I change my model for better accuracy? Is an average accuracy of 80% good or bad?


3 Answers

3 votes

Try exhaustive grid search or randomized parameter search to tune your hyperparameters.

See: Documentation for hyperparameter tuning with sklearn
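A minimal sketch of both approaches (the parameter ranges are illustrative, and X_train / y_train are assumed to be your existing training split):

```python
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Exhaustive search: tries every combination in the grid
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10],
                "penalty": ["l1", "l2"],
                "solver": ["liblinear"]},  # liblinear supports both penalties
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)

# Randomized search: samples the space instead of enumerating it
rand = RandomizedSearchCV(
    LogisticRegression(max_iter=1000, solver="liblinear"),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=20,
    cv=5,
)
rand.fit(X_train, y_train)
print(rand.best_params_, rand.best_score_)
```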

1 vote

Should I change my model for better accuracy?

At least you could try. The selection of the right model depends highly on the concrete use case. Trying out other approaches is never a bad idea :)

Another idea would be to use PCA to project your data onto the two principal components with the highest variance. Then you could plot this in 2D space to get a better feeling for whether your data is linearly separable (see the sketch below).
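A quick sketch (assuming X is your full feature matrix and y your labels; scaling before PCA keeps a single large-valued feature from dominating the components):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Project the scaled data onto the first two principal components
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Colour each point by its class to see how well the classes separate
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="coolwarm", alpha=0.7)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()
```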

Also, why does my model show a different accuracy each time?

I am assuming you are using scikit-learn's train_test_split method to split your data? By default, this method shuffles your data randomly. You could set the random_state parameter to a fixed value to obtain reproducible results.
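For example (the seed value 42 is arbitrary, any fixed integer works; stratify=y additionally keeps the class balance identical in both splits):

```python
from sklearn.model_selection import train_test_split

# The same random_state always produces the same split, so the
# accuracy no longer changes from run to run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```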

1 vote

See https://github.com/dnishimoto/python-deep-learning/blob/master/Credit%20Card%20Defaults%20-%20hyperparameter.ipynb. To improve accuracy, you do hyperparameter tuning, dimensionality reduction, and scaling. Hyperparameter tuning means finding the best parameters, whereas dimensionality reduction removes features that don't contribute to accuracy, reducing noise. Scaling or normalizing reduces noise in the distribution.

Look at GridSearchCV to find the best parameters.
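A minimal sketch putting all three steps together in one scikit-learn pipeline, so the grid search tunes the number of PCA components and the regularization strength jointly (the grid values are illustrative, not taken from the notebook; X_train / y_train are assumed to be your training split):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scaling -> dimensionality reduction -> classifier as a single estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The step__parameter syntax lets the search tune any stage of the pipeline
search = GridSearchCV(
    pipe,
    param_grid={"pca__n_components": [5, 8, 10, 13],
                "clf__C": [0.01, 0.1, 1, 10]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```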