0 votes

I am working on an imbalanced binary classification marketing dataset which has:

  1. A No:Yes ratio of 88:12 (No = didn't buy the product, Yes = bought)
  2. ~4300 observations and 30 features (9 numeric and 21 categorical)

I divided my data into train (80%) and test (20%) sets, then applied StandardScaler and SMOTE to the train set. SMOTE brought the No:Yes ratio of the train set to 1:1. I then ran a logistic regression classifier, as shown in the code below, and got a recall score of 80% on the test data, as opposed to only 21% when applying logistic regression without SMOTE.
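For reference, the preprocessing flow looks roughly like this (a simplified sketch, assuming X and y are the already-encoded feature matrix and labels; only the resampled variable names appear in my actual code below):

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

# 80/20 stratified split, preserving the 88:12 class ratio in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit the scaler on the train set only; reuse the same transform on test
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc = scaler.transform(X_test)

# SMOTE oversamples the minority class until the train set is 1:1
X_train_sc_resampled, y_train_resampled = SMOTE(random_state=0).fit_resample(
    X_train_sc, y_train)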

With SMOTE the recall increase is great; however, the false positives are quite high (see the confusion matrix image below), which is a problem because we will end up targeting many unlikely-to-buy customers. Is there a way to bring down the false positives without sacrificing recall/true positives?

[Image: confusion matrix]

from sklearn.linear_model import LogisticRegression

# Without SMOTE: fit on the original training data
clf_logistic_nosmote = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train, y_train)

# With SMOTE: fit on the scaled, resampled training data
clf_logistic = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train_sc_resampled, y_train_resampled)
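For completeness, the recall and confusion matrix I quote come from the held-out test set, computed along these lines (a sketch; X_test_sc is the scaled test set from the split above):

from sklearn.metrics import recall_score, confusion_matrix

# Evaluate the SMOTE model on the untouched, scaled test set
y_pred = clf_logistic.predict(X_test_sc)
print("Recall:", recall_score(y_test, y_pred))

# Rows are actual (No, Yes), columns are predicted (No, Yes);
# the top-right cell holds the false positives I want to reduce
print(confusion_matrix(y_test, y_pred))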
What's the value_counts of each class at the beginning? How many features are you using in total, after feature engineering if that takes place? – Herc01

Total samples = 4334; value_counts: 0 = 3832 (88%), 1 = 502 (12%); total features after feature engineering = 30 (9 numeric, 21 categorical). – Vikrant Arora

1 Answer

0 votes

I had a similar issue where the false positives were very high. In that case I had applied SMOTE after doing the feature engineering.

I then tried applying SMOTE before the feature engineering and used the SMOTE-generated data to extract the features. That way it worked pretty well. It is a slower approach, but it worked out for me. Let me know how it goes for you.
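Roughly, the ordering I mean looks like this (just a sketch: extract_features is a placeholder for whatever feature engineering you do, and it assumes the raw train data is already numerically encoded, since plain SMOTE only handles numeric features; SMOTENC is an option for mixed data):

from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Oversample first, on the raw (encoded) training data...
X_train_raw_res, y_train_res = SMOTE(random_state=0).fit_resample(X_train_raw, y_train)

# ...then run the feature engineering on the SMOTE-generated rows
X_train_feats = extract_features(X_train_raw_res)

clf = LogisticRegression(random_state=0, solver='lbfgs').fit(X_train_feats, y_train_res)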