I have a dataset with 66854 samples. It has 4 feature columns: "Height", "Weight", "Belly", and "Hip". Belly and Hip are mapped to 0, 1, 2 (narrow, medium, and large, respectively). I am trying to predict Jean size from this information.
Here is the number of samples in each class:
df2["Jean"].value_counts()
28 11780
27 10166
26 9259
29 7260
30 6905
32 5688
25 5196
24 3932
31 3603
33 3065
Name: Jean, dtype: int64
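To put these counts in proportion, here is a small sketch that computes each class's share of the 66854 samples. The counts are copied from the output above; the variable names (counts, total, shares) are only for illustration.

```python
# Class counts copied from the value_counts() output above.
counts = {
    28: 11780, 27: 10166, 26: 9259, 29: 7260, 30: 6905,
    32: 5688, 25: 5196, 24: 3932, 31: 3603, 33: 3065,
}

total = sum(counts.values())

# Fraction of the dataset that belongs to each Jean size.
shares = {size: count / total for size, count in counts.items()}

# Print the sizes in ascending order with their share as a percentage.
for size, share in sorted(shares.items()):
    print(f"{size}: {counts[size]:>6} samples ({share:.1%})")
```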
After splitting the data into 0.8 train and 0.2 test with train_test_split() from sklearn and training a LogisticRegression model with default parameters, I get this classification report:
              precision    recall  f1-score   support

          24       0.39      0.40      0.39      1966
          25       0.00      0.00      0.00      2598
          26       0.27      0.45      0.34      4630
          27       0.25      0.14      0.18      5083
          28       0.28      0.59      0.38      5890
          29       0.00      0.00      0.00      3630
          30       0.26      0.28      0.27      3453
          31       0.00      0.00      0.00      1801
          32       0.31      0.40      0.35      2844
          33       0.58      0.36      0.44      1532

    accuracy                           0.29     33427
   macro avg       0.23      0.26      0.23     33427
weighted avg       0.23      0.29      0.24     33427
As the report shows, classes 25, 29, and 31 have zero precision and recall, and when I use this model those classes are never predicted. What is the reason for this, and how can I fix it?
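For reference, here is a minimal sketch of the setup described above. It is not my exact script: the DataFrame below is filled with random placeholder values (an assumption, so its numbers will not match my report), and random_state and zero_division=0 are added only to make the run repeatable and to silence the undefined-metric warning.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the real df2 (hypothetical values).
rng = np.random.default_rng(0)
n = 1000
df2 = pd.DataFrame({
    "Height": rng.normal(170, 10, n),
    "Weight": rng.normal(70, 12, n),
    "Belly": rng.integers(0, 3, n),   # 0 = narrow, 1 = medium, 2 = large
    "Hip": rng.integers(0, 3, n),     # same encoding as Belly
    "Jean": rng.integers(24, 34, n),  # sizes 24..33
})

X = df2[["Height", "Weight", "Belly", "Hip"]]
y = df2["Jean"]

# 0.8 train / 0.2 test split, as described in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Logistic regression with default parameters.
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# zero_division=0 reports 0.00 for classes that are never predicted.
report = classification_report(y_test, y_pred, zero_division=0)
print(report)

# List which classes the model actually predicts on the test set.
predicted_classes = sorted(set(y_pred))
print("Classes predicted at least once:", predicted_classes)
```

The last line makes it easy to check which sizes never appear in the predictions.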