
Importing the dataset works without problems. However, when I run SMOTE or RandomUnderSampler I get the error below. It happens only with this dataset, not with others. I cannot even tell which column or field the error points at. Do I need to change any parameters for SMOTE or RandomUnderSampler?

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# SMOTE (oversampling)
sm = SMOTE(random_state=2, k_neighbors=8)
X_train_sm, y_train_sm = sm.fit_resample(X_train.values, y_train)

X_train_rm = X_train_sm
y_train_rm = y_train_sm

# RandomUnderSampler (undersampling)
rus = RandomUnderSampler(
    sampling_strategy='auto',
    random_state=0,
    replacement=True)
X_train_rus, y_train_rus = rus.fit_resample(X_train.values, y_train)

Error

Traceback (most recent call last)
<ipython-input-28-b68eb88feabe> in <module>
  5 random_state= 0,
  6 replacement = True)
 ----> 7 X_train_rus, y_train_rus = rus.fit_resample(X_train.values, y_train)
  8 
  9 X_train_rm = X_train_rus

 ~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
 73             The corresponding label of `X_resampled`.
 74         """
 ---> 75         check_classification_targets(y)
 76         arrays_transformer = ArraysTransformer(X, y)
 77         X, y, binarize_y = self._check_X_y(X, y)

 ~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in check_classification_targets(y)
167     y : array-like
168     """
--> 169     y_type = type_of_target(y)
170     if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
171                       'multilabel-indicator', 'multilabel-sequences']:

~\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py in type_of_target(y)
288         return 'continuous' + suffix
289 
--> 290     if (len(np.unique(y)) > 2) or (y.ndim >= 2 and len(y[0]) > 1):
291         return 'multiclass' + suffix  # [1, 2, 3] or [[1., 2., 3]] or [[1, 2]]
292     else:

<__array_function__ internals> in unique(*args, **kwargs)

 ~\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
 261     ar = np.asanyarray(ar)
 262     if axis is None:
 --> 263         ret = _unique1d(ar, return_index, return_inverse, return_counts)
 264         return _unpack_tuple(ret)
 265 

 ~\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
  309         aux = ar[perm]
  310     else:
  --> 311         ar.sort()
  312         aux = ar
  313     mask = np.empty(aux.shape, dtype=np.bool_)
TypeError: '<' not supported between instances of 'float' and 'str'

You are asking us to imagine the code you are running and tell you where the problem with it might lie. Knowing that it sometimes works is not enough to go on. If it only sometimes works, then the way your code uses it is at fault. The error message suggests you are feeding it data that contains a string where the code expects a float. With neither the code nor the data, it's impossible to be more specific. - BoarGules
I have added the SMOTE and RandomUnderSampler code, if that helps. The data is confidential, so I cannot share it. - seraphis
Well, if the data is confidential, construct a small amount of fictitious data that you can share so that we can reproduce the problem. The error message is complaining about the type of some of the data it is dealing with. Without data that reproduces the problem, we can't solve it. - BoarGules
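
For what it's worth, the traceback fails inside check_classification_targets(y), so the mixed types appear to be in y_train rather than in X_train. Below is a minimal sketch of the kind of fictitious reproduction asked for above, using only the standard library. The helper name label_type_counts and the fake_labels values are made up for illustration, and the guess that a missing label (a float NaN) sits among string class labels is an assumption, not something the question confirms.

```python
from collections import Counter


def label_type_counts(labels):
    """Count how many labels there are of each Python type (hypothetical helper)."""
    return Counter(type(value).__name__ for value in labels)


# Fictitious labels standing in for y_train: string classes plus one
# missing value, assumed here to be stored as a float NaN.
fake_labels = ["yes", "no", float("nan"), "yes"]

# More than one type name in the result means the labels are mixed.
type_counts = label_type_counts(fake_labels)
print(type_counts)

# Sorting mixed str/float values raises a TypeError, which is the same
# kind of failure as ar.sort() in the traceback.
error_message = None
try:
    sorted(fake_labels)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Keeping only the string entries leaves labels that sort cleanly.
clean_labels = [value for value in fake_labels if isinstance(value, str)]
print(sorted(set(clean_labels)))
```

On the real data, passing y_train to the same helper should show whether more than one type is present. If it is, the rows with missing or mistyped labels would need to be dropped or fixed in both X_train and y_train before calling fit_resample; which fix is right depends on the data.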