I have gone through the similar questions, but none of them answers my question. I am using a random forest classifier as follows:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X_train, y_train)
clf.predict(X_test)
It's giving me this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
However, X_train.describe() doesn't show any missing values, and I had already handled the missing values before splitting my data.
When I do the following:
np.where(X_train.values >= np.finfo(np.float32).max)
I get:
(array([], dtype=int64), array([], dtype=int64))
And for these commands:
np.any(np.isnan(X_train))  # True
np.all(np.isfinite(X_train))  # False
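Those two checks only say that a NaN or non-finite value exists somewhere, not where it is. A minimal sketch of a per-column count follows, assuming X_train is a numeric pandas DataFrame. X_demo is a hypothetical stand-in, since the real data is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train; the real data is not shown in the question.
X_demo = pd.DataFrame({
    "a": [1.0, 2.0, np.nan],
    "b": [0.5, np.inf, 1.5],
    "c": [3.0, 4.0, 5.0],
})

# Work on the raw float array so NaN and +/-inf are counted separately.
values = X_demo.to_numpy(dtype=float)
report = pd.DataFrame(
    {
        "nan": np.isnan(values).sum(axis=0),
        "inf": np.isinf(values).sum(axis=0),
    },
    index=X_demo.columns,
)

# Show only the columns that contain at least one problematic value.
print(report[(report > 0).any(axis=1)])
```

Running the same count on X_test may also be worthwhile, because predict can validate its input as well.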
And after getting the above results, I also tried this:
X_train.fillna(X_train.mean())
but I still get the same error.
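One possible explanation, offered as a sketch and not a confirmed diagnosis: fillna returns a new DataFrame instead of modifying X_train in place, so calling it without assigning the result leaves the data unchanged. In addition, a column containing inf has an infinite mean, so any inf values would need to be converted to NaN first. X_demo and X_clean below are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train; the real data is not shown in the question.
X_demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.inf, 2.0, 4.0]})

# fillna returns a new DataFrame; the result is discarded here,
# so X_demo still contains its NaN afterwards.
X_demo.fillna(X_demo.mean())
still_has_nan = bool(X_demo.isna().any().any())

# Convert +/-inf to NaN first so each column mean is finite,
# then assign the filled result back to a variable.
X_clean = X_demo.replace([np.inf, -np.inf], np.nan)
X_clean = X_clean.fillna(X_clean.mean())

all_finite = bool(np.isfinite(X_clean.to_numpy()).all())
print(still_has_nan, all_finite)
```

If this is the cause, the same cleaning would presumably have to be applied to X_test too, ideally with the means computed from the training data, since depending on the scikit-learn version predict rejects NaN and inf as well.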
Please tell me where I'm going wrong. Thank you!