0
votes

I am not familiar with Python and am trying to run a decision tree classifier in Python using the scikit-learn (sklearn) library. When I run the code, I encounter the error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

I have tried using a smaller subset of my Excel spreadsheet, and the code executes with the results I want, so I suspect the problem is that my dataset is too big. Here is the code that causes the crash:

from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df_X = data_train[['DayOfWeek', 'Promo', 'StateHoliday']]
df_Y = data_train[['Sales_band']]

X_train, X_test, y_train, y_test = train_test_split(df_X, df_Y, random_state=1)
model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)  # line that causes the crash
y_predict = model.predict(X_test)

print('The accuracy of the Decision Tree is', accuracy_score(y_test, y_predict))
1
The error message seems to suggest that your dataset is not too big; rather, one of the values in your dataset is either not a number (NaN), infinity, or a number too large to fit into a floating point number of type float32. I would suggest checking your data for missing values/NaNs as a first step (see the snippet below). – Pallie
Oh, you are right. Thank you – Jia Hao Lim
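
For reference, one quick way to act on that suggestion and inspect the data for the values the error mentions (a minimal sketch, assuming the df_X and df_Y DataFrames from the code above):

import numpy as np

# Count missing values per column in the features and the target
print(df_X.isna().sum())
print(df_Y.isna().sum())

# Check the numeric columns for infinite values
print(np.isinf(df_X.select_dtypes(include=[np.number])).sum())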

1 Answer

0
votes

You may have missing values in your dataset. You may want to use dropna() to remove all rows containing missing values, provided that doing so won't affect the quality of your predictions.
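
For example, something along these lines (a minimal sketch, assuming data_train is the pandas DataFrame from the question):

# Keep only the rows that have values in all columns used for the model
cols = ['DayOfWeek', 'Promo', 'StateHoliday', 'Sales_band']
data_train = data_train.dropna(subset=cols)

df_X = data_train[['DayOfWeek', 'Promo', 'StateHoliday']]
df_Y = data_train[['Sales_band']]

If the problem is infinite values rather than NaNs, you can first convert them with data_train.replace([np.inf, -np.inf], np.nan) and then call dropna().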