
I am trying to build a Machine Learning model that would predict the delay (the difference between the clear_date and the due_in_date) from the given dataset.

I've split the dataset into x_train, y_train, x_test, and validation_set. I'm using the LinearRegression model from the sklearn library. When I try to fit the model to my data, I get this error:

could not convert string to float: 'CC6000'

How can I resolve this?

Here are pictures of x_train and y_train:
[1]: https://i.stack.imgur.com/8RP2J.png
[2]: https://i.stack.imgur.com/jB7qN.png
[3]: https://i.stack.imgur.com/bDRQH.png

1
Are your date columns of dtype datetime? - Joe Ferndz

1 Answer


It seems that you have a string hidden in your dataframe: "CC6000".

Linear regression only works with numerical features, so it can't handle this string.

I have looked at your data and haven't seen this string, but it has to be there somewhere. Once you find it, you can drop that sample if it is the only string in an otherwise numeric column. If the whole feature is categorical, you have to either encode it or remove the column.
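As a rough sketch of the "encode it or remove it" options, assuming the string lives in a categorical column: the column names `invoice_amount` and `cust_code` and the toy values below are made up for illustration and are not taken from your dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: `cust_code` stands in for whichever column holds 'CC6000'.
x_train = pd.DataFrame({
    "invoice_amount": [100.0, 250.0, 80.0, 40.0],
    "cust_code": ["CC6000", "CC7000", "CC6000", "CC8000"],
})
y_train = pd.Series([3.0, -1.0, 5.0, 0.0])  # made-up delays in days

# Option 1: remove the non-numeric column entirely.
x_dropped = x_train.drop(columns=["cust_code"])

# Option 2: one-hot encode it, so each category becomes its own 0/1 column.
x_encoded = pd.get_dummies(x_train, columns=["cust_code"], dtype=float)

# With only numeric columns left, fitting no longer raises the conversion error.
model = LinearRegression().fit(x_encoded, y_train)
```

If you go with the encoding, x_test and validation_set need the same encoded columns as x_train, otherwise predict will complain about mismatched features.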

To look for this string, try something like:

df.isin(['CC6000']).any()
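
That call returns one True/False per column. To go one step further and see which columns and rows are affected, something like the following might help. It is only a sketch on a made-up frame; `invoice_amount` and `cust_code` are hypothetical names, so substitute your own dataframe.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real training data.
df = pd.DataFrame({
    "invoice_amount": [100.0, 250.0, 80.0],
    "cust_code": ["CC6000", "CC7000", "CC6000"],
})

# Boolean frame: True wherever a cell equals 'CC6000'.
mask = df.isin(["CC6000"])

# Names of the columns that contain the value at least once.
cols_with_value = [col for col in df.columns if mask[col].any()]

# The rows in which the value appears.
rows_with_value = df[mask.any(axis=1)]

# More general check: every column that is not numeric at all.
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

print(cols_with_value)
print(rows_with_value)
print(non_numeric_cols)
```

The last check is often the more useful one, because any non-numeric column will trigger the same error even if it does not contain 'CC6000' specifically.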