
I am performing a linear regression analysis on bike share data. I am interested in predicting the bikecount based on the other factors.

So I split the data like so:

x = df[['rain', 'temp', 'rhum', 'msl', 'wdsp', 'day', 'month', 'monthname', 'season']]

y = df['bikecount']

Then, when I get to this stage: lm.fit(X_train,y_train)

it returns this error: ValueError: could not convert string to float: '07/06/2019'

I tried converting this column to float using df['date'] = float(df['date']) but that returns the error TypeError: cannot convert the series to <class 'float'>

I don't understand why this keeps coming up. I'm not even interested in the date column for my analysis. Any help would be appreciated!

 0   datetime   6040 non-null  datetime64[ns]
 1   bikecount  6040 non-null  int64
 2   rain       6040 non-null  float64
 3   temp       6040 non-null  float64
 4   rhum       6040 non-null  int64
 5   msl        6040 non-null  float64
 6   wdsp       6040 non-null  int64
 7   date       6040 non-null  object
 8   time       6040 non-null  object
 9   day        6040 non-null  object
 10  month      6040 non-null  int64
 11  monthname  6040 non-null  object
 12  season     6040 non-null  object
dtypes: datetime64[ns](1), float64(3), int64(4), object(5)
memory usage: 613.6+ KB

datetime             bikecount  rain  temp  rhum  msl     wdsp  date        datetime.1  day      month  monthname  season
2019-01-01 00:00:00  1          0.0   9.9   78    1036.0  4     01/01/2019  00:00:00    Tuesday  1      January    Winter
2019-01-01 07:00:00  1          0.0   8.3   87    1036.8  2     01/01/2019  07:00:00    Tuesday  1      January    Winter
2019-01-01 11:00:00  2          0.0   9.5   89    1038.8  3     01/01/2019  11:00:00    Tuesday  1      January    Winter
2019-01-01 12:00:00  4          0.0   10.1  84    1038.7  3     01/01/2019  12:00:00    Tuesday  1      January    Winter
For this date 01/01/2019, what is the expected result? (Corralien)

1 Answer


You can convert your date to the number of seconds elapsed since 1970-01-01:

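import pandas as pd

# Sample
df = pd.DataFrame({'date': ['01/11/2019']})

# Parse day-first dates, subtract the Unix epoch, and express the difference in seconds
df['date'] = pd.to_datetime(df['date'], dayfirst=True) \
               .sub(pd.to_datetime(0)) \
               .dt.total_seconds()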

Output:

# Before conversion:
>>> df
         date
0  01/11/2019

# After conversion:
>>> df
           date
0  1.572566e+09
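
Separately, since the date column is not wanted for the analysis, it may be simpler to leave it out of the features entirely. The question does not show how X_train was built, so this is an assumption, but the error suggests it was split from a frame that still contained date. Note also that the listed features day, monthname and season are stored as strings (dtype object), so they would raise the same ValueError unless they are encoded first. Below is a minimal sketch of one way to do that with pd.get_dummies. The toy values and the names feature_cols, string_cols and X_encoded are made up for illustration.

```python
import pandas as pd

# Toy frame using a few column names from the question (values are invented).
df = pd.DataFrame({
    "bikecount": [1, 1, 2, 4],
    "rain": [0.0, 0.0, 0.0, 0.0],
    "temp": [9.9, 8.3, 9.5, 10.1],
    "date": ["01/01/2019", "01/01/2019", "02/01/2019", "02/01/2019"],
    "day": ["Tuesday", "Tuesday", "Wednesday", "Wednesday"],
    "season": ["Winter", "Winter", "Winter", "Winter"],
})

# Select only the wanted features; 'date' is deliberately left out.
feature_cols = ["rain", "temp", "day", "season"]
X = df[feature_cols]
y = df["bikecount"]

# Non-numeric columns are the ones a regressor cannot convert to float.
string_cols = X.select_dtypes(exclude="number").columns.tolist()

# One-hot encode the string columns so every feature is numeric.
X_encoded = pd.get_dummies(X, columns=string_cols, dtype=float)

print(string_cols)
print(X_encoded.dtypes)
```

The train/test split and lm.fit would then be run on X_encoded and y rather than on the original frame.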