Your provided code snippet is missing a fig definition. I prefer using plotly.graph_objs, but with the setup below you can choose to show your figures using fig.show() or iplot(fig). You won't be able to just include an argument and get a best fit line automatically, but you sure can get this programmatically. You'll just need to add a couple of lines to your original setup and you're good to go.
Plot:

Complete code with sample data:
import datetime

import pandas as pd
import plotly.graph_objs as go
import statsmodels.api as sm
from plotly.offline import iplot

# Sample data: note the gaps between some of the dates
df = pd.DataFrame({'date': {0: '2015-11-11',
                            1: '2015-11-12',
                            2: '2015-11-14',
                            3: '2015-11-15',
                            4: '2015-11-21',
                            5: '2015-11-22',
                            6: '2015-11-23'},
                   'score': {0: 1, 1: 2, 2: 4, 3: 2, 4: 3, 5: 2, 6: 3}})
df = df.sort_values(by=['date'], ascending=[True])

# Represent each date as an integer: days since 1970-01-01
df['timestamp'] = pd.to_datetime(df['date'])
df['serialtime'] = [(d - datetime.datetime(1970, 1, 1)).days for d in df['timestamp']]

# OLS regression of score on serialized time (with an intercept)
x = sm.add_constant(df['serialtime'])
model = sm.OLS(df['score'], x).fit()
df['bestfit'] = model.fittedvalues

# Scatter of the observed scores plus the fitted line
fig = go.Figure()
fig.add_trace(go.Scatter(x=df['date'],
                         y=df['score'],
                         mode='markers',
                         name='score'))
fig.add_trace(go.Scatter(x=df['date'],
                         y=df['bestfit'],
                         mode='lines',
                         name='best fit',
                         line=dict(color='firebrick', width=2)))
iplot(fig)  # or fig.show()
Some details:
Time series often present certain issues for OLS estimation. The format of the dates themselves can be challenging, so in this case it would be tempting to use the index of your dataframe as the independent variable. But since your dates are not continuous (there are gaps between them), simply replacing them with a continuous series would result in erroneous regression coefficients. I often find it best to use a serialized integer array to represent time series data, meaning that each date is represented by an integer, which in turn is the count of days from some epoch. In this case, 01.01.1970.
And that's exactly what I'm doing here:
df['timestamp'] = pd.to_datetime(df['date'])
df['serialtime'] = [(d - datetime.datetime(1970, 1, 1)).days for d in df['timestamp']]
Here's a plot that illustrates how using the wrong data affects your OLS estimates:
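The same effect can be checked numerically. What follows is a minimal sketch, not the code behind the plot: it assumes the sample data from the complete example, and it uses a small hand-written least-squares helper (ols_slope, a name made up for this illustration) instead of statsmodels so that it runs with the standard library alone. It compares the slope you get from regressing on the row index with the slope you get from regressing on days since the epoch.

```python
from datetime import datetime

# Same sample data as in the complete example above
dates = ['2015-11-11', '2015-11-12', '2015-11-14', '2015-11-15',
         '2015-11-21', '2015-11-22', '2015-11-23']
scores = [1, 2, 4, 2, 3, 2, 3]


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x (illustrative helper)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


# Days since 1970-01-01, mirroring the 'serialtime' column
epoch = datetime(1970, 1, 1)
serial_days = [(datetime.strptime(d, '%Y-%m-%d') - epoch).days for d in dates]

# The tempting but wrong regressor: 0, 1, 2, ... regardless of date gaps
row_index = list(range(len(dates)))

slope_by_index = ols_slope(row_index, scores)
slope_by_day = ols_slope(serial_days, scores)

print(f"slope per row index:    {slope_by_index:.4f}")
print(f"slope per calendar day: {slope_by_day:.4f}")
```

With these seven points the index-based slope comes out at about 0.18, while the per-day slope is about 0.07. The index treats the six-day gap between 2015-11-15 and 2015-11-21 as a single step, which makes the trend look much steeper than it is in calendar time.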

The cufflinks extension allows it with df.iplot(bestfit=True). – jmz