When I try to run linregress on the Date and Close fields of a DataFrame, I keep getting the following error:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python34\lib\site-packages\scipy\stats\_stats_mstats_common.py", line 75, in linregress
    xmean = np.mean(x, None)
  File "C:\Python34\lib\site-packages\numpy\core\fromnumeric.py", line 2942, in mean
    out=out, **kwargs)
  File "C:\Python34\lib\site-packages\numpy\core\_methods.py", line 65, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: ufunc add cannot use operands with types dtype('dtype('

The code I am using:

import openpyxl,os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime
import pandas_datareader as pdr
from scipy.stats import linregress

start = datetime.datetime(2010,1,1)
end = datetime.datetime(2017,11,10)
i = 'NSE/CHENNPETRO'  # the Quandl dataset code must be a string
df = pdr.DataReader(i, 'quandl', start, end)
# df = df.iloc[::-1]
linregress(df.index, df['Close'])  # raises the TypeError above: the index holds dates

If someone could help me out or point me in the right direction, I would be most grateful.

TIA

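SciPy's linregress can only do linear regression over numerical values; it does not know how to handle dates. That is what the traceback shows: linregress calls np.mean on the x values, and NumPy cannot add datetime64 values together. The best way to proceed is probably to convert your dates to numbers (e.g. the number of days or seconds since the first date, as appropriate). Here is an example: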

import pandas as pd
import numpy as np

data = pd.DataFrame({'x': np.arange(1000) + np.random.randn(1000)},
                    index=pd.date_range('2012', periods=1000, freq='D'))
data.head()
#  x
# 2012-01-01 -0.475795
# 2012-01-02 -0.222100
# 2012-01-03  2.494785
# 2012-01-04  3.237799
# 2012-01-05  4.412078

Now we can use pandas timedeltas to find the number of days since the first date in the index:

# compute days since the first date in the index
delta = (data.index - data.index[0])
days = delta.days

Depending on your data, it may make more sense to instead use hours, minutes, seconds, etc.
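As a rough sketch of the finer-grained option, the same delta can be turned into elapsed seconds with TimedeltaIndex.total_seconds() and then scaled to whatever unit you want. The hourly index and the values below are made up purely for illustration (the "grows by about 0.5 per hour" pattern is an assumption of this example, not real data):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Hypothetical hourly series: x grows by roughly 0.5 per hour, plus noise
rng = np.random.default_rng(0)
index = pd.date_range('2012-01-01', periods=200, freq='60min')
data = pd.DataFrame({'x': 0.5 * np.arange(200) + rng.normal(0, 1, 200)},
                    index=index)

# Elapsed time since the first timestamp, as a TimedeltaIndex
delta = data.index - data.index[0]

# total_seconds() gives floats; divide by 3600 for hours (or by 60 for minutes)
hours = delta.total_seconds() / 3600

result = linregress(hours, data['x'])
print(result.slope)  # slope is now in units of x per hour
```

The unit only rescales the slope (per hour vs. per day); the fit itself is the same, so pick whichever unit makes the slope easiest to interpret.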

With this as input, linear regression will work:

from scipy.stats import linregress
linregress(days, data.x)
# LinregressResult(slope=0.99979977545856191, intercept=0.085015417311694819, rvalue=0.99999344600423345, pvalue=0.0, stderr=0.00011458241597036779)
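Applied to the question's setup, the idea might look roughly like the sketch below. Because the Quandl download needs network access, it builds a synthetic stand-in for df (a business-day DatetimeIndex plus a made-up Close column); with the real data you would skip that step. The commented-out df.iloc[::-1] in the question hints that the rows may arrive newest-first, so the sketch calls sort_index(), which works for either order. It also uses the fitted slope and intercept to compute a trend line (the Trend column name is just an illustrative choice):

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Synthetic stand-in for the Quandl DataFrame; replace with the real df
rng = np.random.default_rng(42)
index = pd.bdate_range('2010-01-01', '2017-11-10')
close = 300 + 0.05 * np.arange(len(index)) + rng.normal(0, 5, len(index))
df = pd.DataFrame({'Close': close}, index=index)

# Ensure day 0 is the earliest date, whatever order the rows came in
df = df.sort_index()

# Numeric regressor: calendar days since the first date
days = (df.index - df.index[0]).days.to_numpy()

result = linregress(days, df['Close'])

# Fitted regression line, one value per row
df['Trend'] = result.intercept + result.slope * days
print(result.slope, result.intercept)
```

Note that the slope is per calendar day, because days counts weekends too even though there are no trading rows for them. To compare the fit with the prices visually, df[['Close', 'Trend']].plot() draws both columns on the same axes.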