
Can someone please explain why I'm getting this error when I run linregress (to get the slope) on 'day' and 'value', which are both numeric? Below is my script:

import pandas as pd
from scipy.stats import linregress
y = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b', 'c'],
                  'day': [1999, 2004, 2003, 2007, 2014, 2016],
                  'value': [2, 5, 3, 2, 7, 8]})
mylist = ['a', 'b']
y1 = y.groupby('entity').apply(lambda x: x[x['entity'].isin(mylist)])

This line gives an error:

y1.apply(lambda v: linregress(v['day'], v['value']))

Error trace:

TypeError                                 Traceback (most recent call last)
/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/multi.py in get_value(self, series, key)
    999         try:
-> 1000             return libindex.get_value_at(s, k)
   1001         except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/src/util.pxd in util.get_value_at()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in ()
----> 1 y1.apply(lambda v: linregress(v['day'], v['value']))
      2

KeyError: ('day', 'occurred at index entity')


2 Answers


Note that the documentation for linregress asks for array-like inputs.

An array-like type here means any data structure that can be coerced into a NumPy array. This includes Pandas Series (we can show this by calling np.array on one).
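A minimal check of that claim, using the 'a' and 'b' rows from the question: np.asarray turns a Series into a plain ndarray, and linregress gives the same slope whether it is handed Series or NumPy arrays. The variable names here are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# The 'a' and 'b' rows from the question
day = pd.Series([1999, 2004, 2003, 2007, 2014], name='day')
value = pd.Series([2, 5, 3, 2, 7], name='value')

# A Series is array-like: NumPy converts it to a plain ndarray
arr = np.asarray(day)
print(type(arr).__name__, arr)  # ndarray [1999 2004 2003 2007 2014]

# linregress accepts the Series directly; the result matches the plain-array call
from_series = linregress(day, value)
from_arrays = linregress(day.to_numpy(), value.to_numpy())
print(from_series.slope == from_arrays.slope)  # True
```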

So you can call linregress directly on the DataFrame columns; there is no need for apply here. That is, you can replace the last line of your code with:

df = y[y['entity'].isin(mylist)]
linregress(df['day'], df['value'])

The advantage of pandas and many other libraries in the Python data ecosystem is that they play nicely together and are array-oriented: they are optimized for operations over whole arrays and other large iterable data structures rather than scalar values. So, for the most part, their functions accept those data structures directly, without you having to call apply, map, or similar functions explicitly.

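Another note: the TypeError at the top of the trace is raised inside pandas itself. The most recent exception, and the direct cause of your issue, is the KeyError at the bottom. DataFrame.apply works column by column by default (axis=0), so v is each column of y1 in turn, starting with 'entity' (hence "occurred at index entity"), and v['day'] then looks for a row labelled 'day', which does not exist.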
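To see the column-by-column behaviour of DataFrame.apply directly, here is a minimal sketch. It assumes y1 is built with a plain boolean filter instead of the question's groupby/apply; that keeps the same rows (only without the extra 'entity' index level), and the failure mechanism is the same. Passing axis=1 would hand the function one row at a time instead, but a single row holds only one day and one value, so that would not give a regression either.

```python
import pandas as pd

y = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b', 'c'],
                  'day': [1999, 2004, 2003, 2007, 2014, 2016],
                  'value': [2, 5, 3, 2, 7, 8]})
# Plain boolean filter (an assumption for simplicity); it keeps the same
# rows as the question's groupby/apply filter, without the extra index level
y1 = y[y['entity'].isin(['a', 'b'])]

# With the default axis=0, apply calls the function once per column,
# passing that column as a Series; here the function just reports its name
print(y1.apply(lambda col: col.name).tolist())  # ['entity', 'day', 'value']

# Inside such a call, v['day'] looks up the row label 'day' in one column,
# and no row has that label, so pandas raises KeyError
raised = False
try:
    y1.apply(lambda v: v['day'])
except KeyError:
    raised = True
print(raised)  # True
```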


Update: if you want a separate regression for each entity, you need this:

y1.groupby(level='entity').apply(lambda x: linregress(x['day'],x['value']))

Output:

entity
a                                                                 (0.6, -1197.3999999999999, 1.0, 0.0, 0.0)
b    (0.4032258064516129, -805.6774193548387, 0.8485552916276634, 0.35494576760559776, 0.25142673013096595)
dtype: object
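A self-contained sketch of the same per-entity regression, assuming it is acceptable to filter with a boolean mask and group on the 'entity' column (rather than on the 'entity' index level that the question's y1 carries). Iterating over the groups and reading the slope attribute gives plain numbers instead of a Series of result tuples; the names subset and slopes are illustrative only.

```python
import pandas as pd
from scipy.stats import linregress

y = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b', 'c'],
                  'day': [1999, 2004, 2003, 2007, 2014, 2016],
                  'value': [2, 5, 3, 2, 7, 8]})
mylist = ['a', 'b']

# Keep only the entities of interest
subset = y[y['entity'].isin(mylist)]

# One regression per entity: each group g is a small DataFrame
# holding that entity's own 'day' and 'value' columns
slopes = {
    entity: linregress(g['day'], g['value']).slope
    for entity, g in subset.groupby('entity')
}
print({k: round(v, 4) for k, v in slopes.items()})  # {'a': 0.6, 'b': 0.4032}
```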

If a single regression over all the rows in y1 is all you need, you don't need apply; just pass the two DataFrame columns to linregress:

linregress(y1['day'],y1['value'])

Output:

LinregressResult(slope=0.29073482428115016, intercept=-579.2396166134187, rvalue=0.7502746874224853, pvalue=0.14406233411953523, stderr=0.1479110164470003)
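Since linregress returns a LinregressResult, its fields can be read by name or unpacked like a five-element tuple. A short sketch, again assuming y1 is built with a plain boolean filter (same rows as the question's y1, without the extra index level):

```python
import pandas as pd
from scipy.stats import linregress

y = pd.DataFrame({'entity': ['a', 'a', 'b', 'b', 'b', 'c'],
                  'day': [1999, 2004, 2003, 2007, 2014, 2016],
                  'value': [2, 5, 3, 2, 7, 8]})
y1 = y[y['entity'].isin(['a', 'b'])]

# Single regression pooled over all 'a' and 'b' rows
res = linregress(y1['day'], y1['value'])

# Fields can be read by name...
print(res.slope, res.intercept)

# ...or unpacked as the five values slope, intercept, r, p, stderr
slope, intercept, r_value, p_value, std_err = res
r_squared = r_value ** 2  # coefficient of determination
print(r_squared)
```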