When running my Tukey test, it gives me this error:

Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

My Dataframe Head Output:

    Group    Score
3   A        1.91
4   B        1.7
5   C        1.69
6   D        1.68
7   E        1.49

My Tukey Test Code:

from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multicomp import MultiComparison

mc = MultiComparison(df['Score'], df['Group'])
result = mc.tukeyhsd()

print(result)
print(mc.groupsunique)


> TypeError                                 Traceback (most recent call last)
> <ipython-input-10-705a07612b72> in <module>()
>       1 mc = MultiComparison(df['Score'], df['Group'])
> ----> 2 result = mc.tukeyhsd()
>       3 
>       4 print(result)
>       5 print(mc.groupsunique)
> 
> /usr/local/lib/python3.6/dist-packages/statsmodels/sandbox/stats/multicomp.py in tukeyhsd(self, alpha)
>     964         self.groupstats = GroupsStats(
>     965                             np.column_stack([self.data, self.groupintlab]),
> --> 966                             useranks=False)
>     967 
>     968         gmeans = self.groupstats.groupmean
> 
> /usr/local/lib/python3.6/dist-packages/statsmodels/sandbox/stats/multicomp.py in __init__(self, x, useranks, uni, intlab)
>     535 
>     536         #temporary until separated and made all lazy
> --> 537         self.runbasic(useranks=useranks)
>     538 
>     539 
> 
> /usr/local/lib/python3.6/dist-packages/statsmodels/sandbox/stats/multicomp.py in runbasic(self, useranks)
>     569         else:
>     570             self.xx = x[:,0]
> --> 571         self.groupsum = groupranksum = np.bincount(self.intlab, weights=self.xx)
>     572         #print('groupranksum', groupranksum, groupranksum.shape, self.groupnobs.shape
>     573         # start at 1 for stats.rankdata :
> 
> TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

Does anyone know what this means?

Do you have empty values for Score? - roganjosh
Btw, I edited Tukey test out of the title in the hope it would give you more footfall on the problem. This issue is not restricted to just that one problem. - roganjosh
@roganjosh There are no empty values. What else could it be? I tried for so long to solve this but nothing works. - ee8291
Python tracebacks are informative. You need to show at least the last few lines so we can see where the exception occurs. Either your score or your groups column is an object array, which the current tukeyhsd code cannot handle. - Josef
@Josef. Yes, I am sorry for not adding that info. I just edited the main post and it should be visible now. Can you reassess? Also, all my df types are object. I am not sure if this may be part of the problem? - ee8291

1 Answer

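The traceback shows np.bincount failing to cast the Score values from dtype object to float64, which usually means the numbers are stored as strings or mixed types. Convert the column to float before passing it in. Try replacing the line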

mc = MultiComparison(df['Score'], df['Group'])

with

mc = MultiComparison(df['Score'].astype('float'), df['Group'])

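If the conversion itself raises an error, at least one row likely holds a value that cannot be parsed as a number. In that case, use the following instead (with pandas imported as pd), which turns unparseable values into NaN rather than raising: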

mc = MultiComparison(pd.to_numeric(df['Score'], errors='coerce'), df['Group'])
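To see which rows are the problem, it can help to reproduce the failure on a small frame first. The sketch below is only an illustration: the data is made up (including the "n/a" entry), and it calls np.bincount directly, since that is the call that fails in the traceback, rather than running the Tukey test itself.

```python
import numpy as np
import pandas as pd

# Hypothetical data resembling the question's frame. dtype=object forces the
# scores to be stored as Python strings; the "n/a" entry is an invented bad row.
df = pd.DataFrame(
    {
        "Group": ["A", "A", "B", "B", "C", "C"],
        "Score": ["1.91", "1.70", "1.69", "n/a", "1.49", "1.52"],
    },
    dtype=object,
)
print(df.dtypes)

# Reproduce the failing step from the traceback: np.bincount refuses to cast
# object-dtype weights to float64.
labels = pd.factorize(df["Group"])[0]
try:
    np.bincount(labels, weights=df["Score"].to_numpy())
    bincount_failed = False
except TypeError as exc:
    bincount_failed = True
    print("TypeError:", exc)

# errors="coerce" converts anything unparseable to NaN instead of raising,
# which makes the offending rows easy to list.
scores = pd.to_numeric(df["Score"], errors="coerce")
bad_rows = df[scores.isna()]
print(bad_rows)

# Keep only the rows that converted cleanly; Score is now a float column.
clean = df.assign(Score=scores).dropna(subset=["Score"])
print(clean["Score"].dtype)

# The same bincount call now succeeds on the cleaned column.
group_sums = np.bincount(pd.factorize(clean["Group"])[0], weights=clean["Score"].to_numpy())
print(group_sums)
```

After this, clean['Score'] and clean['Group'] can be passed to MultiComparison in place of the original columns. Dropping the NaN rows is a choice made for this sketch; whether to drop or repair them depends on why the values were unparseable in the first place.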