You appear to be computing the chi-squared statistic and p-value for the contingency table (i.e. the "cross tab") of the data. The SciPy function pearsonr is not the correct function for this: it computes the Pearson correlation coefficient, not a chi-squared test. To do the calculation with SciPy, you need to form the contingency table and then pass it to scipy.stats.chi2_contingency.
There are several ways you could convert d1 and d2 into a contingency table. Here I'll use the pandas function pandas.crosstab. Then I'll use chi2_contingency for the chi-squared test.
First, here is your data. I have it in NumPy arrays, but this is not necessary:
In [49]: d1
Out[49]:
array([1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1])
In [50]: d2
Out[50]:
array([1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,
       1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0, 2, 1, 2, 0, 0, 1])
Use pandas to form the contingency table:
In [51]: import pandas as pd
In [52]: table = pd.crosstab(d1, d2)
In [53]: table
Out[53]:
col_0   0   1  2
row_0
0       5   7  4
1      10  34  3
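As one of the other ways to build the table, here is a minimal pandas-free sketch that counts (d1, d2) pairs with collections.Counter from the standard library. The names counts, row_labels and col_labels are only illustrative, and the data is the same as above, written as plain lists.

```python
from collections import Counter

d1 = [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0,
      1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
d2 = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
      1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,
      1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0, 2, 1, 2, 0, 0, 1]

# Count how often each (d1 value, d2 value) pair occurs.
counts = Counter(zip(d1, d2))

# Sorted distinct values become the row and column labels.
row_labels = sorted(set(d1))
col_labels = sorted(set(d2))

# Counter returns 0 for pairs that never occur, so empty cells are handled.
table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
print(table)  # [[5, 7, 4], [10, 34, 3]]
```

The nested list can be passed to chi2_contingency in the same way as table.values.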
Then use chi2_contingency for the chi-squared test:
In [54]: from scipy.stats import chi2_contingency
In [55]: chi2, p, dof, expected = chi2_contingency(table.values)
In [56]: p
Out[56]: 0.057230732412525138
The p-value matches the value computed by SPSS.
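To see where that number comes from, here is a rough sketch of the same calculation done by hand with only the standard library. It is not how SciPy implements it, just the textbook formula: expected counts are (row total * column total) / N, and the statistic is the sum of (observed - expected)^2 / expected. The closed form exp(-x/2) for the p-value is valid only because this table has 2 degrees of freedom; the general case needs the chi-squared survival function. Yates' continuity correction does not enter here, since chi2_contingency applies it only when there is 1 degree of freedom.

```python
import math

observed = [[5, 7, 4],
            [10, 34, 3]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence of rows and columns.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Shortcut that holds only for dof == 2: the survival function is exp(-x/2).
assert dof == 2
p = math.exp(-chi2 / 2)

print(chi2, p)  # p is approximately 0.0572
```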
Update: Since SciPy 1.7.0 (released in 2021), you can create the contingency table with scipy.stats.contingency.crosstab:
In [33]: from scipy.stats.contingency import crosstab  # Requires SciPy 1.7.0 or later
In [34]: d1
Out[34]:
array([1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1])
In [35]: d2
Out[35]:
array([1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1,
       1, 1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1,
       1, 0, 1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0, 2, 1, 2, 0, 0, 1])
In [36]: (vals1, vals2), table = crosstab(d1, d2)
In [37]: vals1
Out[37]: array([0, 1])
In [38]: vals2
Out[38]: array([0, 1, 2])
In [39]: table
Out[39]:
array([[ 5,  7,  4],
       [10, 34,  3]])
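Putting the SciPy-only route together, the count array returned by crosstab can be passed straight to chi2_contingency. The following is a sketch as a standalone script rather than an IPython session, assuming SciPy 1.7.0 or later is installed; the data is the same as above.

```python
import numpy as np
from scipy.stats import chi2_contingency
from scipy.stats.contingency import crosstab  # assumes SciPy >= 1.7.0

d1 = np.array([1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1,
               0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1,
               0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1])
d2 = np.array([1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1,
               1, 1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1,
               1, 0, 1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 0, 2, 1, 2, 0, 0, 1])

# crosstab returns the distinct values of each input and the count array.
(vals1, vals2), table = crosstab(d1, d2)

# The count array is a plain NumPy array, so no conversion is needed.
chi2, p, dof, expected = chi2_contingency(table)

print(table)
print(p)  # approximately 0.0572, the same p-value as with the pandas table
```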