This is the expected behavior. Spearman correlation is a rank correlation, meaning it is computed on the rankings of your data, not on the data itself. In your example, the data itself may only vary in one location, but that difference in the data produces different rankings. As suggested in the comments, Spearman correlation probably isn't what you actually want to use.
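To make "computed on the rankings" concrete, here is a minimal sketch using made-up arrays (x and y are illustrative only, not the data from your question). Applying a monotonic transformation such as np.exp changes the values but not their ordering, so the Spearman coefficient should stay the same while the Pearson coefficient moves:

```python
import numpy as np
from scipy import stats

# Made-up example data, not taken from the question
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# np.exp is strictly increasing, so it changes the values but keeps the ordering
y_stretched = np.exp(y)

# Spearman only sees the ranks, so both calls should give the same coefficient
rho_before, _ = stats.spearmanr(x, y)
rho_after, _ = stats.spearmanr(x, y_stretched)

# Pearson works on the raw values, so it is affected by the transformation
r_before, _ = stats.pearsonr(x, y)
r_after, _ = stats.pearsonr(x, y_stretched)

print("Spearman:", rho_before, rho_after)
print("Pearson: ", r_before, r_after)
```

The flip side is what you are running into: any change to the values that does alter the ordering shows up in the ranks, and therefore in the Spearman coefficient.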
To expand further, under the hood pandas is essentially calling scipy.stats.spearmanr to compute the correlation. If you look at the source code for spearmanr, it essentially ends up using scipy.stats.rankdata to perform the ranking, then np.corrcoef to get the correlation:
import numpy as np
import scipy.stats as ss
corr1 = np.corrcoef(ss.rankdata(a), ss.rankdata(b))[1,0]
corr2 = np.corrcoef(ss.rankdata(c), ss.rankdata(d))[1,0]
This produces the same values you're observing. Now, look at the rankings used in each correlation calculation:
ss.rankdata(a)
[ 1. 3. 4. 5. 2.]
ss.rankdata(b)
[ 1. 2. 3. 5. 4.]
ss.rankdata(c)
[ 1. 2. 3. 5. 4.]
ss.rankdata(d)
[ 1. 2. 3. 4. 5.]
Notice that the rankings for a and b differ in three locations, whereas the rankings for c and d differ in only two, so we'd expect the resulting correlations to be different.
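As a rough, self-contained check of this, the sketch below uses hypothetical stand-in arrays chosen only so that their ranks match the rankdata output shown above (they are not your original data). The helper names rank_corr and n_rank_mismatches are made up for illustration:

```python
import numpy as np
import scipy.stats as ss

# Hypothetical stand-ins: picked so that rankdata() reproduces the ranks shown above
a = np.array([1.0, 3.0, 4.0, 5.0, 2.0])
b = np.array([1.0, 2.0, 3.0, 5.0, 4.0])
c = np.array([1.0, 2.0, 3.0, 5.0, 4.0])
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def rank_corr(x, y):
    """Pearson correlation of the ranks, i.e. Spearman's coefficient."""
    return np.corrcoef(ss.rankdata(x), ss.rankdata(y))[1, 0]

def n_rank_mismatches(x, y):
    """Number of positions where the two rank vectors disagree."""
    return int(np.sum(ss.rankdata(x) != ss.rankdata(y)))

corr1 = rank_corr(a, b)
corr2 = rank_corr(c, d)

print(n_rank_mismatches(a, b), corr1)  # 3 mismatched positions, roughly 0.7
print(n_rank_mismatches(c, d), corr2)  # 2 mismatched positions, roughly 0.9
```

Since there are no ties here, the same numbers also follow from the textbook formula 1 - 6 * sum(d**2) / (n * (n**2 - 1)), where d is the per-position rank difference. Note that the size of each rank difference matters, not just how many positions differ.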