I'm struggling to figure this out. I'm new to Python, coming from an SPSS background. Essentially, once you've done a Kruskal-Wallis test and it returns a low p-value, the correct procedure is to do a post hoc Dunn test. I've been struggling to figure out the math, but I found this article (https://journals.sagepub.com/doi/pdf/10.1177/1536867X1501500117), which I think lays it all out.
Python doesn't seem to have a Dunn test that gives anything beyond the p-value, but I want output similar to the pairwise comparison table you can get in SPSS. This includes the z-stat/test statistic, standard deviation, standard error, p-value, and Bonferroni-adjusted p-value.
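The p-value columns should follow directly from the z statistic once it is correct. A minimal sketch, assuming a two-sided p-value from the standard normal distribution and a Bonferroni adjustment that multiplies by the number of pairwise comparisons (k(k-1)/2 for k groups) and caps the result at 1. The function name `dunn_p_values` and its parameters are hypothetical, and whether this matches the SPSS columns exactly is an assumption:

```python
from scipy import stats


def dunn_p_values(z, n_comparisons):
    """Return (p, p_adjusted) for one pairwise z statistic.

    Assumes a two-sided test and a plain Bonferroni correction.
    """
    # Two-sided tail probability under the standard normal distribution.
    p = 2.0 * stats.norm.sf(abs(z))
    # Bonferroni: multiply by the number of comparisons, capped at 1.
    p_adjusted = min(1.0, p * n_comparisons)
    return p, p_adjusted


# Example: 3 groups give 3 pairwise comparisons.
p, p_adj = dunn_p_values(-1.3416, n_comparisons=3)
```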
Right now I'm just working on getting the test statistic right so I can do the rest. My data is multiple groups which I've split into multiple data frames. My data, as an example, looks like this:

df1

| Factor 1 | Factor 2 |
| -------- | -------- |
| 3.45 | 8.95 |
| 5.69 | 2.35 |

row_total=31

df2

| Factor 1 | Factor 2 |
| -------- | -------- |
| 5.45 | 7.95 |
| 4.69 | 5.35 |

row_total=75

etc.

So essentially I'm trying to test df1["Factor1"] against df2["Factor1"]. What I currently have is:

    from collections import Counter

    import pandas as pd
    from scipy import stats


    def dunn_test(df1, df2, colname):
        ## Equation is z = yi / oi
        ## where yi is the difference in mean rankings of the two groups
        ## and oi is the standard deviation of yi

        # Data needed
        x = df1[colname]
        z = df2[colname]
        grouped = pd.concat([x, z])
        N = len(grouped)

        # Calculating the mean rank of the two groups
        rank1 = stats.rankdata(x)
        rank2 = stats.rankdata(z)
        Wa = rank1.sum() / len(x)
        Wb = rank2.sum() / len(z)

        # yi
        y = Wa - Wb

        # Standard deviation of yi
        # Tied ranks
        ranks = stats.rankdata(grouped)
        tied = pd.DataFrame([Counter(ranks)]).T
        tied = tied.reset_index()
        tied = tied.rename(columns={"index": "ranks", 0: "ties"})
        count_ties = tied[tied.ties >= 2].count()

        # Standard deviation formula
        t = tied["ties"]
        for tied in t:
            e = t**3 - t
        e = [i for i in e if i != 0]

        oi = ((N * (N + 1) / 2) - sum(e) / 12 * (N - 1)) * (1 / len(x) + 1 / len(z))

        zstat = y / oi
        return zstat

It outputs 0.0630. The issue I'm having is that when I run the same test through SPSS, the number is -51.422. I'm not sure whether I'm doing it right, whether I have the right equation, or what I'm meant to do.
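For reference, the statistic as the linked article appears to define it differs from the code above in a few places: the ranks come from ranking all observations from every group in the Kruskal-Wallis test together (not each group separately), the leading variance term is N(N+1)/12, the tie correction is divided by 12(N-1), and the standard deviation is the square root of the whole expression. A minimal sketch under those assumptions follows. The name `dunn_z` and its arguments are hypothetical, and it has not been checked against SPSS output:

```python
import numpy as np
from scipy import stats


def dunn_z(groups, i, j):
    """Return (mean-rank difference, standard error, z) for groups[i] vs groups[j].

    `groups` is a list of 1-D sequences, one per group in the
    Kruskal-Wallis test (all groups, not only the two being compared).
    """
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n_total = len(pooled)

    # Rank every observation together; ties receive the average rank.
    ranks = stats.rankdata(pooled)

    # Split the pooled ranks back into groups and take each group's mean rank.
    bounds = np.cumsum([0] + sizes)
    mean_rank = [ranks[bounds[k]:bounds[k + 1]].mean() for k in range(len(groups))]

    # Tie correction: sum of (t^3 - t) over each set of tied values.
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = np.sum(counts**3 - counts) / (12.0 * (n_total - 1))

    diff = mean_rank[i] - mean_rank[j]
    variance = (n_total * (n_total + 1) / 12.0 - tie_term) * (
        1.0 / sizes[i] + 1.0 / sizes[j]
    )
    std_err = np.sqrt(variance)
    return diff, std_err, diff / std_err


# Usage with the data frames above (column name as in the question):
# groups = [df1["Factor1"], df2["Factor1"], ...]
# diff, std_err, z = dunn_z(groups, 0, 1)
```

Returning the mean-rank difference and standard error alongside z keeps the pieces available for a pairwise comparison table.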
Any help would be appreciated.