
I'm struggling to figure this out. I'm new to Python, coming from an SPSS background. Essentially, once you've done a Kruskal-Wallis test and it returns a low p-value, the correct procedure is to do a post hoc Dunn test. I've been struggling to figure out the math, but I found this article (https://journals.sagepub.com/doi/pdf/10.1177/1536867X1501500117), which I think lays it all out.

Python doesn't seem to have a Dunn test beyond one that figures out the p-value, but I want output similar to the pairwise comparison test you can get in SPSS. That includes the z-statistic/test statistic, the standard deviation, the standard error, the p-value, and the Bonferroni-adjusted p-value.
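For reference, a p-value-only Dunn table can be produced by the third-party scikit-posthocs package; a minimal sketch, assuming it is installed and using placeholder data, would look something like the following, but it doesn't give the z-statistics or standard errors:

import pandas as pd
import scikit_posthocs as sp

# long format: one value column plus a group label column (names and numbers are placeholders)
long_df = pd.DataFrame({
    'value': [3.4, 5.6, 8.9, 2.3, 5.4, 7.9],
    'group': ['a', 'a', 'a', 'b', 'b', 'b'],
})
# matrix of pairwise Dunn p-values with a Bonferroni adjustment
sp.posthoc_dunn(long_df, val_col='value', group_col='group', p_adjust='bonferroni')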

Right now I'm just working on getting the test statistic right so I can do the rest. My data consists of multiple groups, which I've split into multiple data frames. As an example, it looks like this:

df1 (row_total = 31)

| Factor 1 | Factor 2 |
| -------- | -------- |
| 3.45     | 8.95     |
| 5.69     | 2.35     |

df2 (row_total = 75)

| Factor 1 | Factor 2 |
| -------- | -------- |
| 5.45     | 7.95     |
| 4.69     | 5.35     |

etc., etc.

So essentially I'm trying to test df1["Factor 1"] against df2["Factor 1"]. What I currently have is:

import pandas as pd
from scipy import stats
from collections import Counter

def dunn_test(df1, df2, colname):
    ## Equation is z = yi / oi
    ## where yi is the difference in the mean rankings of the two groups
    ## and oi is the standard deviation of yi

    #Data Needed
    x=df1[colname]
    z=df2[colname]

    grouped = pd.concat([x,z])
    N =len(grouped)

    #calculating the Mean Rank of the Two Groups
    rank1= stats.rankdata(x)
    rank2=stats.rankdata(z)
    Wa = rank1.sum()/len(x)
    Wb = rank2.sum()/len(z)

    #yi
    y= Wa-Wb
    
    #standard deviation of yi
    #tied Ranks
    ranks= stats.rankdata(grouped)
    
    tied=pd.DataFrame([Counter(ranks)]).T
    tied= tied.reset_index()
    tied = tied.rename(columns={"index":"ranks",0:'ties'})
    count_ties = tied[tied.ties >=2].count()


    #standard deviation formula
    t= tied["ties"]
    for tied in t:
        e = t**3-t
        e = [i for i in e if i != 0]
    
    oi=((N*(N+1)/2) - sum(e)/12*(N-1))*(1/len(x) + 1/len(z))
    
    zstat=y/oi
    
    return zstat

It outputs 0.0630. The issue I'm having is that when I run the same test through SPSS, the number is -51.422. I'm not sure whether I'm doing it right, whether I have the right equation, or what I'm meant to do.

Any help would be appreciated.


1 Answer


I had to do something similar. The code below should work for you as is: it runs the Kruskal-Wallis test along with Dunn's test, and the p-values from the Dunn's test get a Bonferroni correction. The data need to be structured as a single value column with a stratifying indicator column alongside it. post_hoc_result_dict maps each pairwise comparison to the variable name, z-score, p-value, and corrected p-value, in that order. Let me know how it goes.

FUNCTION CALL:

import pandas as pd

f1 = df1['Factor 1'].to_frame(name='value')
f1['group'] = 'df1'
f2 = df2['Factor 1'].to_frame(name='value')
f2['group'] = 'df2'
correct_format = pd.concat([f1, f2])
k, p, post_hoc_result_dict = kw_test(correct_format, 'group', 'value')
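If you want something that reads like the SPSS pairwise comparison table, the returned dict can be dumped into a data frame; a small sketch (the column labels here are just my own naming):

# each dict value is [variable, z, p, corrected p], in that order
results_table = pd.DataFrame.from_dict(
    post_hoc_result_dict, orient='index',
    columns=['variable', 'z', 'p', 'bonferroni_p'])
results_table.index.name = 'comparison'
print(results_table)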

FUNCTIONS:

import sys
from math import sqrt
from scipy import stats

def p_rounder(p_value):
    # format a p-value for display, flooring very small values at '<.0001'
    if p_value < .0001:
        p_value = '<.0001'
    else:
        p_value = str(round(p_value,4))
    return p_value

def bon_correct(p_value,k):
    corrected_p = p_value * ((k *(k-1))/2)
    return p_value, corrected_p
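For example, with k = 3 groups there are 3*(3-1)/2 = 3 pairwise comparisons, so a raw p-value of 0.01 comes back as 0.03. Note the corrected value isn't capped at 1, so it can exceed 1 for large raw p-values:

p, corrected_p = bon_correct(0.01, 3)   # (0.01, 0.03)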

def kw_dunn_post_hoc(df,strat,comp_list, var):
    post_hoc_result_dict = {}
    N = df['rank'].count()
    n_groups = df[strat].nunique()
    for comp in comp_list:
        m1 = df.loc[df[strat] == comp[0]]['rank'].mean()
        n1 = df.loc[df[strat] == comp[0]]['rank'].count()
        m2 = df.loc[df[strat] == comp[1]]['rank'].mean()
        n2 = df.loc[df[strat] == comp[1]]['rank'].count()
        # z = difference in mean ranks / standard error (no tie correction applied here)
        Z = (m1 - m2)/sqrt(((N*(N+1))/12)*((1/n1)+(1/n2)))
        Z = round(Z,4)
        p = stats.norm.sf(abs(Z))  # upper-tail (one-sided) p-value for |Z|
        p, corrected_p = bon_correct(p,n_groups)
        p = p_rounder(p)
        corrected_p = p_rounder(corrected_p)
        comparison = f'{comp[0]} vs. {comp[1]}'
        post_hoc_result_dict[comparison] = [var,Z,p,corrected_p]
    return post_hoc_result_dict
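The denominator in Z above is the no-ties form of the Dunn standard error. If your data contain tied values (shared average ranks), the article linked in the question subtracts a tie term from it; a sketch of how that could be folded in, assuming ranks is the pooled 'rank' column and n1, n2 are the two group sizes (it reduces to the formula above when there are no ties):

import numpy as np

def tie_corrected_sigma(ranks, n1, n2):
    # sigma = sqrt( [ N(N+1)/12 - sum(tau**3 - tau) / (12*(N-1)) ] * (1/n1 + 1/n2) )
    N = len(ranks)
    # tau = size of each block of tied ranks; untied values contribute 0 since 1**3 - 1 = 0
    _, tau = np.unique(ranks, return_counts=True)
    tie_term = np.sum(tau**3 - tau) / (12 * (N - 1))
    return np.sqrt((N * (N + 1) / 12 - tie_term) * (1 / n1 + 1 / n2))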

def kw_test(df,stratifier,var):
    result_list = []
    strat_list = []
    comparison_list = []
    temp_df = df[[stratifier,var]].copy()
    # rank the pooled data once so every pairwise comparison uses the same shared ranks
    temp_df['rank'] = temp_df[var].rank(method='average')
    for strat in df[stratifier].unique():
        result = df.loc[df[stratifier] == strat][var].values
        result_list.append(result)
        strat_list.append(strat)
    for st in strat_list:
        for st2 in strat_list:
            if st != st2 and [st2,st] not in comparison_list:
                comparison_list.append([st,st2])
    post_hoc_result_dict = kw_dunn_post_hoc(temp_df,stratifier,comparison_list,var)
    if len(result_list) == 2:
        k,p = stats.kruskal(result_list[0],result_list[1])
    elif len(result_list) == 3:
        k,p = stats.kruskal(result_list[0],result_list[1],result_list[2])
    elif len(result_list) == 4:
        k,p = stats.kruskal(result_list[0],result_list[1],result_list[2],result_list[3])
    elif len(result_list) == 5:
        k,p = stats.kruskal(result_list[0],result_list[1],result_list[2],result_list[3],result_list[4])
    else:
        print('Stratifying levels greater than 5. Please modify code to accommodate.')
        sys.exit()
    k = round(k,4)    
    p = p_rounder(p)
    return k, p, post_hoc_result_dict
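To sanity-check the whole thing end to end, here's a self-contained run on made-up data (the group labels and numbers are arbitrary; only the call pattern and the shape of the outputs matter):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'value': np.concatenate([rng.normal(0, 1, 30), rng.normal(1, 1, 30)]),
    'group': ['df1'] * 30 + ['df2'] * 30,
})
k, p, post_hoc = kw_test(demo, 'group', 'value')
print(k, p)        # Kruskal-Wallis H statistic and formatted p-value
print(post_hoc)    # {'df1 vs. df2': ['value', z, p, corrected_p]}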