1
votes

I have a long dataset (N=499), from which I'm comparing how an index behaves under 8 different treatments (with a different number of samples in each treatment).

I already did a Kruskal-Wallis and it was significant (p value < 2.2e-16).

Now, for the post hoc test, I'm thinking about using Dunn's test, but I've read that the Wilcoxon test could be useful as well. Any suggestions?

Thank you all very much.


1 Answer

5
votes

This question really belongs on Cross Validated, not Stack Overflow, but:

The Wilcoxon (aka Mann-Whitney, aka Mann-Whitney-Wilcoxon) rank sum test is inappropriate as a post hoc test for pairwise comparisons following a rejection of the Kruskal-Wallis null hypothesis, for two reasons:

  1. The rank sum test does not use the same rank orderings as the Kruskal-Wallis test. The Kruskal-Wallis test ranks across all groups, but the rank sum test will simply rank between the two groups in each comparison. This results in different rankings—effectively different data—being used for each test.

  2. If the null hypothesis of the Kruskal-Wallis test is true, then each group is drawn from a population with the same variance in ranked observations. The best estimate of this variance is the one used in calculating the Kruskal-Wallis test statistic (akin to the pooled variance in the post hoc t tests following rejection of a one-way ANOVA). The rank sum test does not incorporate the variance pooled across all groups when constructing pairwise tests; it uses only the two groups in each test.
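To make the first point concrete, here is a minimal sketch in plain Python. The toy data and the helper `average_ranks` are hypothetical, made up for illustration only. It ranks the observations once across all groups (as Kruskal-Wallis does) and once within a single pair (as the rank sum test does), and the two rankings differ:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical toy data: three treatments, two observations each.
groups = {"A": [1.0, 4.0], "B": [2.0, 6.0], "C": [3.0, 5.0]}

# Kruskal-Wallis style: rank every observation together.
labels = [g for g, xs in groups.items() for _ in xs]
pooled = [x for xs in groups.values() for x in xs]
pooled_ranks = average_ranks(pooled)
kw_ranks = {
    g: [r for lab, r in zip(labels, pooled_ranks) if lab == g] for g in groups
}

# Rank sum style: re-rank using only groups A and B.
pair_ranks = average_ranks(groups["A"] + groups["B"])
ab_ranks = {"A": pair_ranks[:2], "B": pair_ranks[2:]}

print("ranks across all groups:", kw_ranks["A"], kw_ranks["B"])
print("ranks within the A-B pair:", ab_ranks["A"], ab_ranks["B"])
```

The observation 4.0 in group A has rank 4 when all groups are ranked together, but rank 3 when only A and B are ranked, so the pairwise test is effectively working with different data.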

Dunn's test preserves the ranks that the Kruskal-Wallis uses, and uses a pooled variance estimate to construct post hoc approximate z test statistics.
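As a rough sketch of how that works (not a replacement for an established implementation), the code below computes Dunn's z statistic for each pair from the Kruskal-Wallis ranks and the tie-corrected pooled variance. The function names and the toy data are hypothetical, and the p-values shown are unadjusted; in practice you would also apply a multiple-comparison adjustment of your choice.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups):
    """Return {(a, b): (z, unadjusted two-sided p)} for every pair of groups."""
    labels = [g for g, xs in groups.items() for _ in xs]
    pooled = [x for xs in groups.values() for x in xs]
    ranks = average_ranks(pooled)  # same ranking as Kruskal-Wallis
    n_total = len(pooled)

    mean_rank = {
        g: sum(r for lab, r in zip(labels, ranks) if lab == g) / len(xs)
        for g, xs in groups.items()
    }

    # Pooled variance of the ranks, with the usual correction for ties.
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    tie_term = sum(t**3 - t for t in tie_counts.values()) / (12 * (n_total - 1))
    pooled_var = n_total * (n_total + 1) / 12 - tie_term

    results = {}
    for a, b in combinations(groups, 2):
        se = math.sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
        results[(a, b)] = (z, p)
    return results


# Hypothetical toy data, not the asker's dataset.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
dunn_results = dunn_test(toy)
for pair, (z, p) in dunn_results.items():
    print(pair, round(z, 3), round(p, 4))
```

Note that the standard error for every pair is built from the same pooled variance; only the two group sizes change from one comparison to the next.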

The Conover-Iman test likewise preserves the ranks that the Kruskal-Wallis uses, and uses a pooled variance estimate to construct post hoc t test statistics. This test is valid if and only if you reject the Kruskal-Wallis test, but provides uniformly greater power to reject the null than Dunn's test.
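For comparison, here is a minimal sketch of the Conover-Iman t statistic as I understand the formula: the same Kruskal-Wallis ranks are used, but the pooled rank variance is scaled by (N - 1 - H) / (N - k), where H is the Kruskal-Wallis statistic and k the number of groups. The names and toy data are hypothetical, and the sketch stops at the statistic; a p-value would come from a t distribution with N - k degrees of freedom, which the Python standard library does not provide.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def conover_iman_t(groups):
    """Return {(a, b): t} using ranks taken across all groups."""
    labels = [g for g, xs in groups.items() for _ in xs]
    pooled = [x for xs in groups.values() for x in xs]
    ranks = average_ranks(pooled)
    n_total, k = len(pooled), len(groups)

    rank_sum = {
        g: sum(r for lab, r in zip(labels, ranks) if lab == g) for g in groups
    }
    centre = n_total * (n_total + 1) ** 2 / 4

    # Variance of the pooled ranks (handles ties automatically).
    s2 = (sum(r * r for r in ranks) - centre) / (n_total - 1)
    # Kruskal-Wallis H written in terms of the same variance.
    h = (sum(rank_sum[g] ** 2 / len(groups[g]) for g in groups) - centre) / s2

    results = {}
    for a, b in combinations(groups, 2):
        n_a, n_b = len(groups[a]), len(groups[b])
        se = math.sqrt(
            s2 * (n_total - 1 - h) / (n_total - k) * (1 / n_a + 1 / n_b)
        )
        results[(a, b)] = (rank_sum[a] / n_a - rank_sum[b] / n_b) / se
    return results


# Hypothetical toy data, not the asker's dataset.
toy = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
conover_results = conover_iman_t(toy)
for pair, t in conover_results.items():
    print(pair, round(t, 3))
```

The larger H is, the smaller the scaling factor and hence the standard error, which is one way to see why this test only makes sense after the Kruskal-Wallis null has been rejected.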