I am having trouble conducting Wilcoxon test analyses on heavily tied data. I have outlined my problem as best I can below, how I have tried to address it, and the questions I have. I'd be really grateful for any advice anyone could give me.
My Problem

I am working on a dataset where I need to compare three groups on a measure which was used for group assignment. When I run a one-way ANOVA, neither (1) the assumption of normality of residuals, nor (2) the assumption of homogeneity of variance of residuals is met.
I therefore used the Wilcoxon test to conduct pairwise comparisons in R with the following code (example for one comparison, with the default two-sided alternative hypothesis):
wilcox.test(measure ~ group, data = myreduceddataset, na.rm = TRUE, paired = FALSE, exact = TRUE, conf.int = TRUE)
However, the output of my analysis looked strange to me (screenshot of example here), and every comparison produced warnings (one example copied below):
Warning messages:
1: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(2, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, :
  cannot compute exact confidence intervals with ties
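For readers who want to see the behaviour in isolation, here is a minimal sketch on made-up data (the vectors `x` and `y` are hypothetical, not my real groups). In the R version that produced the warnings above, an exact p-value is not available when there are ties, so one option is to request the normal approximation explicitly, which should avoid the warning:

```r
# Hypothetical toy data with many tied values (not the real dataset)
x <- c(2, 1, 0, 2, 0, 0, 0, 0, 1, 0)
y <- c(6, 4, 10, 7, 9, 8, 6, 4, 6, 14)

# Ask for the normal approximation with continuity correction
# instead of an exact p-value
res <- wilcox.test(x, y, exact = FALSE, correct = TRUE)

res$statistic  # the W statistic
res$p.value    # approximate two-sided p-value
```

Whether the approximation is adequate with this many ties is exactly what I am unsure about.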
Checking the data

I then checked the data and looked at how R ranks them to try to work out the cause of the warnings. Although there are some tied ranks throughout, the main problem seems to be the number of 0 values in Group 1. Here is some example raw and ranked data by group.
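The kind of check I mean can be sketched as follows. This uses a small made-up data frame (the name `dat` and its values are hypothetical) and only base R functions:

```r
# Hypothetical toy data: group 1 is dominated by zeros
dat <- data.frame(
  measure = c(0, 0, 0, 1, 0, 2, 6, 2, 9, 5, 9, 15, 6, 7, 8),
  group   = factor(rep(c("1", "2", "3"), each = 5))
)

# How often each value occurs in each group
table(dat$measure, dat$group)

# Proportion of zeros per group
zero_share <- tapply(dat$measure == 0, dat$group, mean)
zero_share

# Ranks over the pooled data; tied values share the average rank (midrank)
dat$rank <- rank(dat$measure, ties.method = "average")
dat[order(dat$rank), ]
```

In this toy example the four zeros all receive the same midrank, which is the pattern I see on a much larger scale in Group 1.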
The solution I found, and questions this raised

From reading around, it appears that the solution to this could be to use the Wilcoxon test from the 'coin' package in R.
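A minimal sketch of what such a call might look like, assuming the coin package is installed. The data frame `toy` and its values are made up for illustration, and the two `distribution` settings are shown side by side only for comparison:

```r
library(coin)  # assumes the 'coin' package is installed

# Hypothetical toy data: two independent groups with tied values
toy <- data.frame(
  measure = c(0, 0, 0, 1, 0, 0, 2, 6, 2, 9, 5, 6),
  group   = factor(rep(c("1", "2"), each = 6))
)

# Exact conditional test (coin can handle ties here)
wt_exact <- wilcox_test(measure ~ group, data = toy,
                        distribution = "exact")

# Asymptotic version of the same test, for comparison
wt_asymp <- wilcox_test(measure ~ group, data = toy,
                        distribution = "asymptotic")

statistic(wt_exact)  # standardized test statistic (reported as Z)
pvalue(wt_exact)     # exact p-value
pvalue(wt_asymp)     # asymptotic p-value
```

When taking a pair of groups out of a three-level factor, the unused level probably needs to be dropped first (for example with `droplevels()`), since the test expects a two-level grouping factor.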
I had a go, and here is an example of my output. However, I am still not entirely clear on whether this is correct, and I have outlined some questions I still have below.
- I am not sure if an asymptotic test or an exact test is more appropriate for this dataset (the output appears to be the same)
- I am assuming I should use the coin::wilcox_test() not the coin::wilcoxsign_test(), as I am comparing samples from independent groups. Is this correct?
- If I am understanding correctly, the 'Z' value is the effect size. How do I derive the W statistic? Or can I just report the effect size?
- I am not sure how to correct this output for multiple comparisons
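To make the last two questions concrete, here is a rough sketch in base R of one way the pairwise p-values could be adjusted and W extracted. It uses made-up data (the data frame `dat3` is hypothetical), the Holm method is just one possible choice, and the normal approximation is requested because of the ties. I am not claiming this is the right approach for my data:

```r
# Hypothetical toy data: three independent groups
dat3 <- data.frame(
  measure = c(0, 0, 0, 1, 0, 0, 2, 6, 2, 9, 5, 6, 9, 15, 6, 7, 8, 12),
  group   = factor(rep(c("1", "2", "3"), each = 6))
)

# All pairwise rank-sum tests with Holm-adjusted p-values
pw <- pairwise.wilcox.test(dat3$measure, dat3$group,
                           p.adjust.method = "holm", exact = FALSE)
pw$p.value

# Manual route: collect the raw p-values, then adjust them together
pairs <- combn(levels(dat3$group), 2, simplify = FALSE)
raw_p <- sapply(pairs, function(p) {
  sub <- droplevels(dat3[dat3$group %in% p, ])
  wilcox.test(measure ~ group, data = sub, exact = FALSE)$p.value
})
names(raw_p) <- sapply(pairs, paste, collapse = " vs ")
adj_p <- p.adjust(raw_p, method = "holm")
adj_p

# W statistic for a single comparison (group 1 vs group 2)
sub12 <- droplevels(dat3[dat3$group %in% c("1", "2"), ])
w_12 <- wilcox.test(measure ~ group, data = sub12, exact = FALSE)$statistic
w_12
```

The same `p.adjust()` step should in principle work on p-values collected from any other test function, since it only needs a numeric vector of raw p-values.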
I'd be more than happy to give more detail if it would be helpful. Many thanks in advance.
UPDATE: Simulated data (same group means and SDs) here:
structure(list(measure = c(9, 15, 6, 7, 8, 7, 12, 5, 14, 9, 7,
13, 8, 14, 11, 16, 9, 7, 3, 8, 3, 21, 4, 3, 11, 13, 5, 7, 8,
15, 5, 15, 3, 9, 5, 2, 8, 6, 1, 1, 7, 6, 9, 5, 6, 2, 6, 10, 6,
6, 8, 6, 9, 8, 6, 2, 6, 2, 9, 5, 6, 4, 10, 7, 9, 8, 6, 4, 6,
14, 1, 12, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), group = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"
), class = "factor")), row.names = c(NA, -122L), class = "data.frame")
data.frame(...) or the output from dput(head(x)) directly. – r2evans