Significant differences between multiple variables in R

Question

I have a dataset of particle concentrations recorded at 5 different heights. I want to find out whether the differences between heights are significant. For each height, N = 15.

What test would be appropriate to use?

I used pairwise.t.test, but I am not sure whether this is the right approach, as the sample size is quite small. I also tried pairwise.wilcox.test, which returns different p-values and the warning "cannot compute exact p-value with ties". Is this due to the small sample size, and can I still use it?

mydata:

structure(list(height = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 
3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 
4L, 5L), values = c(1.67, 3.33, 6.67, 10, 15, 25, 20, 11.67, 
16.67, 18.33, 1.67, 0, 1.67, 5, 3.33, 5, 73.33, 8.33, 5, 5, 10, 
5, 6.67, 6.67, 3.33, 18.33, 18.33, 6.67, 38.33, 0, 23.33, 10, 
15, 11.67, 5, 11.67, 8.33, 1.67, 15, 3.33, 13.33, 10, 10, 3.33, 
10, 8.33, 21.67, 10, 41.67, 8.33, 3.33, 36.67, 15, 11.67, 8.33, 
8.33, 8.33, 5, 5, 0, 1.67, 8.33, 16.67, 3.33, 10, 16.67, 8.33, 
8.33, 25, 1.67, 6.67, 26.67, 3.33, 11.67, 1.67)), row.names = c(NA, 
-75L), class = "data.frame")

marvinschmitt · Accepted Answer · 2021-01-12T09:16:45

If you only want to know whether any of the group means differ significantly, you can use an analysis of variance (ANOVA).

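library(afex)
df <- mydata                  # the data frame from the question
df$id <- seq_len(nrow(df))    # one id per row: every observation is treated as independent
aov_ez(data = df, id = "id", between = "height", dv = "values")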

results in

Anova Table (Type 3 tests)

Response: values
  Effect    df    MSE      F  ges p.value
1 height 4, 70 118.38 2.45 + .123    .054
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

So the result narrowly misses significance at an alpha level of 5% (p = .054). The effect is not small, however: the generalized eta-squared (ges) of 0.123 lies between the conventional benchmarks for a medium (0.06) and a large (0.14) effect.
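As a quick sanity check on the table, the p-value and the effect size can be recomputed from the reported F statistic and its degrees of freedom. This is only a sketch: it starts from the rounded F = 2.45, and it relies on the fact that in a one-way between-subjects design generalized eta-squared coincides with ordinary eta-squared. The variable names are made up for illustration.

```r
# Values taken from the ANOVA table (F is rounded there)
F_val <- 2.45
df1   <- 4    # numerator df: 5 heights - 1
df2   <- 70   # denominator df: 75 observations - 5 heights

# Upper-tail probability of the F distribution;
# should land close to the reported .054
p_val <- pf(F_val, df1, df2, lower.tail = FALSE)

# Eta-squared = SS_effect / (SS_effect + SS_error), rewritten in terms of F
eta_sq <- (F_val * df1) / (F_val * df1 + df2)   # about 0.123

round(c(p = p_val, eta_sq = eta_sq), 3)
```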

The problem with pairwise tests (such as the t-test you mentioned) is that the alpha error accumulates across the tests. To account for this alpha-error inflation, you would need to lower the alpha level of the individual tests, which drastically reduces their power.
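To put rough numbers on that inflation, here is a minimal sketch in base R. It assumes the pairwise tests are independent (they are not exactly, so the family-wise rate is an approximation), and the raw p-values in the second half are made up purely to show how p.adjust behaves.

```r
k     <- 5                 # number of heights
m     <- choose(k, 2)      # number of pairwise comparisons: 10
alpha <- 0.05

# Chance of at least one false positive across m independent tests
fwer <- 1 - (1 - alpha)^m  # roughly 0.40

# Bonferroni: per-test alpha that keeps the family-wise rate at 5%
bonf_alpha <- alpha / m    # 0.005

# Alternatively, adjust the p-values instead of alpha.
# These raw p-values are illustrative only, not results from the data.
p_raw  <- c(0.004, 0.020, 0.030, 0.200)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

On the question's data, the same idea is available directly as pairwise.t.test(df$values, df$height, p.adjust.method = "holm"). Holm is the default adjustment of that function, so the p-values it printed were most likely already corrected.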

If the data come from dependent measures (also called within-subject data), i.e. you measured the same subject repeatedly at these heights, you should use a within-subject analysis instead.
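A within-subject analysis needs a column that says which measurements belong together, and the posted data do not contain one. The sketch below therefore uses a small made-up data frame (toy, with a hypothetical id column) and base R's aov with an Error term, just to show the shape of such a model. It is not a result for the actual data.

```r
# Toy data: 4 hypothetical subjects, each measured at 3 heights.
# The values are invented for illustration only.
toy <- data.frame(
  id     = factor(rep(1:4, each = 3)),
  height = factor(rep(1:3, times = 4)),
  values = c(1.7, 3.3,  6.7,
             5.0, 8.3, 10.0,
             3.3, 5.0, 11.7,
             6.7, 6.7, 15.0)
)

# Repeated-measures ANOVA: height is tested against the
# subject-by-height variation rather than the pooled residual
fit <- aov(values ~ height + Error(id / height), data = toy)
summary(fit)
```

With afex, the corresponding call would be along the lines of aov_ez(id = "id", dv = "values", data = toy, within = "height"). For the question's data, this only makes sense if each block of five consecutive rows really is one subject or one measurement occasion, which is an assumption that cannot be checked from the post.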

Addition: For a quick visualization, you might want to try

boxplot(values ~ height, data = df)
