
I have already asked this on stats.exchange (original question) and am re-posting the same content here in the hope of reaching a wider audience.


I would like to know how to exclude unwanted pairs from the post-hoc output of a two-way ANOVA, so that when summary(aov()) shows a significant result, the post-hoc test does not report comparisons I am not interested in. Details follow:

I have a data frame, datTable, containing proportion data for two factors: site (four levels: A, B, C, D) and treatment (two levels: control and treated). Specifically, I want pairwise tests among the sites within each treatment (e.g. control-A vs. control-B, control-A vs. control-C, treated-A vs. treated-C, etc.), while excluding comparisons that differ in both site and treatment (e.g. control-A vs. treated-B, control-B vs. treated-C).

The data looks like this:

> datTable
   site treatment proportion
     A   control  0.5000000
     A   control  0.4444444
     A   treated  0.1000000
     A   treated  0.4000000
     B   control  0.4444444
     B   control  0.4782609
     B   treated  0.0500000
     B   treated  0.3000000
     C   control  0.3214286
     C   control  0.4705882
     C   treated  0.1200000
     C   treated  0.4000000
     D   control  0.3928571
     D   control  0.4782609
     D   treated  0.4000000
     D   treated  0.4100000

I fitted a two-way ANOVA (I am also not sure whether to use the within-subject form site/treatment or the between-subject form site*treatment...) and summarised the results.

  m1 <- aov(proportion~site*treatment,data=datTable) # Or should I use 'site/treatment'?

Then my summary(m1) gave me the following:

> summary(m1)
               Df  Sum Sq Mean Sq F value Pr(>F)  
site            3 0.02548 0.00849   0.513 0.6845  
treatment       1 0.11395 0.11395   6.886 0.0305 *
site:treatment  3 0.03686 0.01229   0.742 0.5561  
Residuals       8 0.13239 0.01655                 

The next step is to use the TukeyHSD post-hoc test to see which pairs actually caused the significant (*) result.

> TukeyHSD(m1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = proportion ~ site * treatment, data = datTable)

$site
            diff        lwr       upr     p adj
B-A -0.042934783 -0.3342280 0.2483585 0.9631797
C-A -0.033106909 -0.3244002 0.2581863 0.9823452
D-A  0.059168392 -0.2321249 0.3504616 0.9124774
C-B  0.009827873 -0.2814654 0.3011211 0.9995090
D-B  0.102103175 -0.1891901 0.3933964 0.6869754
D-C  0.092275301 -0.1990179 0.3835685 0.7461309

$treatment
                      diff        lwr         upr     p adj
treated-control -0.1687856 -0.3171079 -0.02046328 0.0304535

$`site:treatment`
                            diff        lwr       upr     p adj
B:control-A:control -0.010869565 -0.5199109 0.4981718 1.0000000
C:control-A:control -0.076213819 -0.5852551 0.4328275 0.9979611
D:control-A:control -0.036663216 -0.5457045 0.4723781 0.9999828
A:treated-A:control -0.222222222 -0.7312635 0.2868191 0.6749021
B:treated-A:control -0.297222222 -0.8062635 0.2118191 0.3863364  # Not wanted
C:treated-A:control -0.212222222 -0.7212635 0.2968191 0.7154690  # Not wanted
D:treated-A:control -0.067222222 -0.5762635 0.4418191 0.9990671  # Not wanted
C:control-B:control -0.065344254 -0.5743856 0.4436971 0.9992203
D:control-B:control -0.025793651 -0.5348350 0.4832477 0.9999985
A:treated-B:control -0.211352657 -0.7203940 0.2976887 0.7189552  # Not wanted
B:treated-B:control -0.286352657 -0.7953940 0.2226887 0.4233804
C:treated-B:control -0.201352657 -0.7103940 0.3076887 0.7583437  # Not wanted
D:treated-B:control -0.056352657 -0.5653940 0.4526887 0.9996991  # Not wanted
D:control-C:control  0.039550603 -0.4694907 0.5485919 0.9999713
A:treated-C:control -0.146008403 -0.6550497 0.3630329 0.9304819  # Not wanted
B:treated-C:control -0.221008403 -0.7300497 0.2880329 0.6798628  # Not wanted
C:treated-C:control -0.136008403 -0.6450497 0.3730329 0.9499131 
D:treated-C:control  0.008991597 -0.5000497 0.5180329 1.0000000  # Not wanted
A:treated-D:control -0.185559006 -0.6946003 0.3234823 0.8168230  # Not wanted
B:treated-D:control -0.260559006 -0.7696003 0.2484823 0.5194129  # Not wanted
C:treated-D:control -0.175559006 -0.6846003 0.3334823 0.8505865  # Not wanted
D:treated-D:control -0.030559006 -0.5396003 0.4784823 0.9999950  
B:treated-A:treated -0.075000000 -0.5840413 0.4340413 0.9981528
C:treated-A:treated  0.010000000 -0.4990413 0.5190413 1.0000000
D:treated-A:treated  0.155000000 -0.3540413 0.6640413 0.9096378
C:treated-B:treated  0.085000000 -0.4240413 0.5940413 0.9960560
D:treated-B:treated  0.230000000 -0.2790413 0.7390413 0.6429921
D:treated-C:treated  0.145000000 -0.3640413 0.6540413 0.9326207

However, the output includes some pairs that I do not want in the post-hoc comparisons of the two-way ANOVA I performed; these are marked # Not wanted.

Is there any way to tweak the aov or TukeyHSD function to exclude the 'Not wanted' comparisons listed above? I could easily pick out the significant entries I am interested in (those with *) from the long list produced by TukeyHSD, but I don't want my ANOVA result to be biased by the others! (In the real data, the significance is actually caused by those unwanted pairs!)

NB: You might have noticed that the site:treatment post-hoc tests don't show any significance; this is because I only selected a small sample from the original data.

You don't exclude anything from an ANOVA. You can restrict your post-hoc test to only the relevant comparisons. E.g., you could do pairwise t-tests and use p.adjust to adjust manually for alpha error inflation. - Roland
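
One way the suggestion in this comment could look in practice is sketched below. It is only an illustration under some assumptions: it rebuilds datTable from the values printed in the question, it covers only the site comparisons within each treatment, it uses the default pooled SD of pairwise.t.test inside each treatment subset, and it picks Holm as the adjustment method (the comment does not name one). The names raw_p and adj_p are made up for this example.

```r
# Rebuild the example data from the question.
datTable <- data.frame(
  site       = factor(rep(c("A", "B", "C", "D"), each = 4)),
  treatment  = factor(rep(rep(c("control", "treated"), each = 2), times = 4)),
  proportion = c(0.5000000, 0.4444444, 0.1000000, 0.4000000,
                 0.4444444, 0.4782609, 0.0500000, 0.3000000,
                 0.3214286, 0.4705882, 0.1200000, 0.4000000,
                 0.3928571, 0.4782609, 0.4000000, 0.4100000)
)

# Unadjusted pairwise t-tests between sites, run separately per treatment.
raw_p <- unlist(lapply(split(datTable, datTable$treatment), function(d) {
  pm  <- pairwise.t.test(d$proportion, d$site,
                         p.adjust.method = "none")$p.value
  idx <- which(!is.na(pm), arr.ind = TRUE)   # filled (lower-triangle) cells
  setNames(pm[idx],
           paste(rownames(pm)[idx[, "row"]],
                 colnames(pm)[idx[, "col"]], sep = "-"))
}))

# Adjust once, over only the comparisons of interest (assumed: Holm).
adj_p <- p.adjust(raw_p, method = "holm")
print(round(adj_p, 4))
```

Here the multiplicity correction is applied to the 12 within-treatment comparisons only, rather than to all 28 cell pairs that TukeyHSD considers.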

1 Answer


If you mean excluding those comparisons from the calculation itself: Tukey's test works by making pairwise comparisons across all combinations of conditions and adjusts for that whole family, so it doesn't make sense to "exclude" any pairs.

If you mean hiding the unwanted comparisons in your final results, then yes, that is possible. The result of TukeyHSD is simply a list, and its site:treatment element is simply a matrix, which you can manipulate as you like.

# Drop the rows whose two cells differ in both site and treatment
lst <- TukeyHSD(m1)
lst[['site:treatment']] <- lst[['site:treatment']][-c(5,6,7,10,12,13,15,16,18,19,20,21),]
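
Hard-coded row numbers are easy to get wrong, so as an alternative the rows could be chosen from their names. The following is a rough sketch, assuming the row names have the form "site:treatment-site:treatment" as in the output shown in the question and that no level name contains ":" or "-". The names tk, parts, keep and tk_kept are made up for this example, and datTable is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data and model from the question.
datTable <- data.frame(
  site       = factor(rep(c("A", "B", "C", "D"), each = 4)),
  treatment  = factor(rep(rep(c("control", "treated"), each = 2), times = 4)),
  proportion = c(0.5000000, 0.4444444, 0.1000000, 0.4000000,
                 0.4444444, 0.4782609, 0.0500000, 0.3000000,
                 0.3214286, 0.4705882, 0.1200000, 0.4000000,
                 0.3928571, 0.4782609, 0.4000000, 0.4100000)
)
m1 <- aov(proportion ~ site * treatment, data = datTable)

tk <- TukeyHSD(m1)[["site:treatment"]]   # 28 x 4 matrix

# Split "B:control-A:control" into c("B", "control", "A", "control").
parts <- do.call(rbind, strsplit(rownames(tk), "[:-]"))

same_site <- parts[, 1] == parts[, 3]
same_trt  <- parts[, 2] == parts[, 4]

# Keep a pair if the two cells share a site or share a treatment.
keep    <- same_site | same_trt
tk_kept <- tk[keep, , drop = FALSE]
print(tk_kept)
```

Note that this only hides rows: the p adj values in tk_kept are still the ones TukeyHSD computed for the full family of 28 comparisons.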