Comparing two groups with multiple variables using ANOVA or another testing method in R

Question

I'm working on my master's thesis right now. I have 2 groups: a showering-as-usual group and a cold-shower group. Variables are age, gender, weight, psychological wellbeing, physiological wellbeing, sleep quality, movement behavior, skin texture, shower behavior etc.

head(data1)

    Code Gruppe StudentBasel Alter Grösse Gewicht0W Gewicht12W
1 TURN12      2           Ja    50    159        70         72
2 AMMN17      1         Nein    26    164        52         50
3 LKPG08      2         Nein    19    167        54         NA
4 LJRn05      2         Nein    22    180        60         NA
5 AGBD08      1         Nein    24    165        49         NA
6 IUGH20      2         Nein    32    168        54         NA
  Geschlecht WHO1W WHO4W WHO8W WHO12W FEW1W FEW4W FEW8W FEW12W
1          w     6    21    24     25    87    70    80     75
2          w    24    22    25     22    77    78    83     74
3          w    16    NA    NA     NA    65    NA    NA     NA
4          w    19    NA    NA     NA    61    NA    NA     NA
5          w    23    18    22     NA    61    61    56     NA
6          w    22    NA    NA     NA    66    NA    NA     NA
  SchlafA1W SchlafA4W SchlafA8W SchlafA12W SchlafWT1W SchlafWT4W
1        32        25        25         30         49         32
2        35        31        35         28         46         43
3        28        NA        NA         NA         31         NA
4        23        NA        NA         NA         32         NA
5        27        28        26         NA         35         34
6        27        NA        NA         NA         41         NA

So, I have two groups and data from the 4th, 8th and 12th week. I want to compare the group means in the 4th week. I was advised against running a t-test for every variable because of some error I'm not considering, so I thought I'd use an ANOVA like this:

CSSAUW4 <- aov(formula = Gruppe ~ WHO4W + FEW4W + Dauer4W + SchlafA4W + SchlafWT4W + Einschlafzeit4W + Schwitzen + Haut4W + KHaut4W + Abwesenheit4W + Krankheitssymptome4W + Duschhäufigkeit4W, data = Group4W)

So I got all my results and was pretty happy, but I wasn't able to run a TukeyHSD() test because "Group" was not a factor. So I changed it to a factor with as.factor(), but now I can't calculate my ANOVA anymore. Apparently I did it all wrong and should have used aov(numeric variable ~ group) to compare everything, but then I have the same problem as with the t-tests: writing the code for every single variable.

So I read something about lme4 ANOVAs, but I find it really difficult to understand how to code it for my data, since I successfully dodged every R course at my university. I'd like some simple code like: Test(Group ~ variable1, variable2, variable3, data=data1) and that's it. For week 4, week 8 and week 12.

I was thinking of using lm(group ~ variable1, variable2, etc.) instead. Would that be possible and make sense for my data? I'm doubting my statistical intelligence is right on that one :D

Second question: I have the problem of a small dataset (loss to follow-up of 90% by the 12th week), so at the moment I have only 8 participants in each group. Can I do the same mean comparison for the 12th week as for the 4th week (with 25 participants each)?

Help would be really appreciated!!

Greetings Christian

You got your formula the wrong way around. The dependent variable is on the left, not the right. — StupidWolf
I can write something below with simulated data to get you started. Please focus your question on one problem. — StupidWolf
Would be aov(formula = WHO4W ~ Gruppe, data = data1), right? But then I need to calculate every dependent variable individually. And this also for week8 and 12. Isn't there a much simpler way and is it statistically correct to do so? — Christian De Pierro

StupidWolf · Accepted Answer · 2020-03-18T16:14:54

Example data:

set.seed(100)
data1 = data.frame(
  Code = sample(letters, 100, replace = TRUE),
  Gruppe = sample(1:2, 100, replace = TRUE),
  matrix(rpois(100 * 11, 100), nrow = 100))
# name the 11 simulated columns after variables from the question
colnames(data1)[-c(1:2)] = c("StudentBasel", "Alter", "Grösse",
  "WHO1W", "WHO4W", "WHO8W", "WHO12W", "FEW1W", "FEW4W", "FEW8W", "FEW12W")

You can select the columns you want to test:

test_columns = c("WHO4W","WHO8W","WHO12W")
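
If the columns follow a naming scheme like the one in the question (a suffix such as 4W for week 4), you may not need to type them out. A minimal sketch, assuming that suffix convention holds for every variable; cols and week4_columns are names made up for this illustration:

```r
# Hypothetical column names following the question's naming scheme
cols <- c("Code", "Gruppe", "WHO1W", "WHO4W", "WHO8W", "WHO12W",
          "FEW4W", "SchlafA4W")

# Keep every name that ends in "4W", i.e. the week-4 measurements
week4_columns <- grep("4W$", cols, value = TRUE)
print(week4_columns)
```

With real data you would pass colnames(data1) instead of cols, and use a pattern like "12W$" for week 12.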

If you want to test weeks 4, 8 and 12 of the WHO series together, first use select to keep only Gruppe and the columns you want to test, then reshape the result:

library(tidyr)
library(dplyr)
library(broom)

data1 %>%
  select(all_of(c("Gruppe", test_columns))) %>%
  pivot_longer(-Gruppe)

# A tibble: 300 x 3
   Gruppe name   value
    <int> <chr>  <int>
 1      2 WHO4W     97
 2      2 WHO8W     91
 3      2 WHO12W    93
 4      1 WHO4W     99
 5      1 WHO8W    103
 6      1 WHO12W    92
 7      2 WHO4W     91
 8      2 WHO8W    111
 9      2 WHO12W   120
10      1 WHO4W    119
# … with 290 more rows

In the step above, every weekly value is paired with its corresponding Gruppe. This is called pivoting a table into long format.
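
To see what the pivot does on something small, here is a sketch with a made-up two-row table (wide and long are illustrative names, and the values are invented):

```r
library(tidyr)

# Hypothetical toy data: two participants, two weekly measurements
wide <- data.frame(
  Gruppe = c(1, 2),
  WHO4W  = c(21, 22),
  WHO8W  = c(24, 25)
)

# Every column except Gruppe is stacked into name/value pairs,
# so each participant contributes one row per week
long <- pivot_longer(wide, -Gruppe)
print(long)
```

Each of the two input rows becomes two output rows, one per week column, with Gruppe repeated alongside.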

What you want is a test for Gruppe within every variable. Group the long table by variable name first (group_by), then wrap the aov call in do, which runs aov once per group:

result = data1 %>%
  select(all_of(c("Gruppe", test_columns))) %>%
  pivot_longer(-Gruppe) %>%
  group_by(name) %>%
  do(tidy(aov(value ~ Gruppe, data = .)))

# A tibble: 6 x 7
# Groups:   name [3]
  name   term         df    sumsq meansq statistic p.value
  <chr>  <chr>     <dbl>    <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe        1   131.   131.      1.25     0.266
2 WHO12W Residuals    98 10247.   105.     NA       NA    
3 WHO4W  Gruppe        1   111.   111.      1.01     0.316
4 WHO4W  Residuals    98 10740.   110.     NA       NA    
5 WHO8W  Gruppe        1     1.63   1.63    0.0169   0.897
6 WHO8W  Residuals    98  9428.    96.2    NA       NA

Now keep only the rows for the Gruppe term; we are not interested in the residuals:

result %>% filter(term=="Gruppe")
# A tibble: 3 x 7
# Groups:   name [3]
  name   term      df  sumsq meansq statistic p.value
  <chr>  <chr>  <dbl>  <dbl>  <dbl>     <dbl>   <dbl>
1 WHO12W Gruppe     1 131.   131.      1.25     0.266
2 WHO4W  Gruppe     1 111.   111.      1.01     0.316
3 WHO8W  Gruppe     1   1.63   1.63    0.0169   0.897
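
Running one test per variable gives you several p-values. If that is the concern raised in the question about separate t-tests, one common option is to adjust them with p.adjust from base R. The sketch below is only an illustration on made-up data, not part of the analysis above; toy, vars, p_raw and p_holm are hypothetical names, and Holm is just one of the available methods:

```r
set.seed(1)

# Made-up data: two groups of 10 and three outcome columns
toy <- data.frame(
  Gruppe    = factor(rep(1:2, each = 10)),
  WHO4W     = rnorm(20, mean = 20, sd = 3),
  FEW4W     = rnorm(20, mean = 70, sd = 8),
  SchlafA4W = rnorm(20, mean = 28, sd = 4)
)
vars <- c("WHO4W", "FEW4W", "SchlafA4W")

# One aov per outcome; keep the p-value of the Gruppe term from each fit
p_raw <- sapply(vars, function(v) {
  fit <- aov(reformulate("Gruppe", response = v), data = toy)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Holm adjustment for having run several tests
p_holm <- p.adjust(p_raw, method = "holm")
print(data.frame(p_raw, p_holm))
```

The same p.adjust call could be applied to the p.value column of the filtered result.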

I suggest the per-variable tests above (group_by plus do) because it is easier to explain to people what you have done (you cannot just say "I did an ANOVA") and easier to interpret. You can also fit one big aov and run a post hoc test, but please read up on what ANOVA is doing before applying this:

# pivot to long format like before
aov_df = data1 %>%
  select(all_of(c("Gruppe", test_columns))) %>%
  pivot_longer(-Gruppe)
# now we have a subgroup for every measurement, e.g. group 1 + wk4, group 2 + wk4 and so on
aov_df$subgroup = paste0(aov_df$name, aov_df$Gruppe)

result = TukeyHSD(aov(value ~ subgroup, data = aov_df))
# the comparisons below are the meaningful ones:
result$subgroup[c("WHO12W2-WHO12W1", "WHO4W2-WHO4W1", "WHO8W2-WHO8W1"), ]
                      diff       lwr      upr     p adj
WHO12W2-WHO12W1  2.2938808 -3.560239 8.148000 0.8711455
WHO4W2-WHO4W1    2.1151369 -3.738983 7.969256 0.9052955
WHO8W2-WHO8W1   -0.2560386 -6.110158 5.598081 0.9999956
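
With more variables, typing the comparison names by hand gets tedious. The row names can be built from the column names instead. This is a sketch on made-up data that assumes the groups are coded 1 and 2 as in the example; tukey, wanted and meaningful are illustrative names:

```r
set.seed(1)
test_columns <- c("WHO4W", "WHO8W", "WHO12W")

# Made-up long-format data: 2 groups x 3 weeks, 10 values per cell
aov_df <- expand.grid(rep = 1:10, Gruppe = 1:2, name = test_columns,
                      stringsAsFactors = FALSE)
aov_df$value <- rnorm(nrow(aov_df), mean = 100, sd = 10)
aov_df$subgroup <- factor(paste0(aov_df$name, aov_df$Gruppe))

tukey <- TukeyHSD(aov(value ~ subgroup, data = aov_df))

# Within-week comparisons are named "<variable>2-<variable>1"
wanted <- paste0(test_columns, "2-", test_columns, "1")
meaningful <- tukey$subgroup[wanted, ]
print(meaningful)
```

Note that the adjusted p-values still account for all pairwise comparisons TukeyHSD made, not only the rows kept here.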
