
In our experiment, we have a data frame with the following columns:

Participant  Condition    parametricVariables  nonParametricVariables  orderNumber
1            Condition 1  14.7                 4                       1
1            Condition 2  11.4                 1                       2
2            Condition 1  8.2                  7                       2
2            Condition 2  13.0                 6                       1
...          ...          ...                  ...                     ...

We have multiple parametric and multiple non-parametric variables, and only two conditions. The orderNumber column records the order in which a participant tested a condition: participant 1 tested Condition 1 first and Condition 2 second, while participant 2 tested them in the opposite order.

We want to check whether, despite our best efforts, the order in which the conditions were tested affects the results. So far we have been calling each test by hand and reading the results from the console output, like this:

library(dplyr)

# Sort by participant so that row i of ParticipantOrder1 and of ParticipantOrder2 is the same participant
ParticipantOrder1 <- gameSummary %>% filter(orderNumber == 1) %>% arrange(Participant)
Condition1Order1 <- ParticipantOrder1 %>% filter(Condition==condition1_label)
Condition2Order1 <- ParticipantOrder1 %>% filter(Condition==condition2_label)

ParticipantOrder2 <- gameSummary %>% filter(orderNumber == 2) %>% arrange(Participant)
Condition1Order2 <- ParticipantOrder2 %>% filter(Condition==condition1_label)
Condition2Order2 <- ParticipantOrder2 %>% filter(Condition==condition2_label)

# Check parametric variables for normality
# ...

# Check for difference in the parametric variable across the two orders using Welch's t-test
t.test(ParticipantOrder1$parametric, ParticipantOrder2$parametric)
t.test(Condition1Order1$parametric, Condition1Order2$parametric)
t.test(Condition2Order1$parametric, Condition2Order2$parametric)

# Check for difference in the non-parametric variable across the two orders:
# Wilcoxon signed-rank test for the overall comparison (same participants in both orders, so paired),
# Wilcoxon rank-sum test for the per-condition comparisons (different participants in each order, so unpaired)
wilcox.test(ParticipantOrder1$nonParametric, ParticipantOrder2$nonParametric, paired=TRUE, exact=FALSE)
wilcox.test(Condition1Order1$nonParametric, Condition1Order2$nonParametric, paired=FALSE, exact=FALSE)
wilcox.test(Condition2Order1$nonParametric, Condition2Order2$nonParametric, paired=FALSE, exact=FALSE)

As you can see, this approach gets rather unwieldy when one has multiple parametric and non-parametric variables. I am wondering whether there is a nicer way to collect all these test results into a table like this:

Variable        Condition    TestType                                              statistic  p-value
parametric1     Both         Welch Two Sample t-test                               0.10317    0.9185
parametric1     Condition 1  Welch Two Sample t-test                               0.625      0.5462
parametric1     Condition 2  Welch Two Sample t-test                               -0.69369   0.503
nonParametric1  Both         Wilcoxon signed rank test with continuity correction  18         0.6295
...             ...          ...                                                   ...        ...
Answer

Group the data

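First, we reshape the data so that there is one row per variable. Each row holds that variable's observations for each value of a groupingVariable, collected into separate list columns.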

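library(dplyr)
library(tidyr)

analysisSummary <- gameSummary %>%
  select(parametric1, parametric2, nonparametric1, groupingVariable) %>%
  # Long format: one row per observation, with the variable name in `variable`
  gather(key = variable, value = value, -groupingVariable) %>%
  # Collect each variable's values per group into a list column
  group_by(variable, groupingVariable) %>%
  summarise(value=list(value)) %>%
  # Wide format: one row per variable, one list column per group
  spread(groupingVariable, value) %>% 
  # One group per variable, so the tests below run once per row
  group_by(variable)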

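If you want to see how this pipeline is built up step by step, I recommend this tutorial by Sebastian Sauer. (gather() and spread() still work, but current versions of tidyr recommend pivot_longer() and pivot_wider() instead.)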

This gives the following table. It has one row per variable and one list column per groupingValue (the values of groupingVariable), and each list column holds that group's observations:

variable        groupingValue1  groupingValue2
parametric1     <dbl [X]>       <dbl [Y]>
parametric2     <dbl [X]>       <dbl [Y]>
nonparametric1  <dbl [X]>       <dbl [Y]>

parametric1, parametric2 and nonparametric1 are the variables you want to compare between the two groups.

groupingVariable is the variable that splits the sample into the groups you want to compare. For instance, it could be sex, in which case the groupingValues would probably be male and female [1]. Or, going by the example from the question, groupingVariable could be orderNumber, and the groupingValues would be 1 and 2. Note that these values are numbers, which brings us to a problem.
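If you would rather not depend on tidyr, you can sketch the same reshaping step in base R. The data frame below is made up for illustration, and the column names parametric1, nonparametric1 and orderNumber are placeholders. The result is a plain nested list rather than a tibble with list columns. It has one element per variable, and each element holds one numeric vector per group:

```r
# Made-up example data in the question's long layout (one row per participant and condition)
gameSummary <- data.frame(
  Participant    = c(1, 1, 2, 2, 3, 3, 4, 4),
  orderNumber    = c(1, 2, 2, 1, 1, 2, 2, 1),
  parametric1    = c(14.7, 11.4, 8.2, 13.0, 10.1, 9.8, 12.5, 11.9),
  nonparametric1 = c(4, 1, 7, 6, 3, 5, 2, 6)
)

# Variables to compare and the column that defines the groups
variables <- c("parametric1", "nonparametric1")
groupingVariable <- "orderNumber"

# For each variable, split its values by group: analysisSummary[[variable]][[groupValue]]
analysisSummary <- lapply(
  setNames(variables, variables),
  function(v) split(gameSummary[[v]], gameSummary[[groupingVariable]])
)

str(analysisSummary)
```

split() names the groups "1" and "2" here, so the numeric-name issue also applies to this version. You access a group with a quoted name, e.g. analysisSummary$parametric1[["1"]].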

Numerical groupingVariables

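After spread(), the new columns are named after the grouping values, here 1 and 2. These are not syntactically valid R names, so you have to wrap them in backticks (`1`). In dplyr verbs such as rename() or select(), a bare number is interpreted as a column position rather than a name. For more readable code, you can rename these columns to order1 and order2 using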

analysisSummary <- analysisSummary %>% rename(order1 = 2, order2 = 3)

This assumes that the groupingValue1 and groupingValue2 columns are in the 2nd and 3rd positions of the table, respectively.
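The small base-R illustration below shows why the numeric names are awkward. The table contents are made up, and plain numbers stand in for the list columns. A column literally named 1 has to be written with backticks, while a bare number used as an index selects by position:

```r
# A small table shaped like analysisSummary, with groups named "1" and "2"
# (check.names = FALSE keeps the non-syntactic names instead of turning them into X1, X2)
analysisSummary <- data.frame(
  variable = c("parametric1", "nonparametric1"),
  `1` = c(10.5, 3),
  `2` = c(11.2, 4),
  check.names = FALSE
)

names(analysisSummary)   # "variable" "1" "2"
analysisSummary$`1`      # the column *named* "1"
analysisSummary[[1]]     # the column at *position* 1, i.e. `variable`, not a group

# Base-R equivalent of rename(order1 = 2, order2 = 3): rename by position
names(analysisSummary)[2:3] <- c("order1", "order2")
names(analysisSummary)   # "variable" "order1" "order2"
```

With dplyr you could also rename by name, rename(order1 = `1`, order2 = `2`), which does not depend on the column positions.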

Run the tests

We can use case_when to conditionally run different tests for different variables.

analysisResults <- analysisSummary %>% mutate(
    # Save the name of the test for convenient reference later
    test = case_when(
        isVariableParametric(variable) ~ "Welch's t test",
        TRUE ~ "Wilcoxon test"
    ),
    # Run Welch's t-test for parametric variables and the Wilcoxon rank-sum test
    # (paired = FALSE, i.e. Mann-Whitney) for non-parametric ones, and save the p-value
    p_value = case_when(
        isVariableParametric(variable) ~ t.test(unlist(groupingValue1), unlist(groupingValue2))$p.value,
        TRUE ~ wilcox.test(unlist(groupingValue1), unlist(groupingValue2), paired=FALSE, exact=FALSE)$p.value
    ),
    # Run the test again, but now save the test statistic (t or W; this is not an effect size)
    statistic = case_when(
        isVariableParametric(variable) ~ t.test(unlist(groupingValue1), unlist(groupingValue2))$statistic,
        TRUE ~ wilcox.test(unlist(groupingValue1), unlist(groupingValue2), paired=FALSE, exact=FALSE)$statistic
    )
) %>%
    ungroup()
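The case_when() version computes every right-hand side for every row. As a result, each variable goes through both t.test() and wilcox.test(), and each test runs twice (once for the p-value, once for the statistic). The base-R sketch below uses a hypothetical helper, runGroupTest(), that runs only the chosen test, once, and returns the test name, statistic and p-value together. The test name comes from the method field of the returned htest object, which matches the TestType format in the table asked for in the question:

```r
# Hypothetical helper: run the chosen two-sample test once and return one result row
runGroupTest <- function(x, y, parametric) {
  result <- if (parametric) {
    t.test(x, y)                                      # Welch's t-test (var.equal = FALSE by default)
  } else {
    wilcox.test(x, y, paired = FALSE, exact = FALSE)  # Wilcoxon rank-sum (Mann-Whitney) test
  }
  data.frame(
    test      = result$method,             # e.g. "Welch Two Sample t-test"
    statistic = unname(result$statistic),  # t for the t-test, W for the rank-sum test
    p_value   = result$p.value
  )
}

# Example with made-up values for the two groups
order1 <- c(14.7, 13.0, 10.1, 11.9)
order2 <- c(11.4, 8.2, 9.8, 12.5)
runGroupTest(order1, order2, parametric = TRUE)
runGroupTest(order1, order2, parametric = FALSE)
```

In dplyr 1.0 or later, you could call this once per variable inside the grouped pipeline, e.g. summarise(runGroupTest(unlist(groupingValue1), unlist(groupingValue2), isVariableParametric(variable))). An unnamed data-frame result there should be unpacked into the test, statistic and p_value columns.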

The code above also relies on a function that decides whether a variable is parametric or not, and this function must be defined before the mutate() call runs. In my case I hardcoded the list of parametric variables (a long-term, reusable solution would be to decide this dynamically):

isVariableParametric <- function(variable) {
  variable %in% c('parametric1', 'parametric2')
}
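The code above does not decide this dynamically. One possible way to do so, which is my own assumption, is to run a Shapiro-Wilk normality test on each group and treat the variable as parametric only if no group departs significantly from normality. Keep in mind that a non-significant Shapiro-Wilk result does not prove normality, especially with small samples, so a hardcoded, theory-driven list may still be the more defensible choice. The function name isVariableNormal() and the 0.05 threshold are hypothetical:

```r
# Hypothetical helper: TRUE if every group passes a Shapiro-Wilk test at level alpha.
# `groups` is a list of numeric vectors (one per group); shapiro.test() needs 3 to 5000 values.
isVariableNormal <- function(groups, alpha = 0.05) {
  all(vapply(groups, function(x) shapiro.test(x)$p.value > alpha, logical(1)))
}

# Made-up data: exact normal quantiles vs. strongly right-skewed (log-normal) quantiles
normalLike <- qnorm(ppoints(40))
skewed <- exp(2 * qnorm(ppoints(40)))

isVariableNormal(list(normalLike, normalLike))  # TRUE
isVariableNormal(list(normalLike, skewed))      # FALSE
```

In the grouped pipeline, isVariableNormal(list(unlist(groupingValue1), unlist(groupingValue2))) could then replace isVariableParametric(variable) on the left-hand side of each case_when() branch.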

This will give us a table with easily browsable results:

variable        groupingValue1  groupingValue2  test            p_value  statistic
parametric1     <dbl [X]>       <dbl [Y]>       Welch's t test  0.19081  0.23504
parametric2     <dbl [X]>       <dbl [Y]>       Welch's t test  0.16398  0.00014
nonparametric1  <dbl [X]>       <dbl [Y]>       Wilcoxon test   0.78727  87.5000
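The table originally asked for in the question has a Condition column with Both, Condition 1 and Condition 2. You could also produce it without tidyr by looping over variables and subsets and binding one row per test. The base-R sketch below does this. The example data, the variable lists and the helper name orderTestRow() are all hypothetical, and every comparison is an unpaired test between the order-1 and order-2 groups:

```r
# Made-up example data in the question's long format
gameSummary <- data.frame(
  Participant    = rep(1:6, each = 2),
  Condition      = rep(c("Condition 1", "Condition 2"), times = 6),
  orderNumber    = c(1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1),
  parametric1    = c(14.7, 11.4, 8.2, 13.0, 10.1, 9.8, 12.5, 11.9, 9.4, 10.6, 13.3, 12.2),
  nonParametric1 = c(4, 1, 7, 6, 3, 5, 2, 6, 5, 3, 4, 7)
)
parametricVars    <- c("parametric1")
nonParametricVars <- c("nonParametric1")

# Hypothetical helper: compare order 1 vs. order 2 for one variable within one subset
orderTestRow <- function(data, variable, subsetLabel) {
  x <- data[[variable]][data$orderNumber == 1]
  y <- data[[variable]][data$orderNumber == 2]
  result <- if (variable %in% parametricVars) {
    t.test(x, y)                                      # Welch's t-test
  } else {
    wilcox.test(x, y, paired = FALSE, exact = FALSE)  # Wilcoxon rank-sum test
  }
  data.frame(Variable = variable, Condition = subsetLabel,
             TestType = result$method, statistic = unname(result$statistic),
             p_value = result$p.value)
}

# Subsets: all rows ("Both") plus one subset per condition
subsets <- c(list(Both = gameSummary), split(gameSummary, gameSummary$Condition))

# One row per (variable, subset) combination
results <- do.call(rbind, lapply(c(parametricVars, nonParametricVars), function(v) {
  do.call(rbind, lapply(names(subsets), function(s) orderTestRow(subsets[[s]], v, s)))
}))
rownames(results) <- NULL
results
```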

[1] I stick to two groups here for simplicity. Testing for differences between more than two groups requires either additional statistical safeguards (e.g. a Bonferroni correction for multiple pairwise comparisons) or a different approach (e.g. ANOVA).
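A similar concern applies to the results table itself. With many variables, and three comparisons per variable in the question's layout, some p-values will fall below 0.05 by chance. Base R's p.adjust() can apply a Bonferroni correction, or the less conservative Holm correction, to the p-value column. The p-values below are made up:

```r
# Made-up p-values standing in for the p_value column of the results table
p_value <- c(0.01, 0.04, 0.20)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p.adjust(p_value, method = "bonferroni")  # 0.03 0.12 0.60

# Holm: step-down procedure, never less powerful than Bonferroni
p.adjust(p_value, method = "holm")        # 0.03 0.08 0.20
```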