There must be a better way of doing this. I’d probably go with Antonios’ approach, but I’m tempted not to use filter and instead spread the prices for the different colours into list columns. Unfortunately, the best code I could come up with is even longer as a result:
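library(tidyverse)  # dplyr, tidyr, purrr; ggplot2 provides diamonds

diamonds %>%
  group_by(cut, color) %>%
  summarize(price = list(price)) %>%  # one list of prices per cut/colour
  spread(color, price) %>%            # one list column per colour
  nest() %>%                          # still grouped by cut: one row per cut
  mutate(price_avg = map_dbl(data, ~ t.test(.x$E[[1L]], .x$I[[1L]])$p.value))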
The idea here is to get two list columns, I and E, holding the prices of diamonds of the respective colour. We can then run the t-test on these two columns (but unfortunately we need to unwrap each of them with [[1L]] for that to work).
I’m mainly putting this here as a conversation starter. Clearly this isn’t code you’d ever want to write, but I believe that there should be a short, natural way of expressing this logic (either this is already possible and I’m overlooking it, or the tidy data API needs to be augmented).
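For what it’s worth, here is a minimal sketch of how the list-column idea might be written more compactly. It assumes tidyr 1.1.0 or later, where pivot_wider accepts a single function for values_fn; the column name p_value is my own choice and not from the code above. I haven’t checked it against older tidyr versions.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)  # provides the diamonds dataset

p_values <- diamonds %>%
  # One row per cut, one list column of prices per colour (D, E, ..., J).
  # Assumes tidyr >= 1.1.0 for `values_fn = list`.
  pivot_wider(id_cols = cut, names_from = color, values_from = price,
              values_fn = list) %>%
  # Welch t-test of the E prices against the I prices, row by row.
  mutate(p_value = map2_dbl(E, I, ~ t.test(.x, .y)$p.value)) %>%
  select(cut, p_value)

p_values
```

Using map2_dbl over the two list columns avoids both the filter step and the extra nest() plus [[1L]] unwrapping.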
Alternatively, we can use the formula API of t.test:
diamonds %>%
  filter(color %in% c('E', 'I')) %>%
  nest(-cut) %>%
  mutate(price_avg = map_dbl(data, ~ t.test(price ~ color, .x)$p.value))
For completeness, here’s the same using broom::tidy (this gives back more columns than just the p-value):
library(broom)  # for tidy()

diamonds %>%
  filter(color %in% c('E', 'I')) %>%
  nest(-cut) %>%
  mutate(test = map(data, ~ tidy(t.test(price ~ color, .x)))) %>%
  unnest(test)
The result of this is a table like this:
cut data estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
<ord> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct>
1 Fair <tibble [1 × 7]> -1003. 3682. 4685. -2.91 3.90e- 3 327. -1682. -324. Welch Two Sample t-test two.sided
2 Good <tibble [1 × 7]> -1655. 3424. 5079. -7.19 1.46e-12 827. -2107. -1203. Welch Two Sample t-test two.sided
3 Very Good <tibble [1 × 7]> -2041. 3215. 5256. -13.4 2.44e-39 1860. -2339. -1743. Welch Two Sample t-test two.sided
4 Premium <tibble [1 × 7]> -2407. 3539. 5946. -15.5 7.27e-52 2405. -2711. -2103. Welch Two Sample t-test two.sided
5 Ideal <tibble [1 × 7]> -1854. 2598. 4452. -17.0 7.63e-62 3081. -2069. -1640. Welch Two Sample t-test two.sided
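If only a few of these columns matter, the tidied result can be trimmed with select. The following is a sketch rather than a definitive version: it uses nest(data = -cut), which I’m assuming is the spelling that newer tidyr releases expect in place of nest(-cut), and the choice of columns to keep is purely illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)  # provides the diamonds dataset

tests <- diamonds %>%
  filter(color %in% c("E", "I")) %>%
  nest(data = -cut) %>%  # assumed newer-tidyr spelling of nest(-cut)
  mutate(test = map(data, ~ tidy(t.test(price ~ color, data = .x)))) %>%
  unnest(test) %>%
  # Keep the difference in means, the t statistic and the p-value per cut.
  select(cut, estimate, statistic, p.value)

tests
```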