6
votes

Using the following data frame, I would like to group the data by replicate and group and then calculate the ratio of case values to control values.

dataIn <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"), 
    replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four", 
    "one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"), 
    fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"), 
    quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
    )), .Names = c("group", "treatment", "replicate", "fatty_acid_family", 
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA, 
-8L))

I have tried using dplyr as follows:

group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])

but this results in Error: incompatible size (%d), expecting %d (the group size) or 1

Initially I thought this might be because I was trying to create 4 ratios from a data frame 8 rows deep, so I thought summarise might be the answer (collapsing each group to one ratio), but that doesn't work either (the shortcoming is probably in my understanding).

group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])

  replicate    group ratio
1      four     case    NA
2      four controls    NA
3       one     case    NA
4       one controls    NA
5     three     case    NA
6     three controls    NA
7       two     case    NA
8       two controls    NA

I would appreciate some advice on where I'm going wrong or even if this can be done with dplyr.

Thanks.

2
don't group by group - eddi

2 Answers

10
votes

You can try:

group_by(dataIn, replicate) %>% 
    summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
#  replicate    ratio
#1      four 1.078562
#2       one 1.333333
#3     three 1.070573
#4       two 1.446449

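Because you grouped by both replicate and group, each group contained rows from only one level of group (either case or controls), so one of the two subsets was always empty and the ratio could not be computed. Grouping by replicate alone keeps the case and control rows together in the same group.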
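To see why the extra grouping variable matters, here is a minimal base R sketch that mimics a single group by hand. It rebuilds only the three relevant columns from the question's data; the names one_case, one_rep, ratio_both and ratio_rep are made up for illustration, and subset() stands in for what group_by does internally.

```r
# Relevant columns of the question's data (values copied from the dput output)
dataIn <- data.frame(
  group     = rep(c("case", "controls"), each = 4),
  replicate = rep(c("one", "two", "three", "four"), times = 2),
  quant     = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755)
)

# One group as formed by grouping on replicate AND group: only a case row
one_case <- subset(dataIn, replicate == "one" & group == "case")
ratio_both <- one_case$quant[one_case$group == "case"] /
  one_case$quant[one_case$group == "controls"]
length(ratio_both)  # 0: there is no controls row to divide by

# One group as formed by grouping on replicate only: case and controls rows
one_rep <- subset(dataIn, replicate == "one")
ratio_rep <- one_rep$quant[one_rep$group == "case"] /
  one_rep$quant[one_rep$group == "controls"]
ratio_rep  # 6.16 / 4.62
```

Dividing by a zero-length vector yields a zero-length result, so no single ratio is available for that group.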

0
votes

@talat's answer solved this for me. I created a minimal reproducible example to help my own understanding:

df <- structure(list(a = c("a", "a", "b", "b", "c", "c", "d", "d"), 
    b = c(1, 2, 1, 2, 1, 2, 1, 2), c = c(22, 15, 5, 0.2, 107, 
    6, 0.2, 4)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

#   a b     c
# 1 a 1  22.0
# 2 a 2  15.0
# 3 b 1   5.0
# 4 b 2   0.2
# 5 c 1 107.0
# 6 c 2   6.0
# 7 d 1   0.2
# 8 d 2   4.0

library(dplyr)

df %>%  
  group_by(a) %>% 
  summarise(prop = c[b == 1] / c[b == 2])

#   a      prop
# 1 a  1.466667
# 2 b 25.000000
# 3 c 17.833333
# 4 d  0.050000
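The subsetting inside summarise relies on every value of a having exactly one row with b == 1 and one row with b == 2. As a cross-check that does not need dplyr, the following base R sketch joins the two subsets on a and divides the resulting columns. It is an alternative for illustration only, not the method used above; the names num, den and wide and the suffixes _b1 and _b2 are arbitrary choices.

```r
# Same values as the example above, built as a plain data frame
df <- data.frame(
  a = c("a", "a", "b", "b", "c", "c", "d", "d"),
  b = c(1, 2, 1, 2, 1, 2, 1, 2),
  c = c(22, 15, 5, 0.2, 107, 6, 0.2, 4)
)

num <- df[df$b == 1, c("a", "c")]  # numerator rows (b == 1)
den <- df[df$b == 2, c("a", "c")]  # denominator rows (b == 2)

# Join on a; the clashing column c becomes c_b1 and c_b2
wide <- merge(num, den, by = "a", suffixes = c("_b1", "_b2"))
wide$prop <- wide$c_b1 / wide$c_b2
wide
```

A value of a that lacks either row simply drops out of the merge, which makes missing pairs easier to spot by comparing nrow(wide) with the number of distinct values of a.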