2
votes

Right now I'm refactoring a 'base'-based R script to use 'dplyr' instead.

Basically, I want to group_by gene and, within each group, subtract the values of the group that matches a given condition. In this case, I want to take the values where gene == 'C' and subtract them from all others.

Simplified data:

x <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
                'sample' = rep_len(c('wt','mut1','mut2'),9),
                'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

  gene sample value
1    A     wt  32.3
2    A   mut1  31.0
3    A   mut2  30.5
4    B     wt  25.0
5    B   mut1  25.3
6    B   mut2  22.1
7    C     wt  20.5
8    C   mut1  21.2
9    C   mut2  19.8

Desired output:

  gene sample value deltaC
1    A     wt  32.3   11.8
2    A   mut1  31.0    9.8
3    A   mut2  30.5   10.7
4    B     wt  25.0    4.5
5    B   mut1  25.3    4.1
6    B   mut2  22.1    2.3
7    C     wt  20.5    0.0
8    C   mut1  21.2    0.0
9    C   mut2  19.8    0.0

In base R it's not a big deal, but I'm wondering whether there is a simple solution using dplyr.

'Pseudo'code:

x %>%
    group_by(gene) %>%
    mutate(deltaC = value - value(where gene == 'C'))

Is there any kind of function that allows me to access only those values where gene == 'C'? Of course I could also subset beforehand, but I would like to do it in one step :)

2 Answers

5
votes

You basically had it! You can subset the data frame based on any condition within your mutate call:

library(dplyr)

df <- data.frame('gene' = c('A','A','A','B','B','B','C','C','C'),
                 'sample' = rep_len(c('wt','mut1','mut2'),9),
                 'value' = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

Nicholas Hassan pointed out a problem with the original version of this answer. While you can group by "gene" and then mutate using a filtered version of the original data.frame, what you most likely want to do is to group by "sample" and then subset within the sample group on "gene":

df %>%
    group_by(sample) %>%
    mutate(deltaC = value - value[gene == 'C'])

# A tibble: 9 x 4
# Groups:   sample [3]
  gene  sample value deltaC
  <fct> <fct>  <dbl>  <dbl>
1 A     wt      32.3   11.8
2 A     mut1    31      9.8
3 A     mut2    30.5   10.7
4 B     wt      25      4.5
5 B     mut1    25.3    4.1
6 B     mut2    22.1    2.3
7 C     wt      20.5    0  
8 C     mut1    21.2    0  
9 C     mut2    19.8    0  

Within the grouped data.frame, mutate acts on each group as its own mini-data frame, so you can subset the value vector to just the row where gene == 'C' and subtract that from the entire value variable in that group to make deltaC.
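
To make the "mini-data frame" idea concrete, here is a minimal sketch that first does the subtraction by hand for a single sample and then lets the grouped mutate repeat it for every sample. It assumes each sample contains exactly one row with gene == 'C'; the names wt and res are only illustrative.

```r
library(dplyr)

df <- data.frame(gene   = c('A','A','A','B','B','B','C','C','C'),
                 sample = rep(c('wt','mut1','mut2'), times = 3),
                 value  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

# One group by hand: only the 'wt' rows
wt <- df[df$sample == 'wt', ]
wt$value[wt$gene == 'C']             # the single reference value: 20.5
wt$value - wt$value[wt$gene == 'C']  # reference is recycled across the group

# The grouped mutate performs the same step once per sample
# (assumption: exactly one gene == 'C' row in every sample group)
res <- df %>%
  group_by(sample) %>%
  mutate(deltaC = value - value[gene == 'C']) %>%
  ungroup()
```

Because the reference is looked up inside each sample group, this version does not depend on the rows being in the same order for every gene.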

3
votes

If you wanted to avoid the $ completely, you could use dplyr::pull like so:

df %>%
  group_by(gene) %>%
  mutate(deltaC = value - filter(., gene == 'C') %>% pull(value))

dplyr::pull is basically just the pipe-friendly dplyr equivalent of df$value or df[["value"]].

Also, the . inside the filter call stands for the data being piped into mutate, i.e. the whole grouped data frame rather than just the current group.
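
As a small sketch of that equivalence, the following compares the reference values extracted with filter() and pull() against plain base R indexing. The names ref_pull and ref_base are hypothetical and used only for illustration.

```r
library(dplyr)

df <- data.frame(gene   = c('A','A','A','B','B','B','C','C','C'),
                 sample = rep(c('wt','mut1','mut2'), times = 3),
                 value  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

# dplyr style: filter the rows, then pull the column out as a plain vector
ref_pull <- df %>% filter(gene == 'C') %>% pull(value)

# base R style: index the column directly
ref_base <- df[["value"]][df[["gene"]] == 'C']

# Both approaches should yield the same numeric vector
same <- identical(ref_pull, ref_base)
```

Note that subtracting this vector inside group_by(gene) lines the values up by position, so it relies on every gene listing its samples in the same order.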