Dplyr: subtracting within uneven factor levels

Question

I am trying to learn dplyr, and I cannot find an answer to a relatively simple question on Stack Overflow or in the documentation, so I thought I'd ask it here.

I have a data.frame that looks like this:

set.seed(1)
dat<-data.frame(rnorm(10,20,20),rep(seq(5),2),rep(c("a","b"),5))
names(dat)<-c("number","factor_1","factor_2")
dat<-dat[order(dat$factor_1,dat$factor_2),]
dat<-dat[c(-3,-7),]



       number factor_1 factor_2
1   7.470924        1        a
6   3.590632        1        b
2  23.672866        2        b
3   3.287428        3        a
8  34.766494        3        b
4  51.905616        4        b
5  26.590155        5        a
10 13.892232        5        b

I would like to use dplyr to subtract the value in the number column where factor_2=="b" from the value where factor_2=="a", within each level of factor_1.

The first line of the resulting data.frame would look like:

        diff factor_1
1    3.880291        1

A caveat is that there are not always values for each level of factor_2 within each level of factor_1. Should this be the case, I would like to assign 0 to the number associated with the missing factor level.

Thank you for your help.

Marat Talipov · Accepted Answer · 2015-02-26T22:07:10

Here is one approach:

set.seed(1)
dat<-data.frame(rnorm(10,20,20),rep(seq(5),2),rep(c("a","b"),5))
names(dat)<-c("number","factor_1","factor_2")
dat<-dat[order(dat$factor_1,dat$factor_2),]
dat<-dat[c(-3,-7),]
#      number factor_1 factor_2
#1   7.470924        1        a
#6   3.590632        1        b
#2  23.672866        2        b
#3   3.287428        3        a
#8  34.766494        3        b
#4  51.905616        4        b
#5  26.590155        5        a
#10 13.892232        5        b

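library(dplyr)

# For each factor_1 group, take the "a" value minus the "b" value.
# match() returns NA when a level is absent, so incomplete groups give NA.
d2 <- dat %>%
  group_by(factor_1) %>%
  summarize(diff = number[match("a", factor_2)] - number[match("b", factor_2)])

# Replace the NA differences of incomplete groups with 0
d2$diff[is.na(d2$diff)] <- 0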

d2
# Source: local data frame [5 x 2]
# 
#   factor_1       diff
# 1        1   3.880291
# 2        2   0.000000
# 3        3 -31.479066
# 4        4   0.000000
# 5        5  12.697923
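
Note that the output above sets the whole difference to 0 when a level is missing. If the intent is instead to treat only the missing number as 0 (so a group with just a "b" row gives 0 minus b), a variant along these lines might work. This is only a sketch: it relies on sum() of an empty vector being 0, it assumes at most one row per factor_1/factor_2 combination (with more rows it would sum them), and the name d3 is just for illustration.

```r
library(dplyr)

# Same example data as in the question
set.seed(1)
dat <- data.frame(number   = rnorm(10, 20, 20),
                  factor_1 = rep(seq(5), 2),
                  factor_2 = rep(c("a", "b"), 5))
dat <- dat[order(dat$factor_1, dat$factor_2), ]
dat <- dat[c(-3, -7), ]

# sum() over an empty selection is 0, so a missing "a" or "b"
# contributes 0 to the difference instead of producing NA.
d3 <- dat %>%
  group_by(factor_1) %>%
  summarize(diff = sum(number[factor_2 == "a"]) - sum(number[factor_2 == "b"]))

d3
```

With this version, factor_1 levels 2 and 4 should come out as roughly -23.67 and -51.91 rather than 0, while the other three levels are unchanged.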
