2
votes

I have a data.frame with such as

df1 <- data.frame(id = c("A", "A", "B", "B", "B"), 
                  cost = c(100, 10, 120, 102, 102)

I know that I can use

df1.a <- group_by(df1, id) %>%
    summarise(no.c = n(), 
              m.costs = mean(cost))

to calculate the number of observations and mean by id. How could I do so if I want to calculate the number of observations and mean for all rows that are NOT equal to the ID, so it would for example give me 3 as value for observations not A and 2 for observations not B.

I would like to use the dplyr package and the group_by functions since I have to this for a lot of huge dataframes.

2

2 Answers

3
votes

You can use the . to refer to the whole data.frame, which lets you calculate the differences between the group and the whole:

df1 %>% group_by(id) %>% 
    summarise(n = n(), 
              n_other = nrow(.) - n, 
              mean_cost = mean(cost), 
              mean_other = (sum(.$cost) - sum(cost)) / n_other)

## # A tibble: 2 × 5
##       id     n n_other mean_cost mean_other
##   <fctr> <int>   <int>     <dbl>      <dbl>
## 1      A     2       3        55        108
## 2      B     3       2       108         55

As you can see from the results, with two groups you could just use rev, but this approach will scale to more groups or calculations easily.

2
votes

Looking for something like this? This calculates the total cost and total number of rows firstly and then subtract the total cost and total number of rows for each group and take average for the cost:

sumCost = sum(df1$cost)
totRows = nrow(df1)

df1 %>% 
        group_by(id) %>% 
        summarise(no.c = totRows - n(), 
                  m.costs = (sumCost - sum(cost))/no.c)

# A tibble: 2 x 3
#      id  no.c m.costs
#  <fctr> <int>   <dbl>
#1      A     3     108
#2      B     2      55