summarizing logical values and group by multiple factors with dplyr

Question

I want to group a data frame by two columns (department and product line) and output a new data frame that has counts of selected logical values of each department and product line. The structure of the original data is as follows:

product department  line date
apple   A   big      201707
cherry  A   midlle   201609
potato  B   midlle   201801
peach   C   small    201807
pear    B   big      201807

The date is numeric and the other variables are characters.

I want to add two columns x and y, with x indicating the date is in year 2018, and y indicating the date is 201807. Grouped by department and line, and in descending order. And the output data frame would be like:

department line x y
A    big       0  0
A    middle    0  0
B    big       1  1
B    midlle    1  0
C    small     1  1

I tried with dplyr. Firstly, I subset the original data to keep only department, line and date columns. Then I set department and line to factors using factor(). When I use str(subdata), I can see department and line is in factor class.

At last, I use group_by, and summarise to get the data frame I want. But the outcome is not what I want.

DF <- subdata %>% 
    group_by(department, line) %>% 
    summarise(x = sum(data$date >= 201800, na.rm = TRUE),
              y = sum(data$date == 201807, na.rm = TRUE))

Am I doing anything wrong? I've tried with reshape2 package too, but I couldn't get what I want either. I’ve 2936 rows in my data. All I get is like this:

str(DF)
classes ‘grouped_df’, ‘tb_df’, ‘tb1’ and ‘data.frame’: 1 obs. of 4 variables:
$ department    : chr department
$ line :  chr line
$ x : int 220
$ y : int 29

I think maybe the problem lies in the factor process of the department and line variables. Since the class after the group_by and summarise process is “character” inspite of “factor”. But I can’t figure out the solution.

Can anybody help?

Following @kath's suggestion, if you remove the redundant naming of the data frame, you'll get the exact output you're looking for. No need to make it more complicated than that — camille
Thanks. I tried, but I still got only 1 observation of 4 variables instead of a data frame I want. — Arielle He

SmitM SmitM · Accepted Answer · 2018-08-16T17:02:23

I'd suggest using ifelse on the original dataframe beforehand to create columns x and y as such:

df$x <- ifelse(df$date > 201800, 1, 0)
df$y <- ifelse(df$date == 201807, 1, 0)

Now use dplyr to summarise

library(dplyr)
df_new <- df %>% group_by(department, line) %>% summarise(X = sum(x), Y = sum(y))

summarizing logical values and group by multiple factors with dplyr

3 Answers

the data: