I want to group a data frame by two columns (department and product line) and output a new data frame that has counts of selected logical values of each department and product line. The structure of the original data is as follows:
product department line date
apple A big 201707
cherry A midlle 201609
potato B midlle 201801
peach C small 201807
pear B big 201807
The date is numeric and the other variables are characters.
I want to add two columns x and y, with x indicating the date is in year 2018, and y indicating the date is 201807. Grouped by department and line, and in descending order. And the output data frame would be like:
department line x y
A big 0 0
A middle 0 0
B big 1 1
B midlle 1 0
C small 1 1
I tried with dplyr. Firstly, I subset the original data to keep only department, line and date columns. Then I set department and line to factors using factor(). When I use str(subdata), I can see department and line is in factor class.
At last, I use group_by, and summarise to get the data frame I want. But the outcome is not what I want.
DF <- subdata %>%
group_by(department, line) %>%
summarise(x = sum(data$date >= 201800, na.rm = TRUE),
y = sum(data$date == 201807, na.rm = TRUE))
Am I doing anything wrong? I've tried with reshape2 package too, but I couldn't get what I want either. I’ve 2936 rows in my data. All I get is like this:
str(DF)
classes ‘grouped_df’, ‘tb_df’, ‘tb1’ and ‘data.frame’: 1 obs. of 4 variables:
$ department : chr department
$ line : chr line
$ x : int 220
$ y : int 29
I think maybe the problem lies in the factor process of the department and line variables. Since the class after the group_by and summarise process is “character” inspite of “factor”. But I can’t figure out the solution.
Can anybody help?
data$
in thesummarise
-call. – kath