1
votes

I want to group a data frame by two columns (department and product line) and output a new data frame that has counts of selected logical values of each department and product line. The structure of the original data is as follows:

product department  line date
apple   A   big      201707
cherry  A   midlle   201609
potato  B   midlle   201801
peach   C   small    201807
pear    B   big      201807

The date is numeric and the other variables are characters.

I want to add two columns x and y, with x indicating the date is in year 2018, and y indicating the date is 201807. Grouped by department and line, and in descending order. And the output data frame would be like:

department line x y
A    big       0  0
A    middle    0  0
B    big       1  1
B    midlle    1  0
C    small     1  1

I tried with dplyr. Firstly, I subset the original data to keep only department, line and date columns. Then I set department and line to factors using factor(). When I use str(subdata), I can see department and line is in factor class.

At last, I use group_by, and summarise to get the data frame I want. But the outcome is not what I want.

DF <- subdata %>% 
    group_by(department, line) %>% 
    summarise(x = sum(data$date >= 201800, na.rm = TRUE),
              y = sum(data$date == 201807, na.rm = TRUE))

Am I doing anything wrong? I've tried with reshape2 package too, but I couldn't get what I want either. I’ve 2936 rows in my data. All I get is like this:

str(DF)
classes ‘grouped_df’, ‘tb_df’, ‘tb1’ and ‘data.frame’: 1 obs. of 4 variables:
$ department    : chr department
$ line :  chr line
$ x : int 220
$ y : int 29

I think maybe the problem lies in the factor process of the department and line variables. Since the class after the group_by and summarise process is “character” inspite of “factor”. But I can’t figure out the solution.

Can anybody help?

3
Remove the data$ in the summarise-call.kath
Following @kath's suggestion, if you remove the redundant naming of the data frame, you'll get the exact output you're looking for. No need to make it more complicated than thatcamille
Thanks. I tried, but I still got only 1 observation of 4 variables instead of a data frame I want.Arielle He

3 Answers

0
votes

I'd suggest using ifelse on the original dataframe beforehand to create columns x and y as such:

df$x <- ifelse(df$date > 201800, 1, 0)
df$y <- ifelse(df$date == 201807, 1, 0)

Now use dplyr to summarise

library(dplyr)
df_new <- df %>% group_by(department, line) %>% summarise(X = sum(x), Y = sum(y))
0
votes

Try this one:

 library(tidyverse)
 df<-data.frame(product=as.character(c("apple","cherry","potato","peach","pear")),
                 department=as.character(c("A","A","B","C","B")),
                 line=c("big","midlle","midlle","small","big"),
                 date=as.character(c("201707","201609","201801","201807","201807")))

 df%>%
   mutate(yr= as.numeric(str_sub(date,1,4)),
          x=ifelse(yr==2018,1,0),
          y=ifelse(date=="201807",1,0))%>%
   group_by(department,line)%>%
   summarise(x=sum(x,na.rm = T),
             y=sum(y,na.rm = T))
# A tibble: 5 x 4
# Groups:   department [?]
  department line       x     y
  <fct>      <fct>  <dbl> <dbl>
1 A          big        0     0
2 A          midlle     0     0
3 B          big        1     1
4 B          midlle     1     0
5 C          small      1     1
0
votes

Here's a different approach using grepl:

library(tidyverse)

result <- data %>% 
  group_by(department, line) %>% 
  summarise(x = as.numeric(grepl("2018", date)),
            y = as.numeric(grepl("201807", date)))

result
## A tibble: 5 x 4
## Groups:   department [?]
#  department line       x     y
#  <fct>      <fct>  <dbl> <dbl>
#1 A          big        0     0
#2 A          midlle     0     0
#3 B          big        1     1
#4 B          midlle     1     0
#5 C          small      1     1

the data:

data <- read.table(header = TRUE, text = "
               product department  line date
    apple   A   big      201707
    cherry  A   midlle   201609
    potato  B   midlle   201801
    peach   C   small    201807
    pear    B   big      201807")