Summarizing count and conditional aggregate functions on the same factor

Question

Quick and short of it is I'm having problems summarizing count and aggregate functions with conditions on the same factor.

Suppose I have this dataframe:

library(dplyr)

df = tbl_df(data.frame(
    company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"), 
    year=c("2011", "2010", "2009", "2011", "2010", "2013"), 
    product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust", 
              "Kindness", "Helping Hand"), 
    price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))

which creates this dataframe (in essence):

   company year  product             price
1    Acme  2011  Wrench              5.67
2    Meca  2010  Hammer              7.12
3    Emca  2009  Sonic Screwdriver   12.99
4    Acme  2011  Fairy Dust          10.99
5    Meca  2010  Kindness            NA
...  ...   ...   ...                 ...
n    Emca  2013  Helping Hand        FALSE

Let's say I want to df <- group_by(df, company, year, product) and then get the following info all in one collection (i.e. dataframe):

Count of each price listing (including NA, False)
Count of each with 'NA' condition
Average price excluding NA and False

Max price

summarize(df, count = n()) #satisfies first item obviously

I'm having issues trying to get the others. I think I need to use pipe operators? If so, can anyone provide some guidance?

This is what I've tried and it is blatantly wrong, but I'm not sure where to go next:

 summarize(df,
           total.count = n(),
           count = filter(df, is.na(price)),
           avg.price = filter(df, !is.na(price), price != FALSE),
           max.price = max(filter(df, !is.na(price), price != FALSE))

And yes, I have reviewed documentation and I'm sure the answers are there, but they might be too advanced for my understanding. Thanks in advance!

In the example dataset, NAs are factor levels. So, is.na returns FALSE. Is it the case in your original dataset?. When you create example data, please don't use "NA", it will read as character and then convert to factor when stringsAsFactors=FALSE is not specified. Just use NA — akrun

akrun akrun · Accepted Answer · 2014-10-27T05:21:54

Assuming that your original dataset is similar to the one you created (i.e. with NA as character. You could specify na.strings while reading the data using read.table. But, I guess NAs would be detected automatically.

The price column is factor which needs to be converted to numeric class. When you use as.numeric, all the non-numeric elements (i.e. "NA", FALSE) gets coerced to NA) with a warning.

library(dplyr)
df %>%
     mutate(price=as.numeric(as.character(price))) %>%  
     group_by(company, year, product) %>%
     summarise(total.count=n(), 
               count=sum(is.na(price)), 
               avg.price=mean(price,na.rm=TRUE),
               max.price=max(price, na.rm=TRUE))

data

I am using the same dataset (except the ... row) that was showed.

df = tbl_df(data.frame(company=c("Acme", "Meca", "Emca", "Acme", "Meca","Emca"),
 year=c("2011", "2010", "2009", "2011", "2010", "2013"), product=c("Wrench", "Hammer",
 "Sonic Screwdriver", "Fairy Dust", "Kindness", "Helping Hand"), price=c("5.67",
 "7.12", "12.99", "10.99", "NA",FALSE)))

Summarizing count and conditional aggregate functions on the same factor

1 Answers

data