The short version: I'm having trouble combining a count with aggregate functions in one summary when the conditions apply to the same column.
Suppose I have this dataframe:
library(dplyr)
df = tbl_df(data.frame(
  company=c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
  year=c("2011", "2010", "2009", "2011", "2010", "2013"),
  product=c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
            "Kindness", "Helping Hand"),
  price=c("5.67", "7.12", "12.99", "10.99", NA, FALSE)))
which creates this dataframe (in essence):
    company year product           price
1   Acme    2011 Wrench            5.67
2   Meca    2010 Hammer            7.12
3   Emca    2009 Sonic Screwdriver 12.99
4   Acme    2011 Fairy Dust        10.99
5   Meca    2010 Kindness          NA
... ...     ...  ...               ...
n   Emca    2013 Helping Hand      FALSE
Let's say I run df <- group_by(df, company, year, product) and then want to get the following info all in one collection (i.e. one dataframe):
- Count of all price listings (including NA and FALSE)
- Count of listings where price is NA
- Average price, excluding NA and FALSE
- Max price
summarize(df, count = n()) #satisfies first item obviously
I'm having issues trying to get the others. I think I need to use pipe operators? If so, can anyone provide some guidance?
This is what I've tried and it is blatantly wrong, but I'm not sure where to go next:
summarize(df,
  total.count = n(),
  count = filter(df, is.na(price)),
  avg.price = filter(df, !is.na(price), price != FALSE),
  max.price = max(filter(df, !is.na(price), price != FALSE)))
And yes, I have reviewed documentation and I'm sure the answers are there, but they might be too advanced for my understanding. Thanks in advance!
NAs are factor levels. So, is.na returns FALSE. Is that the case in your original dataset? When you create example data, please don't use "NA"; it will be read as character and then converted to factor when stringsAsFactors=FALSE is not specified. Just use NA. - akrun
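The point in the comment can be checked directly, and once price is numeric the four requested values fit into a single summarise() call. What follows is only a minimal sketch under some assumptions that are not in the question: price is kept as character (stringsAsFactors = FALSE), a helper column price_num is introduced for illustration, and the data is grouped by company alone so that every group has at least one usable price.

```r
library(dplyr)

# The string "NA" is an ordinary factor level; a real NA is a missing value.
is.na(factor(c("5.67", "NA")))  # FALSE FALSE
is.na(factor(c("5.67", NA)))    # FALSE  TRUE

# Same example data as in the question, kept as character columns.
df <- data.frame(
  company = c("Acme", "Meca", "Emca", "Acme", "Meca", "Emca"),
  year    = c("2011", "2010", "2009", "2011", "2010", "2013"),
  product = c("Wrench", "Hammer", "Sonic Screwdriver", "Fairy Dust",
              "Kindness", "Helping Hand"),
  price   = c("5.67", "7.12", "12.99", "10.99", NA, FALSE),
  stringsAsFactors = FALSE
)

result <- df %>%
  # price_num is a hypothetical helper column: "FALSE" cannot be parsed
  # as a number, so it becomes NA here (the coercion warning is silenced).
  mutate(price_num = suppressWarnings(as.numeric(price))) %>%
  # Assumption: group by company only, for illustration.
  group_by(company) %>%
  summarise(
    total.count = n(),                            # every row, incl. NA and FALSE
    na.count    = sum(is.na(price)),              # real NAs in the original column
    avg.price   = mean(price_num, na.rm = TRUE),  # ignores NA and FALSE
    max.price   = max(price_num, na.rm = TRUE)
  )

print(result)
```

Conditions go inside the aggregate calls (sum(is.na(...)), na.rm = TRUE) rather than through filter(), because filter() returns a data frame and not a single value per group. If a group contains no numeric price at all, mean() returns NaN and max() returns -Inf with a warning, so grouping by company, year and product as in the question would need extra handling.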