
It's not uncommon to want to summarise numeric columns of a dataframe or tibble, while doing something else to non-numeric columns.

There is a nice trick for this here, but it seems to fail for character columns.

First, here it is working, returning the mean of the numeric columns and the value from the last row (within each group) of the other columns:

library(dplyr)

set.seed(1234)
category <- c('A','A','E','E','B','B','C')
date <- seq(as.Date("2017-01-01"), by = "month", length.out = 7)
value1 <- sample(seq(from = 91, to = 97, by = 1))
dt <- data.frame(category, date, value1)
dt <- as_tibble(dt)

# works: numeric columns get the group mean, other columns get the last value
# (funs() has been deprecated since dplyr 0.8.0 but still shows the problem)
dt2 <- dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
print(dt2)

Note that because the date column is non-numeric, it returns the value in the last row of each group instead of the mean:

# A tibble: 4 x 3
  category date       value1
  <fct>    <date>      <dbl>
1 A        2017-02-01   92.5
2 B        2017-06-01   93.5
3 C        2017-07-01   97  
4 E        2017-04-01   94.5

However, it fails when one of the columns is of type character (chr):

marsupial <- c("quoll", "phascogale", "triok", "opossum", "antechinus", "bandicoot", "Fat-tailed dunnart")
dt$marsupial <- marsupial

# doesn't work: marsupial is a character column
dt3 <- dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))
print(dt3)

This gives the following error and warning:

Error in summarise_impl(.data, dots) : 
  Evaluation error: `false` must be type double, not character.
In addition: Warning message:
In mean.default(marsupial) :
  argument is not numeric or logical: returning NA

I assume that "`false` must be type double" refers to the marsupial column, for which last() is evaluated. If so, why must it be double, and is there another way? I wouldn't expect this from a conventional if/else conditional.

summarise_if(is.numeric, mean) - alistaire
If you want to keep the current structure, the issue is because dplyr::if_else is strict about types. ifelse or if ... else ... work. The former will drop the date attributes, though, which is annoying, so if (is.numeric(.)) mean(.) else last(.) is probably best. - alistaire
@alistaire With if (is.numeric(.)) mean(.) else last(.) I get this error: Error in summarise_impl(.data, dots) : Column "date" must be length 1 (a summary value), not 7. Maybe I misunderstood how to use it. And yes, ifelse drops the date attribute. - Alex Holcombe
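The type behaviour described in these comments can be checked on toy vectors. This is a sketch assuming dplyr is installed; the vectors and result names (x_chr, x_date, strict, lazy, plain) are made up for illustration, and because the exact if_else() error text differs between dplyr versions, only the fact that it errors is captured.

```r
library(dplyr)

x_chr  <- c("quoll", "opossum")
x_date <- as.Date(c("2017-01-01", "2017-02-01"))

# dplyr::if_else() forces both `true` and `false` and requires one type:
# mean() of a character vector is NA_real_ (double), last() is character
strict <- tryCatch(
  if_else(is.numeric(x_chr), suppressWarnings(mean(x_chr)), last(x_chr)),
  error = function(e) "type error"
)

# base if/else only evaluates the branch it takes, so no type check happens
lazy <- if (is.numeric(x_chr)) mean(x_chr) else last(x_chr)

# base ifelse() builds its result from the logical condition,
# so the Date class of the chosen value is dropped
plain <- ifelse(FALSE, 0, last(x_date))
```

The strictness is deliberate: if_else() returns a single vector assembled from both true and false, so it needs one output type, whereas base if simply returns whichever branch it ran.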

1 Answer


if_else seems to be the problem, so I have written a plain function instead. I have tested it on the date column and it seems to work on the character column as well. I hope it solves your problem:

dt %>%
  group_by(category) %>%
  summarise_all(function(x) {
    if (is.numeric(x)) {
      mean(x)
    } else {
      nth(x, -1)  # last element of the group
    }
  })
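With dplyr 1.0.0 or later, where summarise_all() is superseded by across(), the same per-column function can be written as a lambda. This is a sketch assuming dplyr >= 1.0.0; the value1 numbers below are made up rather than generated with set.seed(1234), so the results will not match the question's output.

```r
library(dplyr)

# Toy data in the same shape as the question (value1 values are made up)
dt <- tibble(
  category  = c("A", "A", "E", "E", "B", "B", "C"),
  date      = seq(as.Date("2017-01-01"), by = "month", length.out = 7),
  value1    = c(91, 94, 95, 92, 96, 93, 97),
  marsupial = c("quoll", "phascogale", "triok", "opossum",
                "antechinus", "bandicoot", "Fat-tailed dunnart")
)

# across() skips the grouping column; base if/else runs only one branch,
# so mean() is never called on the character or Date columns
res <- dt %>%
  group_by(category) %>%
  summarise(across(everything(), ~ if (is.numeric(.x)) mean(.x) else last(.x)))
print(res)
```

Because no common type between the branches is needed, the date column keeps its Date class and the marsupial column stays character.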