R - dplyr Summarize and Retain Other Columns

Question

I am grouping data and then summarizing it, but would also like to retain another column. I do not need to do any evaluations of that column's content as it will always be the same as the group_by column. I can add it to the group_by statement but that does not seem "right". I want to retain State.Full.Name after grouping by State. Thanks

library(dplyr)

TDAAtest <- data.frame(State = sample(state.abb, 1000, replace = TRUE))
TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State, state.abb)]


TDAA.states <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  summarize(n=n()) %>%
  ungroup() %>%
  arrange(State)

Which column do you want to summarise? Based on the code, you are grouping by both columns — akrun
It's probably best to just group by both. The alternative is summarize(State.Full.Name = unique(State.Full.Name), n=n()), which is less concise. — alistaire
@akrun I clarified. I had left in my not so "right" approach. — atclaus
Also, count(...) is equivalent to group_by(...) %>% summarise(n = n()) — alistaire

akrun · Accepted Answer · 2016-08-23T04:05:27

Perhaps we need to carry the column through summarise by taking its first value in each group

TDAAtest %>% 
     filter(!is.na(State)) %>%
     group_by(State) %>% 
     summarise(State.Full.Name = first(State.Full.Name), n = n())

Or use mutate to add the count column, then keep one row per State with distinct

TDAAtest %>% 
     filter(!is.na(State)) %>%
     group_by(State) %>% 
     mutate(n= n()) %>% 
     distinct(State, .keep_all=TRUE)
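
For comparison, here is a minimal sketch that puts the count() shortcut from the comments next to the first() approach and checks that they agree. It assumes dplyr is installed and that State.Full.Name really is constant within each State. The set.seed() call and the names by_count, by_first and names_per_state are not from the question; they are added only for illustration.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
TDAAtest <- data.frame(State = sample(state.abb, 1000, replace = TRUE))
TDAAtest$State.Full.Name <- state.name[match(TDAAtest$State, state.abb)]

# Check the assumption first: every State should map to exactly one full name.
names_per_state <- TDAAtest %>%
  group_by(State) %>%
  summarise(k = n_distinct(State.Full.Name)) %>%
  pull(k)
stopifnot(all(names_per_state == 1))

# Option 1: count() on both columns, which is shorthand for
# group_by(State, State.Full.Name) %>% summarise(n = n())
by_count <- TDAAtest %>%
  filter(!is.na(State)) %>%
  count(State, State.Full.Name)

# Option 2: group by State only and carry the name along with first()
by_first <- TDAAtest %>%
  filter(!is.na(State)) %>%
  group_by(State) %>%
  summarise(State.Full.Name = first(State.Full.Name), n = n())

# Compare the two results as plain data frames
all.equal(as.data.frame(by_count), as.data.frame(by_first))
```

If the one-name-per-State assumption did not hold, first() would silently pick one value per group, while count() on both columns would return one row per distinct pair. That is a reason to prefer grouping by both columns when the data is not fully trusted.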
