3
votes

I have a table that (after some initial processing) has multiple rows with the same main identifier but with different column values (either 0 or a value > 0).

Example table with main identifier "produce"

df = data.frame(produce = c("apples","apples", "bananas","bananas"),
                grocery1=c(0,1,1,1),
                grocery2=c(1,0,1,1),
                grocery3=c(0,0,1,1))


###########################

> df
  produce grocery1 grocery2 grocery3
1  apples        0        1        0
2  apples        1        0        0
3 bananas        1        1        1
4 bananas        1        1        1

I'd like to collapse (or merge?) rows with the same identifier and retain the non-empty (here, any nonzero values) values in each column

Example desired output

 shopping grocery1 grocery2 grocery3
1   apples        1        1        0
2  bananas        1        1        1

Is there a simple function or piping in tidyverse that I am missing and that can handle this?

2
dplyr::group_by() plus dplyr::summarise()bbiasi

2 Answers

2
votes

Using base R aggregate we can do

aggregate(.~produce, df, function(x) +any(x > 0))

#  produce grocery1 grocery2 grocery3
#1  apples        1        1        0
#2 bananas        1        1        1

Or using dplyr

library(dplyr)
df %>%
  group_by(produce) %>%
  summarise_all(~+any(. > 0))

#  produce grocery1 grocery2 grocery3
#  <fct>      <int>    <int>    <int>
#1 apples         1        1        0
#2 bananas        1        1        1

and same with data.table

library(data.table)
setDT(df)[, lapply(.SD, function(x) +any(x > 0)), by=produce]
1
votes

We can use max

library(dplyr)
df %>%
   group_by(produce) %>% 
   summarise_all(max)
# A tibble: 2 x 4
#  produce grocery1 grocery2 grocery3
#  <fct>      <dbl>    <dbl>    <dbl>
#1 apples         1        1        0
#2 bananas        1        1        1