3
votes

I have a dataset like below:

BRAND  MEDIUM      W1   W2   W3   W4   W5

   B1   tv          1    0    1    0    2
   B2   tv          0    0    0    0    0
   B1   radio       0    1    2    5    3
   B1   tv          0    0    0    0    0
   B2   radio       0    4    1    1    1
   B1   newspapers  7    4    2    1    0

What I am trying to do is to group by BRAND and sum the value along each column to form a unique row made up of the summed values. Furthermore I'd also like to show which MEDIUM were in each subgroup.

The final output should be the following:

          W1    W2   W3   W4   W5  tv radio newspaper 

   B1      8    9    5    6    5   1   1       1
   B2      0    4    1    1    1   1   1       0

I'm having hard time to find a way to solve this, especially the second part to put which medium was in the group. Any suggestion or hints? Thanks

2

2 Answers

2
votes

Here is an idea that creates two different data frames and joins them together to get your expected output. The first is an aggregation on BRAND, and the second one dummifies the MEDIUM i.e.

library(tidyverse)

df1 <- df %>% 
    group_by(BRAND) %>% 
    summarise_at(vars(starts_with('W')), funs(sum))

# A tibble: 2 x 6
#  BRAND    W1    W2    W3    W4    W5
#  <fct> <int> <int> <int> <int> <int>
#1 B1        8     5     5     6     5
#2 B2        0     4     1     1     1

df2 <- df %>% 
    select(BRAND, MEDIUM) %>% 
    group_by(BRAND) %>% 
    mutate(new = 1) %>% 
    distinct() %>% 
    spread(MEDIUM, new, fill = 0)

# A tibble: 2 x 4
# Groups:   BRAND [2]
#  BRAND newspapers radio    tv
#  <fct>      <dbl> <dbl> <dbl>
#1 B1             1     1     1
#2 B2             0     1     1

Join them together to get,

left_join(df1, df2)
#Joining, by = "BRAND"
# A tibble: 2 x 9
#  BRAND    W1    W2    W3    W4    W5 newspapers radio    tv
#  <fct> <int> <int> <int> <int> <int>      <dbl> <dbl> <dbl>
#1 B1        8     5     5     6     5          1     1     1
#2 B2        0     4     1     1     1          0     1     1
4
votes

Mybe in late, but you can also try to manage it without any external packages, in two steps:

First aggregate the W... by BRAND :

# aggregate by brand, and not using the MEDIUM column:
one  <-   aggregate(. ~ BRAND, data = dats[,-2], sum) 

Then the mediums:

# frequencies
# EDIT HERE
# two <- as.data.frame.matrix(table(dats$BRAND,dats$MEDIUM))  
# replace with ones
# two[(two)>1] <- 1

# thanks to Ronak Shah
two <- data.frame(BRAND = unique(dats$BRAND), +(table(dats$BRAND, dats$MEDIUM) > 0))

Last you can merge the two datasets:

merge(one, two)
  BRAND W1 W2 W3 W4 W5 newspapers radio tv
1    B1  8  5  5  6  5          1     1  1
2    B2  0  4  1  1  1          0     1  1