Using Tidyr/Dplyr to summarise counts of groups of strings

Question

I need to summarise the counts of strings I am assigning to groups, and I know I can do it in dplyr/tidyr but I am missing something.

Example dataset:

Owner = c('bob','julia','cheryl','bob','julia','cheryl')
Day = c('Mon', 'Tue') 
Locn = c('house','store','apartment','office','house','shop')
data <- data.frame(Owner, Day, Locn)

which looks like this:

   Owner Day      Locn
1    bob Mon     house
2  julia Tue     store
3 cheryl Mon apartment
4    bob Tue    office
5  julia Mon     house
6 cheryl Tue      shop

I want to group by name and day, and then count up grouped locations in columns. In this example I want 'house' and 'apartment' to add to a column titled 'Home', and 'store', 'office' and 'shop' to be counted in a column 'Work'.

My current code (which doesn't work):

grouped_locn <- data %>%
  dplyr::arrange(Owner, Day) %>%
  dplyr::group_by(Owner, Day) %>%
  dplyr::summarize(Home = which(data$Locn %in% c('house', 'apartment')), 
               Work = which(data$Locn %in% c("store", "office", "apartment")))

I have only included my current attempt at the summarize step to show how I have been approaching it. The Home and Work code currently returns vectors of the row numbers that contain an element of the group (ie Home = 1 3 5)

My intended output:

   Owner Day   Home  Work
1    bob Mon      1     0
2    bob Tue      0     1
3  julia Mon      1     0
4  julia Tue      0     1
5 cheryl Mon      1     0
6 cheryl Tue      0     1

In the actual dataset (30k+ rows) there are multiple Locn values per Owner per Day, so the Home and Work counts can be numbers other than 1 and 0 (so no booleans).

Many thanks.

David Arenburg David Arenburg · Accepted Answer · 2015-07-16T11:57:57

Here's an easy and efficient solution using data.table

For older versions (v < 1.9.5)

library(data.table) # v < 1.9.5
setDT(data)[, Locn2 := c("Work", "Home")[(Locn %in% c('house', 'apartment')) + 1L]]
dcast.data.table(data, Owner + Day ~ Locn2, length)
#     Owner Day Home Work
# 1:    bob Mon    1    0
# 2:    bob Tue    0    1
# 3: cheryl Mon    1    0
# 4: cheryl Tue    0    1
# 5:  julia Mon    1    0
# 6:  julia Tue    0    1

For newer versions (v >= 1.9.5) you can do this in one line

dcast(setDT(data), Owner + Day ~ c("Work", "Home")[(Locn %in% c('house', 'apartment')) + 1L], length)

Here's a tidyr alternative

library(dplyr)
library(tidyr)
data %>%
  mutate(temp = 1L, 
         Locn = ifelse(Locn %in% c('house', 'apartment'), "Home", "Work")) %>% 
  spread(Locn, temp, fill = 0L)

#    Owner Day Home Work
# 1    bob Mon    1    0
# 2    bob Tue    0    1
# 3 cheryl Mon    1    0
# 4 cheryl Tue    0    1
# 5  julia Mon    1    0
# 6  julia Tue    0    1

Using Tidyr/Dplyr to summarise counts of groups of strings

4 Answers