0
votes

this is probably something stupid I'm doing but I'm trying to join the values of some columns together based on whether they contain a value or not, and then create two new columns called start.week and end.week.

My start.week is "Monday, Tuesday and Wednesday". The end.week is "Thursday and Friday".

 Name        Monday Tuesday Wednesday   Thursday    Friday
 John        Red            Pink        
 Francis     Blue   Gray                 Black  
 Bill        Green          Orange       Purple 
 Bob         Yellow                      Lilac      Magenta

I can add the two columns together for Thursday and Friday using:

start.week = c("Monday", "Tuesday", "Wednesday")
end.week = c("Thursday", "Friday")

options(stringsAsFactors = FALSE)
df = mutate(df, end.week = ifelse(Friday != "", paste0(Thursday, " + ", Friday), Thursday))

Back I can't work out how to to this for start.week

Can anyone please give me a hint? I'd be forever grateful.

Original data:

df = structure(list(Name = c("John", "Francis", "Bill", "Bob"), Monday = 
c("Red", "Blue", "Green", "Yellow"), Tuesday = c("", "Gray", "", ""), 
Wednesday = c("Pink", "", "Orange", ""), Thursday = c("", 
"Black", "Purple", "Lilac"), Friday = c("", "", "", "Magenta"
)), class = "data.frame", row.names = c(NA, -4L))

Expected Output:

 df = structure(list(Name = c("John", "Francis", "Bill", "Bob"), Monday = 
 c("Red", "Blue", "Green", "Yellow"), Tuesday = c("", "Gray", "", ""), 
Wednesday = c("Pink", "", "Orange", ""), Thursday = c("", 
"Black", "Purple", "Lilac"), Friday = c("", "", "", "Magenta"
), start.week = c("Red + Pink", "Black", "Green + Orange", 
"Yellow"), end.week = c("", "", "Purple", "Lilac + Magenta"
)), class = "data.frame", row.names = c(NA, -4L))
1
I suspect Francis' start.week should instead be "Blue + Gray", is that right?r2evans

1 Answers

1
votes

How about something like this?

library(tidyverse)
df %>%
    gather(key, val, -Name) %>%
    group_by(Name) %>%
    mutate(
        start.week = paste(val[key %in% start.week & val != ""], collapse = " + "),
        end.week = paste(val[key %in% end.week & val != ""], collapse = " + ")) %>%
    spread(key, val)
## A tibble: 4 x 8
## Groups:   Name [4]
#  Name    start.week     end.week      Friday Monday Thursday Tuesday Wednesday
#  <chr>   <chr>          <chr>         <chr>  <chr>  <chr>    <chr>   <chr>
#1 Bill    Green + Orange Purple        ""     Green  Purple   ""      Orange
#2 Bob     Yellow         Lilac + Mage… Magen… Yellow Lilac    ""      ""
#3 Francis Blue + Gray    Black         ""     Blue   Black    Gray    ""
#4 John    Red + Pink     ""            ""     Red    ""       ""      Pink

The idea is to convert data from wide to long, add the new columns start.week and end.week and then convert data back to wide.

Or we can use purrr::imap_dfc to somewhat automate generating the new columns; to do so, we need to store the new columns in a named list.

lst <- list(start.week = start.week, end.week = end.week)
df %>%
    gather(key, val, -Name) %>%
    group_by(Name) %>%
    mutate(
        tmp = list(imap_dfc(lst, ~paste(val[key %in% .x & val != ""], collapse = "+")))) %>%
    unnest() %>%
    spread(key, val)

Please note that I think there is a mistake in your expected output; start.week for Francis should be Blue + Gray and not Black.