1
votes

I'm trying to use the gather function to combine multiple rows and lengthen my wide data. Example data below:

User ID    Book 1    Book 1_YN   Book 2   Book 2_YN   Book 3   Book3_YN   Book 4   Book 4_YN
1            ABC          Y      XYZ          N         LMN        Y 
2            XYZ          Y      DEF          Y      
3            ABC          N      XYZ          Y         TUV        N       HIJ     Y 

Ideally, I want the data to look like the below table so I can summarize info about the books:

User ID    Book_Num    Book   Book_YN
1           Book 1     ABC      Y
1           Book 2     XYZ      N
1           Book 3     LMN      Y
2           Book 1     XYZ      Y
2           Book 2     DEF      Y
3           Book 1     ABC      N
3           Book 2     XYZ      Y
3           Book 3     TUV      Y
3           Book 4     HIJ      Y

When I try to use column indexing in the gather function...

data_clean <- gather(data, Book_Num, Book, data[c(2,4,6,8)]

I get the following error: "Error: data[c(2,4,6,8)] must evaluate to column positions or names, not a list"

Anyone know what that error means and/or if there is a better way to handle this task?

*Edited to change images to tables

1
Please add a reproducible example using dput and not as images. Please read the info about how to ask a good question and how to give a reproducible exampleRonak Shah
Thanks @RonakShah, I updated the images to tables. Hope that helps to some degree. I will refer to that documentation for my questions here.SummerLover2017

1 Answers

1
votes

An option is melt from data.table

library(data.table)
melt(setDT(df1), measure = patterns("^Book \\d+$", "^Book \\d+_YN$"), na.rm = TRUE,
     value.name = c("Book", "Book_YN"), variable.name = "Book_Num")[, 
      Book_Num := paste("Book", Book_Num)][order(`User ID`)]
#   User ID Book_Num Book Book_YN
#1:       1   Book 1  ABC       Y
#2:       1   Book 2  XYZ       N
#3:       1   Book 3  LMN       Y
#4:       2   Book 1  XYZ       Y
#5:       2   Book 2  DEF       Y
#6:       3   Book 1  ABC       N
#7:       3   Book 2  XYZ       Y
#8:       3   Book 3  TUV       N
#9:       3   Book 4  HIJ       Y

Or using pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
    rename_at(-1, ~ str_replace(., ' (\\d+)_YN', '_YN \\1')) %>%
    pivot_longer(cols = -`User ID`, names_to = c(".value", "Book_Num"),
      names_sep=" ", values_drop_na = TRUE) %>% 
    mutate(Book_Num = str_c('Book ', Book_Num))
# A tibble: 9 x 4
#  `User ID` Book_Num Book  Book_YN
#      <int> <chr>    <chr> <chr>  
#1         1 Book 1   ABC   Y      
#2         1 Book 2   XYZ   N      
#3         1 Book 3   LMN   Y      
#4         2 Book 1   XYZ   Y      
#5         2 Book 2   DEF   Y      
#6         3 Book 1   ABC   N      
#7         3 Book 2   XYZ   Y      
#8         3 Book 3   TUV   N      
#9         3 Book 4   HIJ   Y      

data

df1 <- structure(list(`User ID` = 1:3, `Book 1` = c("ABC", "XYZ", "ABC"
), `Book 1_YN` = c("Y", "Y", "N"), `Book 2` = c("XYZ", "DEF", 
"XYZ"), `Book 2_YN` = c("N", "Y", "Y"), `Book 3` = c("LMN", NA, 
"TUV"), `Book 3_YN` = c("Y", NA, "N"), `Book 4` = c(NA, NA, "HIJ"
), `Book 4_YN` = c(NA, NA, "Y")), class = "data.frame", row.names = c(NA, 
-3L))