Need to Convert from Wide to Long

Question

Hi, I have this dataset with unique Id variables in column A and then subsequent renal scans for each patient. It is a csv file and I want to reshape it to long format using R if possible. Each participant can have a number of renal scans ranging from 1-17.

Also some Id are listed as 'No' for not receiving scan. I want it to be reshaped to something similar to this

I know previous questions on this organize by year, I have scans from participants that occur multiple times during the year date format yyyy-mm-dd

Please see data below

structure(list(id = c(1010001, 1010002, 1010004, 1010005, 1010006, 
1010007), `GFR Scans?` = c("Yes", "Yes", "Yes", "Yes", "Yes", 
"No"), `1. Date of renal scan:` = structure(c(1133913600, 1196812800, 
1237334400, 1124150400, 1192060800, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), `1. Type of renal scan:` = c("DTPA", 
"DTPA", "DTPA", "DTPA", "DTPA", NA), `1. GFR mL/1.73 sq.m` = c(18, 
13, 68, 117, 46, NA), `1. Pre/Post tx?` = c("Pre", "Pre", "Post", 
"Post", "Pre", NA), `2. Date of renal scan:` = structure(c(1146528000, 
1214524800, NA, 1151366400, 1245974400, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), `2. Type of renal scan:` = c("DTPA", 
"DTPA", NA, "DTPA", "DTPA", NA), `2. GFR mL/1.73 sq.m` = c(86, 
110, NA, 148, 123, NA), `2. Pre/Post tx?` = c("Post", "Post", 
NA, "Post", "Post", NA), `3. Date of renal scan:` = structure(c(NA, 
1219104000, NA, 1184025600, NA, NA), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), `3. Type of renal scan:` = c(NA, "DTPA", NA, 
"DTPA", NA, NA), `3. GFR mL/1.73 sq.m` = c(NA, 92, NA, 166, NA, 
NA), `3. Pre/Post tx?` = c(NA, "Post", NA, "Post", NA, NA), `4. Date of    renal scan:` = structure(c(NA, 
1242691200, NA, 1213660800, NA, NA), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), `4. Type of renal scan:` = c(NA, "DTPA", NA, 
"DTPA", NA, NA), `4. GFR mL/1.73 sq.m` = c(NA, 36, NA, 171, NA, 
NA), `4. Pre/Post tx?` = c(NA, "Post", NA, "Post", NA, NA), `5. Date of    renal scan:` = structure(c(NA, 
NA, NA, 1288656000, NA, NA), class = c("POSIXct", "POSIXt"), tzone =  "UTC"), 
    `5. Type of renal scan:` = c(NA, NA, NA, "DTPA", NA, NA), 
    `5. GFR mL/1.73 sq.m` = c(NA, NA, NA, 105, NA, NA), `5. Pre/Post  tx?` = c(NA, 
    NA, NA, "Post", NA, NA), `6. Date of renal scan:` = structure(c(NA, 
    NA, NA, 1323129600, NA, NA), class = c("POSIXct", "POSIXt"
    ), tzone = "UTC"), `6. Type of renal scan:` = c(NA, NA, NA, 
    "DTPA", NA, NA), `6. GFR mL/1.73 sq.m` = c(NA, NA, NA, 103, 
    NA, NA), `6. Pre/Post tx?` = c(NA, NA, NA, "Post", NA, NA
    ), `7. Date of renal scan:` = structure(c(NA, NA, NA, 1355184000, 
    NA, NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    `7. Type of renal scan:` = c(NA, NA, NA, "DTPA", NA, NA), 
    `7. GFR mL/1.73 sq.m` = c(NA, NA, NA, 98, NA, NA), `7. Pre/Post tx?` = c(NA, 
    NA, NA, "Post", NA, NA), `8. Date of renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `8. Type of renal scan:` = c(NA, NA, 
    NA, NA, NA, NA), `8. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, 
    NA, NA), `8. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `9. Date of renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `9. Type of renal scan:` = c(NA, NA, 
    NA, NA, NA, NA), `9. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, 
    NA, NA), `9. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `10. Date   of renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `10. Type of renal scan:` = c(NA, NA, 
    NA, NA, NA, NA), `10. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, 
    NA, NA), `10. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), 
    `11. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `11. Type of  renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `11. GFR mL/1.73 sq.m` = c(NA, NA, NA, 
    NA, NA, NA), `11. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA
    ), `12. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), 
    `12. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `12. GFR mL/1.73 sq.m` = c(NA, 
    NA, NA, NA, NA, NA), `12. Pre/Post tx?` = c(NA, NA, NA, NA, 
    NA, NA), `13. Date of renal scan:` = c(NA, NA, NA, NA, NA, 
    NA), `13. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA
    ), `13. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, NA), `13. Pre/Post tx?` = c(NA, 
    NA, NA, NA, NA, NA), `14. Date of renal scan:` = c(NA, NA, 
    NA, NA, NA, NA), `14. Type of renal scan:` = c(NA, NA, NA, 
    NA, NA, NA), `14. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, NA, 
    NA), `14. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), `15. Date of renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `15. Type of renal scan:` = c(NA, NA, 
    NA, NA, NA, NA), `15. GFR mL/1.73 sq.m` = c(NA, NA, NA, NA, 
    NA, NA), `15. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA), 
    `16. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), `16. Type of  renal scan:` = c(NA, 
    NA, NA, NA, NA, NA), `16. GFR mL/1.73 sq.m` = c(NA, NA, NA, 
    NA, NA, NA), `16. Pre/Post tx?` = c(NA, NA, NA, NA, NA, NA
    ), `17. Date of renal scan:` = c(NA, NA, NA, NA, NA, NA), 
    `17. Type of renal scan:` = c(NA, NA, NA, NA, NA, NA), `17. GFR mL/1.73 sq.m` = c(NA, 
    NA, NA, NA, NA, NA), `17. Pre/Post tx?` = c(NA, NA, NA, NA, 
    NA, NA)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

The first image is the original wide format, second image is what I am trying to get. None of the other wide to long answers on this have helped me as I have multiple columns involved.

e.g. id 1010001 has had two scans, I need this listed one after each other and not beside each other (see image two).

I would really appreciate your help.

So the idea is to sort the table into ID and shift the second and third groupings into the first? — bob1
Yes, Group by ID and then list the subsequent scans underneath instead of side by side. Some of the ID have up to 17 scans (columns to the side). — user8722680
Also some IDs didn't receive any scans - listed as No. These also need to be listed, with only one row for these as the have no subsequent linked columns — user8722680

Uwe Uwe · Accepted Answer · 2019-02-23T08:00:57

This question has been asked several times before, e.g., Reshaping multiple sets of measurement columns (wide format) into single columns (long format). One possible approach is to use data.table's melt() function which is able to reshape multiple value columns simultaneously.

However, here is an additional difficulty which justifies an answer on its own, IMHO. The column names occassionally contain surplus white space which needs to be removed beforehand in order to have a consistent naming pattern for the columns.

names(df1)

 [1] "id"                        "GFR Scans?"                "1. Date of renal scan:"    "1. Type of renal scan:"   
 [5] "1. GFR mL/1.73 sq.m"       "1. Pre/Post tx?"           "2. Date of renal scan:"    "2. Type of renal scan:"   
 [9] "2. GFR mL/1.73 sq.m"       "2. Pre/Post tx?"           "3. Date of renal scan:"    "3. Type of renal scan:"   
[13] "3. GFR mL/1.73 sq.m"       "3. Pre/Post tx?"           "4. Date of    renal scan:" "4. Type of renal scan:"   
[17] "4. GFR mL/1.73 sq.m"       "4. Pre/Post tx?"           "5. Date of    renal scan:" "5. Type of renal scan:"   
[21] "5. GFR mL/1.73 sq.m"       "5. Pre/Post  tx?"          "6. Date of renal scan:"    "6. Type of renal scan:"   
[25] "6. GFR mL/1.73 sq.m"       "6. Pre/Post tx?"           "7. Date of renal scan:"    "7. Type of renal scan:"   
[29] "7. GFR mL/1.73 sq.m"       "7. Pre/Post tx?"           "8. Date of renal scan:"    "8. Type of renal scan:"   
[33] "8. GFR mL/1.73 sq.m"       "8. Pre/Post tx?"           "9. Date of renal scan:"    "9. Type of renal scan:"   
[37] "9. GFR mL/1.73 sq.m"       "9. Pre/Post tx?"           "10. Date   of renal scan:" "10. Type of renal scan:"  
[41] "10. GFR mL/1.73 sq.m"      "10. Pre/Post tx?"          "11. Date of renal scan:"   "11. Type of  renal scan:" 
[45] "11. GFR mL/1.73 sq.m"      "11. Pre/Post tx?"          "12. Date of renal scan:"   "12. Type of renal scan:"  
[49] "12. GFR mL/1.73 sq.m"      "12. Pre/Post tx?"          "13. Date of renal scan:"   "13. Type of renal scan:"  
[53] "13. GFR mL/1.73 sq.m"      "13. Pre/Post tx?"          "14. Date of renal scan:"   "14. Type of renal scan:"  
[57] "14. GFR mL/1.73 sq.m"      "14. Pre/Post tx?"          "15. Date of renal scan:"   "15. Type of renal scan:"  
[61] "15. GFR mL/1.73 sq.m"      "15. Pre/Post tx?"          "16. Date of renal scan:"   "16. Type of  renal scan:" 
[65] "16. GFR mL/1.73 sq.m"      "16. Pre/Post tx?"          "17. Date of renal scan:"   "17. Type of renal scan:"

library(data.table)
library(magrittr)
# clean up column names: remove surplus whitespace
setDT(df1) %>% setnames(names(.) %>% stringr::str_replace_all("\\s+", " "))
# get name pattern for subsequent melt
cols <- names(df1)[3:6] %>% stringr::str_replace("1. ", "")
# reshape multiple columns from wide to long
long <- melt(df1, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)[
  # recreate lost POSIXct attribute
  , `Date of renal scan:` := lubridate::as_datetime(`Date of renal scan:`)][]

long

         id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx?
 1: 1010001        Yes        1          2005-12-07                DTPA               18          Pre
 2: 1010002        Yes        1          2007-12-05                DTPA               13          Pre
 3: 1010004        Yes        1          2009-03-18                DTPA               68         Post
 4: 1010005        Yes        1          2005-08-16                DTPA              117         Post
 5: 1010006        Yes        1          2007-10-11                DTPA               46          Pre
 6: 1010001        Yes        2          2006-05-02                DTPA               86         Post
 7: 1010002        Yes        2          2008-06-27                DTPA              110         Post
 8: 1010005        Yes        2          2006-06-27                DTPA              148         Post
 9: 1010006        Yes        2          2009-06-26                DTPA              123         Post
10: 1010002        Yes        3          2008-08-19                DTPA               92         Post
11: 1010005        Yes        3          2007-07-10                DTPA              166         Post
12: 1010002        Yes        4          2009-05-19                DTPA               36         Post
13: 1010005        Yes        4          2008-06-17                DTPA              171         Post
14: 1010005        Yes        5          2010-11-02                DTPA              105         Post
15: 1010005        Yes        6          2011-12-06                DTPA              103         Post
16: 1010005        Yes        7          2012-12-11                DTPA               98         Post

In the call to melt(), we can set the parameter na.rm = FALSE to keep all data:

          id GFR Scans? variable Date of renal scan: Type of renal scan: GFR mL/1.73 sq.m Pre/Post tx?
  1: 1010001        Yes        1          2005-12-07                DTPA               18          Pre
  2: 1010002        Yes        1          2007-12-05                DTPA               13          Pre
  3: 1010004        Yes        1          2009-03-18                DTPA               68         Post
  4: 1010005        Yes        1          2005-08-16                DTPA              117         Post
  5: 1010006        Yes        1          2007-10-11                DTPA               46          Pre
 ---                                                                                                  
 98: 1010002        Yes       17                <NA>                <NA>               NA         <NA>
 99: 1010004        Yes       17                <NA>                <NA>               NA         <NA>
100: 1010005        Yes       17                <NA>                <NA>               NA         <NA>
101: 1010006        Yes       17                <NA>                <NA>               NA         <NA>
102: 1010007         No       17                <NA>                <NA>               NA         <NA>

Need to Convert from Wide to Long

2 Answers