0
votes

I have longitudinal data with multiple measurements on a number of objects over time. The data is in long format with hundreds of variables and cases:

Id    Time1   Measurement11  ...   Time2    Measurement21   ...
 1     50.4    23             ...   52.1    25               ...
 2     64.3    30             ...   67.9    35               ...
 3     70.1    20             ...   72.3    29               ...

I would like to convert it to wide format for plotting

Id    Time    Measurement1 ...
 1     50.4    23           ...
 1     52.1    25           ...
 2     64.3    30           ...
 2     67.9    35           ...
 3     70.1    20           ...
 4     72.3    29           ...

I checked resources on gather, melt, reshape, reshape2 but it seems they deal with situations where there are multiple time columns which all contain measurements. So far I have not found a good way to convert the data while maintaing the pairwise dependency on time and measurement.

My current solution is to write something like

attatch(data)
temp1<-bind_cols(Time1,Measurement11)
temp2<-bind_cols(Time2,Measurement21)

wide_format_measurement1<-bind_rows(temp1,temp2,...)

It works since there are only 5 time variables, but it does not seem very efficient. I could use unite to create pairs of data, then use gather, and finally use separate. But this is essentially the same as the bind_cols method. There must be a better way?

1
It'd be very helpful to provide a subset of your data. Please use dput() to supply a sample of your data.OTStats
I can't share it (it is medical data). I will create some more fake data.Asdf
I tried to edit the post but it does not update after save for some reason. Here is code that generates some fake data with the same structure: longdata<-data.frame(id=1:30); for (i in 1:5) { longdata[,10*(i-1)+2]<-rnorm(30,mean=60,sd=10); names(longdata)[10*(i-1)+2]<-paste0('age_',i); for (j in 2:10) { longdata[,10*(i-1)+j+1]<-sample(0:4,30,replace=TRUE); names(longdata)[10*(i-1)+j+1]<-paste0('visit_',i,'_item',j-1) } }Asdf

1 Answers

1
votes

This should also extend to more cases (e.g. time3, measurement3, time4, measurement4, etc.) provided they each end with a digit. The trick is to gather() up all the non-variable columns first, then separate() with a carefully chosen sep argument.

library(tidyverse)

df %>%
  gather(key, value, -id) %>%
  separate(key, c("var", "num"), sep = "(?=[[:digit:]])") %>%
  spread(var, value) %>%
  arrange(id) %>%
  select(-num)