I have a survey data set like this:
df <- data.frame(
employment = 0.45,
income = 0.3,
incomeFU1 = 0.4,
married = 0.1,
employmentFU1 = 0.7,
employmentFU2 = 0.8,
incomeFU2 = 0.8,
smokingFU1 = 0.6,
smokingFU3 = 0.1,
ageFU3 = 0.9,
marriedFU2 = 0.3
)
In this data set, individuals were asked about their employment status, income, etc. The data are at an aggregate level: think of each value as the proportion of all people who are employed, the mean income, and so on. The data set therefore has only one row.
Individuals in this survey were asked at baseline and at 3 follow-ups. Baseline variables have no suffix; follow-up variables end in "FU1" for follow-up 1, "FU2" for follow-up 2, and so on.
I also have a list of these variable names:
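The naming convention can be split mechanically into a base name and a wave number. This is a minimal sketch in base R, assuming every follow-up suffix is exactly "FU" followed by digits at the end of the name; `vars`, `base`, and `wave` are illustrative names, not part of the data set.

```r
# Example names following the convention described above
vars <- c("employment", "incomeFU1", "marriedFU2", "smokingFU3")

# Base name: drop a trailing "FU<digits>" suffix if present
base <- sub("FU[0-9]+$", "", vars)

# Wave: 0 for baseline, otherwise the follow-up number
has_fu <- grepl("FU[0-9]+$", vars)
wave <- integer(length(vars))                        # all zeros to start
wave[has_fu] <- as.integer(sub("^.*FU", "", vars[has_fu]))
```

Here `base` holds the variable name shared across waves, which is what identifies the column a variable belongs to.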
l <- list()
l[[1]] <- c("employment", "income", "married")
l[[2]] <- c("employmentFU1", "incomeFU1", "smokingFU1")
l[[3]] <- c("employmentFU2", "incomeFU2", "marriedFU2")
l[[4]] <- c("smokingFU3", "ageFU3")
The first list item holds the baseline variables, the second the follow-up 1 variables, the third the follow-up 2 variables, and so on.
Note that some variables are available in two or three (sometimes even all) follow-ups, while others appear only once.
I now want to reshape this data frame based on the list variables to a matrix or data frame like this:
employment     income     married     NA          NA
employmentFU1  incomeFU1  NA          smokingFU1  NA
employmentFU2  incomeFU2  marriedFU2  NA          NA
NA             NA         NA          smokingFU3  ageFU3
The number of rows in this matrix equals the number of list elements, 4 in this case.
I tried something like this, but did not get very far:
library(stringr)  # provides str_detect()
m <- matrix()
m[1,1] <- df[, l[[1]][1]]
m[1,2] <- l[[2]][str_detect(l[[1]][1], l[[2]])]
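One possible way to get the layout described above is to strip the follow-up suffix, use the resulting base names as columns, and place each list element's names with `match()`. This is only a sketch in base R, assuming the suffix is always "FU" plus digits and that columns should appear in order of first occurrence; `strip_fu`, `cols`, and `vals` are hypothetical helper names introduced for illustration.

```r
df <- data.frame(
  employment = 0.45, income = 0.3, incomeFU1 = 0.4, married = 0.1,
  employmentFU1 = 0.7, employmentFU2 = 0.8, incomeFU2 = 0.8,
  smokingFU1 = 0.6, smokingFU3 = 0.1, ageFU3 = 0.9, marriedFU2 = 0.3
)

l <- list(
  c("employment", "income", "married"),
  c("employmentFU1", "incomeFU1", "smokingFU1"),
  c("employmentFU2", "incomeFU2", "marriedFU2"),
  c("smokingFU3", "ageFU3")
)

# Hypothetical helper: remove a trailing "FU<digits>" suffix
strip_fu <- function(x) sub("FU[0-9]+$", "", x)

# Columns: base names in order of first appearance across the list
cols <- unique(strip_fu(unlist(l)))

# One row per list element; match() puts each name under its base column
# and leaves NA where that wave has no such variable
m <- t(vapply(l, function(v) v[match(cols, strip_fu(v))],
              character(length(cols))))
colnames(m) <- cols

# Optional: the same layout filled with the values from the one-row df
vals <- matrix(unlist(df[1, ])[as.vector(m)],
               nrow = nrow(m), dimnames = dimnames(m))
```

`m` then contains the variable names in the requested arrangement, and `vals` holds the corresponding numbers, with `NA` wherever a variable was not collected in that wave.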
Should smokingFU3 be in the fourth row (not the third, as in the example)? – storaged