I have a survey question in the format: "Do you prefer a rose or a tulip? Imagine that the rose has colors V1 and V2, and the tulip has colors V3 and V4."
The actual colors are drawn from combinations contained in one dataframe:
Dataframe 1 (df1):
structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"),
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"),
V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"),
V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L
))
In this dataframe (df1) the first two columns (V1 and V2) correspond to "rose", and the last two columns (V3 and V4) correspond to "tulip". For example, a respondent could be shown combination 1 from the first row of df1, which is "red blue green yellow". This means that the respondent could choose a "rose that is red and blue" or a "tulip that is green and yellow".
The choices made by respondents are stored in a separate dataframe (df2). df2 has one column for each color combination: column v1 corresponds to row 1 of df1, v2 to row 2, and so on. Cells for combinations a respondent was not shown are NA. If respondent 1 was shown the first combination from df1 ("red blue green yellow") and selected the tulip (green and yellow), the choice is recorded as "2" (tulip, i.e. the second flower) in column v1 of respondent 1's row. If respondent 2 was shown the second combination from df1 ("red blue green black") and selected the rose (red and blue), the choice is recorded as "1" (rose, i.e. the first flower) in column v2 of respondent 2's row. In other words, "2" means "tulip chosen, rose not chosen", and "1" means "rose chosen, tulip not chosen".
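To make the encoding concrete, here is a small base-R sketch that decodes a single df2 cell back into the two options that were shown. The helper name `decode_choice` is hypothetical. So that the snippet runs on its own, df1 is rebuilt with `expand.grid` (as character columns rather than factors); as far as I can tell this reproduces the same 16 rows as the dput below, with V4 alternating fastest.

```r
# df1 rebuilt compactly: expand.grid varies its first argument fastest,
# so V4 alternates yellow/black and V1 changes only once (red -> ruby)
df1 <- expand.grid(V4 = c("yellow", "black"), V3 = c("green", "turqoise"),
                   V2 = c("blue", "violet"), V1 = c("red", "ruby"),
                   stringsAsFactors = FALSE)[, c("V1", "V2", "V3", "V4")]

# Hypothetical helper: turn one df2 cell (column name + value) into the
# combination shown and the flower chosen
decode_choice <- function(df1, combo, val) {
  stopifnot(val %in% c(1, 2))
  row <- as.integer(sub("^v", "", combo))  # "v6" -> row 6 of df1
  shown <- df1[row, ]
  list(combination = row,
       rose   = paste(shown$V1, shown$V2),  # option 1: V1 + V2
       tulip  = paste(shown$V3, shown$V4),  # option 2: V3 + V4
       chosen = if (val == 1) "rose" else "tulip")
}

# Respondent 1: value 2 in column v1
# -> rose "red blue", tulip "green yellow", chosen "tulip"
str(decode_choice(df1, "v1", 2L))
```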
Dataframe 2 (df2):
structure(list(respondentID = 1:16, v1 = c(2L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA,
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),
v4 = c(NA, NA, NA, 2L, NA, NA, NA, NA, NA, NA, 1L, 2L, NA,
NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), v6 = c(NA, 2L, NA, NA, NA, NA, NA,
NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA,
1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), v9 = c(NA, NA, NA, NA, NA, NA, NA, 2L, NA, NA, NA, NA,
NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA
), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2L
)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5",
"v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
"v16"), class = "data.frame", row.names = c(NA, -16L))
If I only wanted to know which flower was chosen and its colors, I could use:
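library(dplyr)
library(tidyr)

# Rename df1's columns and add a key ("v1" ... "v16") matching df2's column names
df1_with_id <- df1 %>%
  setNames(paste0("color", 1:4)) %>%
  mutate(combo = paste0("v", row_number()))

# One row per non-missing choice, joined to the colors that were shown
result_df <- df2 %>%
  gather(key = combo, value = val, -respondentID) %>%
  filter(!is.na(val)) %>%
  left_join(df1_with_id, by = "combo") %>%
  arrange(respondentID)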
But this doesn't give me the format I need. I need both options shown to each respondent (i.e. "rose that is V1 and V2" and "tulip that is V3 and V4") in separate rows, plus an additional variable indicating which of the two options was chosen, as in the desired result image.
(In the image, "1" in the choice variable marks the option the respondent chose, and "0" marks the option not chosen.)
I can't quite figure out how to write code to organize data in this way. Any advice?
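One possible approach, sketched in base R under the assumption that each df2 cell is 1 (rose) or 2 (tulip) and that column vK refers to row K of df1: reshape df2 to one row per answer, then build one row for the rose (V1, V2) and one for the tulip (V3, V4) and stack them. The output column names (`flower`, `color1`, `color2`, `choice`) are my own choice, since the exact layout of the desired result is only in the image. To keep the snippet self-contained it rebuilds df1 and df2 compactly; if the dput objects above are already loaded, skip those lines.

```r
# df1: same 16 combinations as the dput (character columns instead of factors)
df1 <- expand.grid(V4 = c("yellow", "black"), V3 = c("green", "turqoise"),
                   V2 = c("blue", "violet"), V1 = c("red", "ruby"),
                   stringsAsFactors = FALSE)[, c("V1", "V2", "V3", "V4")]

# df2: rebuilt from its non-missing cells (respondent, combination, value)
m <- matrix(NA_integer_, nrow = 16, ncol = 16,
            dimnames = list(NULL, paste0("v", 1:16)))
m[cbind(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16),
        c(1, 6, 5, 4, 7, 8, 3, 9, 11, 6, 4, 4, 12, 13, 16))] <-
  c(2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L)
df2 <- data.frame(respondentID = 1:16, m)

# 1. Wide -> long: unlist() stacks v1..v16 column by column
long <- data.frame(
  respondentID = rep(df2$respondentID, times = 16),
  combo        = rep(1:16, each = nrow(df2)),
  val          = unlist(df2[, -1], use.names = FALSE)
)
long <- long[!is.na(long$val), ]  # keep only combinations actually shown

# 2. Two rows per answer: option 1 = rose (V1, V2), option 2 = tulip (V3, V4)
rose  <- data.frame(respondentID = long$respondentID, combo = long$combo,
                    flower = "rose",
                    color1 = df1$V1[long$combo], color2 = df1$V2[long$combo],
                    choice = as.integer(long$val == 1))
tulip <- data.frame(respondentID = long$respondentID, combo = long$combo,
                    flower = "tulip",
                    color1 = df1$V3[long$combo], color2 = df1$V4[long$combo],
                    choice = as.integer(long$val == 2))

# 3. Stack and sort so each respondent's rose row is followed by the tulip row
result <- rbind(rose, tulip)
result <- result[order(result$respondentID, result$flower), ]
rownames(result) <- NULL
head(result, 4)
```

If you prefer to stay in the dplyr pipeline from the question, the same idea should carry over: after the `gather()`/`filter()` step, build one data frame from the rose columns and one from the tulip columns and stack them with `bind_rows()`.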