0
votes

I have a survey question in which respondents could select multiple answers (for 16 possible combinations, e.g. "Which color do you like?" can result in responses "red, blue, green, yellow" or "red, blue, green, black" etc.

These 16 possible combinations are contained in a spreadsheet:

Image 1: First two rows of the spreadsheet (full spreadsheet has 16 rows)

Example 1:

structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"), 
V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"), 
V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"), 
V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L
))

The dataframe with responses has sixteen columns for this question (one column per every simple combination of colors). If respondent 1 selected the first combination, only the first column contains data; similarly, if respondent 2 selected the second combination, the second column contains data. The other are empty:

Image 2: The first two columns of the dataframe

Example 2:

structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA, 
NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA, 
NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA, 
1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 
NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA
), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L
)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5", 
"v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", 
"v16"), class = "data.frame", row.names = c(NA, -16L))

(Of course, in practice respondent 1 didn't necessarily choose combination 1).

All the information in the dataframe is the number "1", which corresponds to appropriate combination in the spreadsheet.

In order to analyze responses to the question, I need to extract the combination from the spreadsheet and import it into the dataframe with responses, so that I get four new columns in the dataframe with the combination of colors chosen by a respondent (e.g. red, blue, green, yellow for respondent 1).

I don't think there's any way to do this using apply, so I guess I need to write a for loop to extract and import the data. Any advice on how to do this?

2
When asking for help, you should include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Pictures of data aren't very helpful.MrFlick
The second data frame you posted has 9 columns plus the IDs, not 16camille

2 Answers

1
votes

If you put the second data frame into a long shape, you can filter for just the combinations each person chose, and then join the second data frame with the first. The two data frames have combination labels that can be reconciled between the two to join on.

Note that I changed the column names in the first data frame, df1_with_id, to be color1, etc, only because otherwise you would have v1, v2, ... in one data frame, and V1, V2, ... representing something different in the other. Not a necessary change, but it's good to keep from confusing what different variables mean.

library(tidyverse)

df1 <- structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("red", "ruby"), class = "factor"),V2 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L,1L, 1L, 2L, 2L, 2L, 2L), .Label = c("blue", "violet"), class = "factor"),V3 = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,2L, 2L, 1L, 1L, 2L, 2L), .Label = c("green", "turqoise"), class = "factor"),V4 = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,2L, 1L, 2L, 1L, 2L, 1L), .Label = c("black", "yellow"), class = "factor")), .Names = c("V1","V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -16L))

df2 <- structure(list(respondentID = 1:16, v1 = c(1L, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v2 = c(NA, 1L, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v3 = c(NA,NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA),v4 = c(NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, 1L, 1L, NA,NA, NA, NA), v5 = c(NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA), v6 = c(NA, 1L, NA, NA, NA, NA, NA,NA, NA, 1L, NA, NA, NA, NA, NA, NA), v7 = c(NA, NA, NA, NA,1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v8 = c(NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v9 = c(NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA,NA, NA, NA, NA), v10 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v11 = c(NA, NA, NA, NA,NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA), v12 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA, NA), v13 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,NA, 1L, NA, NA), v14 = c(NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA), v15 = c(NA, NA, NA, NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), v16 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1L)), .Names = c("respondentID", "v1", "v2", "v3", "v4", "v5","v6", "v7", "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15","v16"), class = "data.frame", row.names = c(NA, -16L))

df1_with_id <- df1 %>% 
    setNames(paste0("color", 1:4)) %>%
    mutate(combo = paste0("v", row_number()))

head(df1_with_id)
#>   color1 color2   color3 color4 combo
#> 1    red   blue    green yellow    v1
#> 2    red   blue    green  black    v2
#> 3    red   blue turqoise yellow    v3
#> 4    red   blue turqoise  black    v4
#> 5    red violet    green yellow    v5
#> 6    red violet    green  black    v6

df2 %>%
    gather(key = combo, value = val, -respondentID) %>%
    filter(!is.na(val)) %>%
    left_join(df1_with_id, by = "combo")
#>    respondentID combo val color1 color2   color3 color4
#> 1             1    v1   1    red   blue    green yellow
#> 2             2    v2   1    red   blue    green  black
#> 3             7    v3   1    red   blue turqoise yellow
#> 4             4    v4   1    red   blue turqoise  black
#> 5            11    v4   1    red   blue turqoise  black
#> 6            12    v4   1    red   blue turqoise  black
#> 7             3    v5   1    red violet    green yellow
#> 8             2    v6   1    red violet    green  black
#> 9            10    v6   1    red violet    green  black
#> 10            5    v7   1    red violet turqoise yellow
#> 11            6    v8   1    red violet turqoise  black
#> 12            8    v9   1   ruby   blue    green yellow
#> 13            9   v11   1   ruby   blue turqoise yellow
#> 14           13   v12   1   ruby   blue turqoise  black
#> 15           14   v13   1   ruby violet    green yellow
#> 16           16   v16   1   ruby violet turqoise  black

Created on 2018-05-08 by the reprex package (v0.2.0).

0
votes

I'm not sure to understand what you want to do. Using the tidyverse packages and the melt() function of the reshape2 package, you might try

df_respondent_combination <-
  df_respondent %>%
  melt(measure.vars = c(2:ncol(.)), na.rm = T) %>%
  cbind(df_combination) %>%
  select(-variable, -value) %>%
  arrange(respondentID)

With df_respondent_combination the new dataframe expected, df_respondent your example 2 and df_combination your example 1.