
My goal is to plot the structure of a questionnaire. For every respondent I have a string listing the questions answered, in order, each with its answer time: q1:2,q2:3,q4:4,q10:4 means that the respondent answered question q1 first (in 2 seconds), then q2 (in 3 seconds), and so on until question q10. Some question names start with "d" instead of "q" (e.g. d10); these are just another type of question.

Example:

dat <- data.frame(path = c(
  "q1:9,q2:8,d3:10,q10:3,q4:10",
  "q1:10,q2:10,q10:2,q4:2",
  "q1:2,q2:3,d11:2"
), stringsAsFactors = FALSE)  # keep path as character (before R 4.0 it defaulted to a factor)

My idea is to plot how the questionnaire was answered as a network graph. So I need a long list of all the "steps" (transitions from one question to the next) that the respondents made:

from | to
------------
q1     q2 # first respondent
q2     d3
d3     q10
q10    q4
q1     q2 # second respondent
q2     q10
q10    q4
q1     q2 # third respondent
q2     d11
...

My problem is that, because of filter questions, not all respondents had to answer the same number of questions, so I can't use separate() with a fixed into = vector: the number of parts varies from respondent to respondent. Furthermore, the questions need to be split up pairwise (each question paired with the one answered next).

Does anyone have an idea how to get the data frame above?

The final goal is a table with the number of respondents for each step (e.g. if 20 respondents went from q1 to q2, then 20 can be used as the edge weight in the graph).

Thanks!


3 Answers


Do you mean something like the following?

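do.call(
  rbind,
  lapply(
    # split each path on ":<seconds>" plus an optional comma, leaving only the question ids
    strsplit(dat$path, ":\\d+,?"),
    # pair every question with the one that follows it
    function(v) data.frame(from = v[-length(v)], to = v[-1])
  )
)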

which gives

  from  to
1   q1  q2
2   q2  d3
3   d3 q10
4  q10  q4
5   q1  q2
6   q2 q10
7  q10  q4
8   q1  q2
9   q2 d11
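To get from this edge list to the weighted table the question ultimately asks for (number of respondents per step), one option is to count identical from/to pairs with base R's aggregate(). This is a minimal sketch that assumes the edge list is built exactly as above; the names edges and weights are only illustrative.

```r
# Example data from the question
dat <- data.frame(path = c(
  "q1:9,q2:8,d3:10,q10:3,q4:10",
  "q1:10,q2:10,q10:2,q4:2",
  "q1:2,q2:3,d11:2"
), stringsAsFactors = FALSE)

# Keep only the question ids of each respondent, in answer order
steps <- strsplit(dat$path, ":\\d+,?")

# One row per transition: each question paired with the next one
edges <- do.call(rbind, lapply(steps, function(v) {
  data.frame(from = v[-length(v)], to = v[-1], stringsAsFactors = FALSE)
}))

# Count how many respondents made each transition; n can be used as edge weight
weights <- aggregate(n ~ from + to, data = transform(edges, n = 1L), FUN = sum)
weights <- weights[order(-weights$n), ]
print(weights)
```

For the example data, q1 to q2 gets n = 3, q10 to q4 gets n = 2, and every other step occurs once.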

An option with str_extract_all:

library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
dat %>% 
    transmute(from = str_extract_all(path, "\\w+(?=:)"), to = map(from, lead)) %>%
    unnest(c(from, to)) %>%
    filter(!is.na(to))

Output:

# A tibble: 9 x 2
#  from  to   
#  <chr> <chr>
#1 q1    q2   
#2 q2    d3   
#3 d3    q10  
#4 q10   q4   
#5 q1    q2   
#6 q2    q10  
#7 q10   q4   
#8 q1    q2   
#9 q2    d11  

Does this answer your question?

library(dplyr)
library(tidyr)
dat %>% mutate(ID = row_number()) %>%                       # one ID per respondent
  separate_rows(path, sep = ",") %>%                        # one row per "question:seconds" entry
  extract(col = path, into = c("ques", "number"), regex = "(.*):(.*)") %>%
  group_by(ID) %>% mutate(from = ques, to = lead(ques)) %>% # pair each question with the next
  ungroup() %>% select(from, to) %>%
  na.omit()                                                 # drop each respondent's last question
# A tibble: 9 x 2
  from  to   
  <chr> <chr>
1 q1    q2   
2 q2    d3   
3 d3    q10  
4 q10   q4   
5 q1    q2   
6 q2    q10  
7 q10   q4   
8 q1    q2   
9 q2    d11  