R: Splitting up strings “pairwise” into a variable number of elements

Question

My goal is to plot the structure of a questionnaire. For every respondent I have a string listing each question answered together with its answer time. For example, q1:2,q2:3,q4:4,q10:4 means that the respondent answered question q1 first (in 2 seconds), then q2 (in 3 seconds), and so on until question q10. Some question names start with "d" instead (e.g. d10), which is just another type of question.

Example:

dat <- data.frame(path = c(
  "q1:9,q2:8,d3:10,q10:3,q4:10",
  "q1:10,q2:10,q10:2,q4:2",
  "q1:2,q2:3,d11:2"
))

My idea is to plot how the questionnaire was answered as a network graph. So I need a long list of all the "steps" the respondents made within the questionnaire:

from | to
------------
q1     q2 # first respondent
q2     d3
d3     q10
q10    q4
q1     q2 # second respondent
q2     q10
q10    q4
q1     q2 # third respondent
q2     d11
...

My problem is that, due to filter questions, not all respondents had to answer the same number of questions (so I can't use separate(, into=?) because ? is variable). Furthermore, the values need to be split up "pairwise", so that each question is paired with the one that follows it.

Does anyone have an idea how to get the data frame above?

The final goal would, of course, be to have a table including the number of respondents for each "step" (e.g. 20 respondents went from q1 to q2, so 20 can be used as a weighting variable in the graph).

Thanks!

ThomasIsCoding · Accepted Answer · 2020-10-10T14:13:32

Do you mean something like the following?

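# Splitting on ":<digits>" plus an optional comma leaves only the question names.
# as.character() guards against path being a factor (the data.frame default before R 4.0).
do.call(
  rbind,
  lapply(
    strsplit(as.character(dat$path), ":\\d+,?"),
    # Pair each question with its successor: all but the last vs. all but the first.
    function(v) data.frame(from = v[-length(v)], to = v[-1])
  )
)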

which gives

  from  to
1   q1  q2
2   q2  d3
3   d3 q10
4  q10  q4
5   q1  q2
6   q2 q10
7  q10  q4
8   q1  q2
9   q2 d11
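
For the weighting mentioned at the end of the question (how many respondents took each step), the edge list can be aggregated. The following is only a sketch in base R: the names edges, n, and weights are placeholders chosen here, not part of the original question or answer, and stringsAsFactors = FALSE is set explicitly so the example behaves the same on older R versions.

```r
# Example data from the question; stringsAsFactors = FALSE keeps path as character.
dat <- data.frame(
  path = c(
    "q1:9,q2:8,d3:10,q10:3,q4:10",
    "q1:10,q2:10,q10:2,q4:2",
    "q1:2,q2:3,d11:2"
  ),
  stringsAsFactors = FALSE
)

# Build the from/to edge list exactly as in the answer above.
edges <- do.call(
  rbind,
  lapply(
    strsplit(dat$path, ":\\d+,?"),
    function(v) data.frame(from = v[-length(v)], to = v[-1])
  )
)

# Each row is one observed step, so give it a count of 1 (hypothetical column n)
# and sum the counts per distinct from/to pair.
edges$n <- 1L
weights <- aggregate(n ~ from + to, data = edges, FUN = sum)
weights
```

Each row of weights is one distinct step, and n is the number of respondents who took it, so it could serve as the edge weight in whichever graph package is used.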
