3
votes

I have a dataframe with column names like the following:

[127] "quiz.32.player.submitted_answer_private"         "quiz.32.player.rescue_event"                    
[129] "quiz.33.player.solution"                         "quiz.33.player.submitted_answer"                
[131] "quiz.33.player.submitted_answer_private"         "quiz.33.player.rescue_event"                    
[133] "partner_quiz.1.player.solution"                  "partner_quiz.1.player.submitted_answer"         
[135] "partner_quiz.1.player.submitted_answer_private"  "partner_quiz.1.player.rescue_event"             
[137] "partner_quiz.2.player.solution"                  "partner_quiz.2.player.submitted_answer"         
[139] "partner_quiz.2.player.submitted_answer_private"  "partner_quiz.2.player.rescue_event"      

I am trying to split each of these names at the last period, keeping the part to the left of it and the part to the right of it as separate values. My dplyr pipeline for this is as follows:

frame <- data %>%
  gather(k, value) %>%
  separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=player)")

For some reason, the resulting data.frame omits all columns that are prefixed with "partner". Any ideas why?

Edit: The resulting split should put everything to the left of the last period in the quiz_number column (e.g. quiz.32.player and partner_quiz.2.player) and everything to the right of the last period in the suffix column (e.g. submitted_answer_private and solution).

1 Answer

5
votes

Instead of looking ahead for 'player', use a lookahead that matches one or more characters that are not a . up to the end ($) of the string. That way the split happens only at the last . in each name:

library(dplyr)
library(tidyr)
data %>%
   gather(k, value) %>%
   separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=[^.]+$)")

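The OP's pattern matches the . just before the 'player' string, but there is another . after 'player', e.g. in quiz.32.player.rescue_event. The split therefore happens one period too early: the left part becomes quiz.32 and the right part becomes player.rescue_event, instead of quiz.32.player and rescue_event.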
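To see the difference between the two patterns without running the whole pipeline, here is a minimal sketch on three of the column names from the question. It uses base R's strsplit() with perl = TRUE as a stand-in for separate(); that substitution is an assumption made so the snippet runs without tidyr. The names cols, op_split, last_split and result are made up for illustration.

```r
# Three column names copied from the question
cols <- c("quiz.33.player.solution",
          "quiz.32.player.rescue_event",
          "partner_quiz.2.player.submitted_answer_private")

# Pattern from the question: splits at the . that precedes "player"
op_split <- strsplit(cols, "\\.(?=player)", perl = TRUE)

# Pattern from the answer: splits only at the last . in each name
last_split <- strsplit(cols, "\\.(?=[^.]+$)", perl = TRUE)

# Collect the two pieces of each name into a two-column data frame
result <- data.frame(
  quiz_number = vapply(last_split, `[`, character(1), 1),
  suffix      = vapply(last_split, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

print(op_split[[1]])  # pieces produced by the question's pattern
print(result)         # pieces produced by the answer's pattern
```

With the second pattern, the partner_quiz names are split the same way as the quiz names, since the lookahead depends only on where the last . is and not on any fixed word in the name.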