Find values to the left and right of the last period with regex and separate in dplyr

Question

I have a dataframe with column names like the following:

[127] "quiz.32.player.submitted_answer_private"         "quiz.32.player.rescue_event"                    
[129] "quiz.33.player.solution"                         "quiz.33.player.submitted_answer"                
[131] "quiz.33.player.submitted_answer_private"         "quiz.33.player.rescue_event"                    
[133] "partner_quiz.1.player.solution"                  "partner_quiz.1.player.submitted_answer"         
[135] "partner_quiz.1.player.submitted_answer_private"  "partner_quiz.1.player.rescue_event"             
[137] "partner_quiz.2.player.solution"                  "partner_quiz.2.player.submitted_answer"         
[139] "partner_quiz.2.player.submitted_answer_private"  "partner_quiz.2.player.rescue_event"

I am trying to separate these names by extracting the part to the right of the last period and the part to the left of it. My dplyr pipeline for this is as follows:

frame <- data %>%
  gather(k, value) %>%
  separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=player)")

For some reason the resulting data.frame omits all columns whose names begin with "partner". Any ideas why?

Edit: After the split, the quiz_number column should hold everything to the left of the last period (e.g. quiz.32.player and partner_quiz.2.player), and the suffix column should hold everything to the right of it (e.g. submitted_answer_private and solution).
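A minimal base-R sketch of the split described in the edit, applied to a few of the column names from the question. It uses sub() rather than separate(); this is only meant to show what "left of the last period" and "right of the last period" mean for these names, not the OP's pipeline.

```r
# A few column names taken from the question
k <- c("quiz.32.player.rescue_event",
       "quiz.33.player.submitted_answer_private",
       "partner_quiz.2.player.solution")

# Left of the last period: drop the final "." plus the run of non-periods after it
quiz_number <- sub("\\.[^.]*$", "", k)
# c("quiz.32.player", "quiz.33.player", "partner_quiz.2.player")

# Right of the last period: the greedy ".*" runs up to the last "." and is removed
suffix <- sub("^.*\\.", "", k)
# c("rescue_event", "submitted_answer_private", "solution")
```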

akrun · Accepted Answer · 2019-07-08T19:43:58

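Instead of 'player' in the regex lookaround, use a lookahead that requires one or more characters that are not a . up to the end ($) of the string. Only the last period is followed by such a run, so that is where the name gets split: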

library(dplyr)
library(tidyr)
data %>%
   gather(k, value) %>%
   separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=[^.]+$)")
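For readers without tidyr, here is a rough base-R analogue of the same reshape-and-split, sketched on a hypothetical two-column toy `data` (the values "a", "b", "x", "y" are made up). It reuses the answer's pattern through strsplit(); base R needs perl = TRUE for the (?=...) lookahead. The long-format step only approximates what gather(k, value) produces here.

```r
# Hypothetical toy stand-in for the question's `data`
data <- data.frame(
  quiz.33.player.solution = c("a", "b"),
  partner_quiz.1.player.rescue_event = c("x", "y"),
  stringsAsFactors = FALSE
)

# Long format: one row per (column name, value), similar to gather(k, value)
long <- data.frame(
  k = rep(names(data), each = nrow(data)),
  value = unlist(data, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split each key at its last period using the same lookahead as separate() above
parts <- do.call(rbind, strsplit(long$k, "\\.(?=[^.]+$)", perl = TRUE))
long$quiz_number <- parts[, 1]
long$suffix <- parts[, 2]
long$k <- NULL
long
```

Both the quiz.* and partner_quiz.* keys come through, because the split point depends only on where the last period is.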

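In the OP's code, the regex matches the . before 'player', but there are more periods after 'player' (e.g. quiz.32.player.rescue_event). The split therefore happens too early, giving quiz.32 and player.rescue_event instead of quiz.32.player and rescue_event.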
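To see the difference between the two patterns on a single name, here is a small sketch using base R's strsplit() with perl = TRUE as a stand-in for the splitting that separate() does (an approximation for illustration, not separate() itself):

```r
k <- "quiz.32.player.rescue_event"

# Question's pattern: splits at the period right before "player"
old_split <- strsplit(k, "\\.(?=player)", perl = TRUE)[[1]]
# c("quiz.32", "player.rescue_event")

# Answer's pattern: splits at the period followed only by non-periods to the end
new_split <- strsplit(k, "\\.(?=[^.]+$)", perl = TRUE)[[1]]
# c("quiz.32.player", "rescue_event")
```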
