I'm trying to classify participants' current status in a course. This is an extension of this post: purrr pmap to read max column name by column name number. My dataset looks like this:
```r
library(dplyr)
problem <- tibble(name = c("Angela", "Claire", "Justin", "Bob", "Joseph", "Gil"),
                  status_1 = c("Registered", "Withdrawn", "Completed", "Registered", "Registered", "Registered"),
                  status_2 = c("Withdrawn", "Withdrawn", "Registered", "NA", "Withdrawn", "Cancelled"),
                  status_3 = c("NA", "Registered", "Withdrawn", "NA", "Registered", "NA"),
                  status_4 = c("Withdrawn", "Registered", "Withdrawn", "NA", "Registered", "NA"))
```
I want to classify each person's current status. If someone has "Completed" in any status column, their current status is "Completed". The tricky part is the "Registered" status: someone is "Registered" if their final status is "Registered", or if every status after their registration is "NA". They are NOT registered if a status after their registration is "Withdrawn" or "Cancelled". So the final dataset should look like this:
```r
library(dplyr)
solution <- tibble(name = c("Angela", "Claire", "Justin", "Bob", "Joseph", "Gil"),
                   status_1 = c("Registered", "Withdrawn", "Completed", "Registered", "Registered", "Registered"),
                   status_2 = c("Withdrawn", "Withdrawn", "Registered", "NA", "Withdrawn", "Cancelled"),
                   status_3 = c("NA", "Registered", "Withdrawn", "NA", "Registered", "NA"),
                   status_4 = c("Withdrawn", "Registered", "Withdrawn", "NA", "Registered", "NA"),
                   current = c("Not Taken", "Registered", "Completed", "Registered", "Registered", "Not Taken"))
```
Angela is "Not Taken" because she withdrew after her registration. Claire is "Registered" because, despite her past withdrawals, she registered more recently. Justin is "Completed" because he has "Completed" in one of his status columns. Bob is "Registered" because he has not withdrawn or had the course cancelled. Like Claire, Joseph registered more recently than his withdrawal, so he is "Registered". Finally, Gil is "Not Taken" because his course was cancelled and he has no more recent registration.
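All of these cases seem to come down to the most recent status that isn't "NA": a latest "Registered" means registered, a latest "Withdrawn" or "Cancelled" means not taken, and "Completed" anywhere overrides everything. Here is a minimal base-R sketch of that rule for a single participant. It assumes the string "NA" marks an empty slot, as in the sample data, and that the statuses are in chronological order. It also assumes any other latest status counts as "Not Taken". `current_status()` is a hypothetical helper name, not something from the original code.

```r
# Hypothetical helper (not from the original code): classify one participant
# from a character vector of their statuses, in chronological column order.
# Assumes the string "NA" (or a real NA) marks an empty slot.
current_status <- function(statuses) {
  # "Completed" in any slot wins outright
  if ("Completed" %in% statuses) {
    return("Completed")
  }
  # Drop empty slots so the last element is the most recent real status
  filled <- statuses[!is.na(statuses) & statuses != "NA"]
  if (length(filled) == 0) {
    return("NA")
  }
  latest <- filled[[length(filled)]]
  # Registered only if nothing but empty slots follows the last registration;
  # any other latest status (Withdrawn, Cancelled, ...) is treated as Not Taken
  if (latest == "Registered") "Registered" else "Not Taken"
}

current_status(c("Registered", "Withdrawn", "NA", "Withdrawn"))         # Angela: "Not Taken"
current_status(c("Withdrawn", "Withdrawn", "Registered", "Registered")) # Claire: "Registered"
current_status(c("Registered", "Cancelled", "NA", "NA"))                # Gil: "Not Taken"
```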
Here's my code:
```r
library(tidyverse)

solution %>%
  mutate(
    test = pmap_chr(select(., contains("status")), ~
      case_when(
        # "Completed" in any status column
        any(str_detect(c(...), "(?i)Completed")) ~ "Completed",
        # "Exempt" or "Incomplete" in any status column
        any(str_detect(c(...), "(?i)Exempt")) | any(str_detect(c(...), "(?i)Incomplete")) ~ "Exclude",
        # Registration check (the part that doesn't work); note that length() returns
        # the number of elements (always 4 here), not the number of matches
        length(c(...) == "Registered") > length(c(...) == "Withdrawn") |
          length(c(...) == "Registered") > length(c(...) == "Cancelled") ~ "Registered",
        # Any "not taken" status in any status column
        any(str_detect(c(...), "(?i)No Show")) | any(str_detect(c(...), "(?i)Denied")) |
          any(str_detect(c(...), "(?i)Cancelled")) | any(str_detect(c(...), "(?i)Waitlist Expired")) ||
          any(str_detect(c(...), "(?i)Withdrawn")) ~ "Not Taken",
        TRUE ~ "NA"
      )
    )
  )
```
I can't figure out how to get the registration condition right. Ideally, I'd like to keep as much of this code as possible because my real dataset has many status columns.
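One possible way to keep the existing `pmap_chr()`/`case_when()` structure is to work out the most recent non-"NA" status inside the lambda and test that, instead of comparing counts. This is a sketch under a few assumptions, not a definitive fix. The string "NA" marks an empty slot. The status columns are ordered oldest to newest. The "Exclude" and "Not Taken" patterns are copied from the code above. The names `statuses`, `filled`, `latest`, and `result` are illustrative only.

```r
library(dplyr)
library(purrr)
library(stringr)

# Sample data from the question ("NA" strings mark empty slots)
problem <- tibble(name = c("Angela", "Claire", "Justin", "Bob", "Joseph", "Gil"),
                  status_1 = c("Registered", "Withdrawn", "Completed", "Registered", "Registered", "Registered"),
                  status_2 = c("Withdrawn", "Withdrawn", "Registered", "NA", "Withdrawn", "Cancelled"),
                  status_3 = c("NA", "Registered", "Withdrawn", "NA", "Registered", "NA"),
                  status_4 = c("Withdrawn", "Registered", "Withdrawn", "NA", "Registered", "NA"))

result <- problem %>%
  mutate(
    current = pmap_chr(select(., contains("status")), ~ {
      statuses <- c(...)
      # Drop empty slots, then keep the most recent remaining status
      filled <- statuses[!is.na(statuses) & statuses != "NA"]
      latest <- if (length(filled) > 0) filled[[length(filled)]] else "NA"
      case_when(
        # These two still look across all columns, as in the original code
        any(str_detect(statuses, "(?i)Completed")) ~ "Completed",
        any(str_detect(statuses, "(?i)Exempt|Incomplete")) ~ "Exclude",
        # These two only look at the most recent status
        str_detect(latest, "(?i)^Registered$") ~ "Registered",
        str_detect(latest, "(?i)No Show|Denied|Cancelled|Waitlist Expired|Withdrawn") ~ "Not Taken",
        TRUE ~ "NA"
      )
    })
  )

result$current
```

Because the columns are picked with `contains("status")`, this should carry over to more status columns unchanged, as long as they appear in chronological order.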
Thank you!