Brief Dataset description: I have survey data generated from Qualtrics, which I've imported into R as a tibble. Each column corresponds to a survey question, and I've preserved the original column order (to correspond with the order of the questions in the survey).
Problem in plain language: Due to normal participant attrition, not all participants completed all of the questions in the survey. I want to know how far each participant got in the survey, and the last question they each answered before stopping.
Problem statement in R: I want to generate (using tidyverse):
- 1) A new column (lastq) that lists, for each row (i.e. for each participant), the name of the last non-NA column (i.e. the name of the last question they completed).
- 2) A second new column (lastqnum) that gives the column number (position) of the column named in lastq.
Sample dataframe df
library(tidyverse)

df <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  grade  = c(1, NA, 1, NA),
  height = c("short", "tall", NA, NA),
  gender = c(NA, "m", NA, "f")
)
Original df
# A tibble: 4 x 4
   year grade height gender
  <dbl> <dbl> <chr>  <chr>
1  2015     1 short  <NA>
2  2015    NA tall   m
3  2016     1 <NA>   <NA>
4  2016    NA <NA>   f
Desired final df
# A tibble: 4 x 6
   year grade height gender lastq  lastqnum
  <dbl> <dbl> <chr>  <chr>  <chr>     <dbl>
1  2015     1 short  <NA>   height        3
2  2015    NA tall   m      gender        4
3  2016     1 <NA>   <NA>   grade         2
4  2016    NA <NA>   f      gender        4
There are some related questions, but I can't find an answer that extracts the column names (rather than the values themselves) from a tibble of mixed variable classes (rather than all numeric) using a tidyverse solution.
What I've been trying - I know there's something I'm missing here... :
ds %>% map(which(!is.na(.)))
ds %>% map(tail(!is.na(.), 2))
ds %>% rowwise() %>% mutate(last = which(!is.na(ds)))
?
Thank you so much for your help!
max(which(!is.na(ds)))
? – James
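Building on that hint, here is a minimal sketch of one possible approach: apply the max(which(...)) idea to each row instead of to the whole tibble. It uses the sample df from the question (the attempts above call it ds). The helper last_true and the intermediate names answered and last_idx are made up for illustration, and returning NA for a participant who answered nothing is an assumption about the desired behaviour.

```r
library(dplyr)
library(tibble)

# Sample data from the question.
df <- tibble(
  year   = c(2015, 2015, 2016, 2016),
  grade  = c(1, NA, 1, NA),
  height = c("short", "tall", NA, NA),
  gender = c(NA, "m", NA, "f")
)

# Hypothetical helper: position of the last TRUE in a logical vector,
# or NA if there is none (assumed handling for an all-NA row).
last_true <- function(x) {
  idx <- which(x)
  if (length(idx) == 0) NA_integer_ else max(idx)
}

# Logical matrix with one row per participant and one column per question;
# TRUE where the question was answered. is.na() works column by column,
# so mixed column classes are not a problem.
answered <- !is.na(df)

# For each row, the position of the last answered question.
last_idx <- apply(answered, 1, last_true)

result <- df %>%
  mutate(
    lastq    = names(df)[last_idx],  # name of the last non-NA column
    lastqnum = last_idx              # its position (integer, not double)
  )

print(result)
```

The matrix is computed once outside mutate() because rowwise() with c_across() tries to combine the values of a row into a single vector, which fails when numeric and character columns are mixed. Note that lastqnum comes out as an integer column here; wrap last_idx in as.numeric() if a double column is required.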