2
votes

So I guess this is possible to achieve by just making a veeery long line code using mutate() and ifelse() but I want to know if there is a way of doing it without writing a tone of code.

I have data where the degree of each individual is written in a non-ordered fashion. The data looks like this:

id <- c(1, 2, 3, 4, 5, 6)
degree1 <- c("masters", "bachelors", "PhD", "bachelors", "bachelors", NA)
degree2 <- c("PhD", "masters", "bachelors", NA, NA, NA)
degree3 <- c("bachelors", NA, "masters", NA, "masters", NA)

Now I want to create a new column containing the string for the highest degree, like this

dat$highest_degree <- c("PhD", "masters", "PhD", "bachelors", "masters", NA)

How can I achieve this?

2

2 Answers

2
votes

An option is to loop over the rows for the selected 'degree' column, convert to factor with levels specified in the order, drop the levels to remove the unused levels and select the first level

v1 <- c("PhD", "masters", "bachelors")
dat$highest_degree <- apply(dat[-1], 1, function(x)  
        levels(droplevels(factor(x, levels = v1)))[1])
dat$highest_degree
#[1] "PhD"       "masters"   "PhD"       "bachelors" "masters"   NA  

Or using tidyverse, reshape into 'long' format, then slice the first row after arrangeing the long format column by matching with an ordered degree vector and grouping by 'id', then join with the original data

library(dplyr)
library(tidyr)
dat %>% 
     pivot_longer(cols = starts_with('degree'), values_to = 'highest_degree') %>%
     select(-name) %>% 
     arrange(id, match(highest_degree, v1)) %>% 
     group_by(id) %>%
     slice_head(n = 1) %>%
     ungroup %>% 
     left_join(dat, .)

data

dat <- data.frame(id, degree1, degree2, degree3)
2
votes

Here is a base R option using pmin + factor

lvs <- c("PhD", "masters", "bachelors")
dat$highest_degree <- lvs[
  do.call(
    pmin,
    c(asplit(matrix(as.integer(factor(as.matrix(dat[-1]), levels = lvs)), nrow(dat)), 2),
      na.rm = TRUE
    )
  )
]

which gives

> dat
  id   degree1   degree2   degree3 highest_degree
1  1   masters       PhD bachelors            PhD
2  2 bachelors   masters      <NA>        masters
3  3       PhD bachelors   masters            PhD
4  4 bachelors      <NA>      <NA>      bachelors
5  5 bachelors      <NA>   masters        masters
6  6      <NA>      <NA>      <NA>           <NA>

Data

> dput(dat)
structure(list(id = c(1, 2, 3, 4, 5, 6), degree1 = c("masters",
"bachelors", "PhD", "bachelors", "bachelors", NA), degree2 = c("PhD",
"masters", "bachelors", NA, NA, NA), degree3 = c("bachelors",
NA, "masters", NA, "masters", NA)), class = "data.frame", row.names = c(NA,
-6L))