I am trying to count the number of unique characters across multiple columns (t1, t2, t3) in each row and store this value in a new variable. A character is counted only if the value in its associated column (p1, p2, p3, respectively) is greater than or equal to 0.05. For example, I have the following dataset:
dat <- data.frame(id = c(1,2,3,4,5),t1 = c('a','a','b','b','c'),
p1 = c(0.98,1,0.5,0.9,1),t2 = c('b',NA,'a','c',NA),
p2 = c(0.02,NA,0.25,0.10,NA), t3 = c(NA,NA,'c',NA,NA),
p3 = c(NA,NA,0.25,NA,NA))
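I am looking to count the number of unique values present in columns t1, t2, t3 for a given row, keeping only the values that meet the condition above, and put this number into a new variable (total). In row 1, for instance, 'b' is not counted because p2 = 0.02 is below 0.05, so total is 1. The output should look like this: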
output <- data.frame(id = c(1,2,3,4,5),t1 = c('a','a','b','b','c'),
p1 = c(0.98,1,0.5,0.9,1),t2 = c('b',NA,'a','c',NA),
p2 = c(0.02,NA,0.25,0.10,NA), t3 = c(NA,NA,'c',NA,NA),
p3 = c(NA,NA,0.25,NA,NA), total = c(1,1,3,2,1))
Using dplyr, I am able to count the unique characters in t1, t2, and t3 (ignoring p1, p2, and p3) with this code:
library(dplyr)
# Counts distinct non-NA values of t1, t2, t3 per id; p1, p2, p3 are not used yet
output <- dat %>%
  group_by(id) %>%
  mutate(total = n_distinct(c(t1, t2, t3), na.rm = TRUE))
However, I am unable to add the condition that a value in t1, t2, or t3 is counted only if the corresponding p1, p2, or p3 is >= 0.05, which is what the desired output requires. Is there a way to apply this condition to each of the columns t1, t2, t3? Thank you for your help.
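To make the rule concrete, here is a minimal base R sketch of one possible way to apply it. It assumes that the t and p columns pair up by position (t1 with p1, and so on) and that a missing p means the value is not counted. The helper names `t_cols`, `p_cols`, `t_mat`, `p_mat`, and `keep` are only illustrative. The idea is to blank out every t value whose paired p fails the condition, then count the distinct values left in each row.

```r
dat <- data.frame(
  id = c(1, 2, 3, 4, 5),
  t1 = c("a", "a", "b", "b", "c"),
  p1 = c(0.98, 1, 0.5, 0.9, 1),
  t2 = c("b", NA, "a", "c", NA),
  p2 = c(0.02, NA, 0.25, 0.10, NA),
  t3 = c(NA, NA, "c", NA, NA),
  p3 = c(NA, NA, 0.25, NA, NA)
)

# Assumption: columns are paired by position (t1 <-> p1, t2 <-> p2, t3 <-> p3)
t_cols <- c("t1", "t2", "t3")
p_cols <- c("p1", "p2", "p3")

t_mat <- as.matrix(dat[t_cols])  # character matrix of the t values
p_mat <- as.matrix(dat[p_cols])  # numeric matrix of the paired p values

# Keep a t value only when its paired p is present and >= 0.05
keep <- !is.na(p_mat) & p_mat >= 0.05
t_mat[!keep] <- NA

# Count distinct non-missing values in each row
dat$total <- apply(t_mat, 1, function(x) length(unique(x[!is.na(x)])))
```

The same masking idea could probably be moved into the dplyr pipeline, for example by wrapping each t column in `if_else(p1 >= 0.05, t1, NA_character_)` before passing it to `n_distinct()`, but I have not verified that variant here.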