I use the tidyverse in RStudio and have a data frame (df) with multiple variables and observations from patients.
Seven of its columns are character variables, one per symptom. These columns contain NAs as well, and some observations are positive for several symptoms. Here are the first 10 rows and 4 of those columns:

   symptom_1      symptom_2      symptom_3      symptom_4
1       <NA>           <NA> SYMPTOM'S NAME SYMPTOM'S NAME
2       <NA> SYMPTOM'S NAME           <NA> SYMPTOM'S NAME
3       <NA>           <NA>           <NA>           <NA>
4       <NA>           <NA>           <NA>           <NA>
5       <NA>           <NA>           <NA>           <NA>
6       <NA>           <NA>           <NA>           <NA>
7       <NA>           <NA>           <NA>           <NA>
8       <NA>           <NA>           <NA>           <NA>
9       <NA>           <NA>           <NA>           <NA>
10      <NA>           <NA>           <NA>           <NA>

I would like to build a new factor column that contains "Positive" for observations with at least one symptom and NA for observations where all symptom columns are NA. That is, the column should contain "Positive" for cases 1 and 2 and NA for cases 3 to 10. I searched this site and tried several suggestions; the closest I got to what I want is the following:

df <- df %>%
  select(symptom_1:symptom_7) %>%
  mutate_if(is.character, funs(any_positive = ifelse(!is.na(.), "Positive", .)))

But this code produced 14 more columns, named "symptom_1_any_positive", "symptom_2_any_positive", "symptom_3_any_positive" and so on, rather than a single one. How can I solve this and combine the variables into only one column?

Thank you in advance.

Thank you. Done. - Jakhongir Alidjanov
Also, I posted a solution. - akrun
Thanks a lot. Do you have any ideas that use dplyr/tidyverse commands? The column names shown are not the real names; I only reproduced a draft of the table, so grepl and names won't work in this situation. - Jakhongir Alidjanov

1 Answer

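We can use rowSums on a logical matrix: !is.na(df[nm1]) is TRUE wherever a symptom is recorded, so a row sum above 0 means the row has at least one symptom.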

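# indices of the columns named symptom_1, symptom_2, ...
nm1 <- grep("^symptom_\\d+$", names(df))
# TRUE/FALSE + 1 gives index 2/1, which picks "Positive" or NA
df$newcol <- c(NA, "Positive")[(rowSums(!is.na(df[nm1])) > 0) + 1]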
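
To see why the "+ 1" indexing works, here is a small step-by-step sketch. The data frame toy and the value "x" are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data: rows 1 and 2 have at least one symptom, row 3 has none
toy <- data.frame(
  symptom_1 = c(NA, "x", NA),
  symptom_2 = c("x", "x", NA),
  stringsAsFactors = FALSE
)

# Logical matrix of non-missing cells, summed per row, then compared with 0
has_any <- rowSums(!is.na(toy)) > 0   # TRUE, TRUE, FALSE

# Adding 1 turns FALSE/TRUE into the positions 1/2 of c(NA, "Positive")
idx <- has_any + 1                    # 2, 2, 1
result <- c(NA, "Positive")[idx]      # "Positive", "Positive", NA
```

If a factor is needed, as the question asks, the result can be wrapped in factor().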

Or, if the columns are numeric and can hold negative values as well, and we want to check for positive values only:

df$newcol <- c(NA, "Positive")[(rowSums(df[nm1] > 0 & !is.na(df[nm1])) > 0) + 1]
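
Since the comments ask for a dplyr/tidyverse variant that does not rely on matching column names, one possible sketch is below. It assumes dplyr 1.0.4 or later (for if_any) and selects the columns by range, as in the question's own select() call. The toy data and the column name any_positive are illustrative assumptions; with the real data the range would be symptom_1:symptom_7 or whatever the actual first and last symptom columns are.

```r
library(dplyr)

# Hypothetical toy data mirroring the first three rows of the question's table
df <- tibble(
  symptom_1 = c(NA_character_, NA_character_, NA_character_),
  symptom_2 = c(NA, "SYMPTOM'S NAME", NA),
  symptom_3 = c("SYMPTOM'S NAME", NA, NA),
  symptom_4 = c("SYMPTOM'S NAME", "SYMPTOM'S NAME", NA)
)

df <- df %>%
  mutate(
    # if_any() is TRUE for a row when at least one selected column is not NA
    any_positive = factor(
      if_else(if_any(symptom_1:symptom_4, ~ !is.na(.x)), "Positive", NA_character_)
    )
  )
```

Unlike mutate_if, which applies the function to every matching column separately, this collapses the selected columns into a single logical value per row, so only one new column is created.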