I have a dataset where each participant (column: id) has multiple observations, and in each observation the participant is given a diagnosis (column: diagnosis).
I would like to count the number of participants who have a specific combination of diagnoses.
Below is a reproducible example in R. I tried to group the data by id, filter on the combination of the two diagnoses, and then count the participants, but this returns no rows.
Do you see any solutions?
Thank you!
library(tidyverse)

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
diagnosis <- c("a101", "b101", "a101",
               "c101", "c101", "c101",
               "b101", "a101", "b101")
data <- data.frame(id, diagnosis, stringsAsFactors = FALSE)

n_a101_and_b101 <- data %>%
  group_by(id) %>%
  filter((substr(diagnosis, 1, 4) == "a101") &
           (substr(diagnosis, 1, 4) == "b101")) %>%
  tally()

n_a101_and_b101
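A quick base R check (a sketch, not part of the original post) shows why the filter comes back empty. filter() evaluates its condition row by row, even on grouped data, and each row holds a single diagnosis, so the two equality tests can never both be TRUE on the same row.

```r
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
diagnosis <- c("a101", "b101", "a101",
               "c101", "c101", "c101",
               "b101", "a101", "b101")
data <- data.frame(id, diagnosis, stringsAsFactors = FALSE)

# Evaluate the same condition the filter uses, one row at a time.
# A single string cannot equal "a101" and "b101" simultaneously.
row_has_both <- data$diagnosis == "a101" & data$diagnosis == "b101"
row_has_both       # nine FALSE values
sum(row_has_both)  # 0, so no rows survive the filter
```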
"&" needs to be "|": (substr(diagnosis, 1, 4) == "a101") | (substr(diagnosis, 1, 4) == "b101"). A diagnosis cannot be a101 and b101 at the same time. – ricoderks
You don't need substr, as you are trying to spot exact matches. – AntoniosK
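One way to ask the question per participant rather than per row is to wrap each test in any() inside the grouped filter. This is a sketch using dplyr, assuming exact diagnosis codes (so no substr()). Note that replacing "&" with "|" on its own would also keep participants who have only one of the two codes, and tally() would then count rows rather than participants; the any() version keeps only participants who have both codes somewhere in their observations and counts distinct ids.

```r
library(dplyr)

id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
diagnosis <- c("a101", "b101", "a101",
               "c101", "c101", "c101",
               "b101", "a101", "b101")
data <- data.frame(id, diagnosis, stringsAsFactors = FALSE)

n_a101_and_b101 <- data %>%
  group_by(id) %>%
  # any() collapses the row-wise comparison to one TRUE/FALSE per participant,
  # so the & now means "has a101 somewhere AND has b101 somewhere".
  filter(any(diagnosis == "a101") & any(diagnosis == "b101")) %>%
  ungroup() %>%
  # Count participants, not rows: participants 1 and 3 each keep three rows.
  summarise(n = n_distinct(id)) %>%
  pull(n)

n_a101_and_b101  # 2
```

For a longer combination of codes, the filter condition could instead be written as all(c("a101", "b101") %in% diagnosis), which checks every required code in one expression.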