I have a data set of birth defects (test), in which each row is a case, with a different 5-way combination of defects. The first five columns of the data set (Defect_A, Defect_B, Defect_C, Defect_D, Defect_E) are the defect numbers that make up this combination.

I want to create a new column called “comments” that outputs a comment based on the following conditional logic:

  1. If a case/row has any of the following defects (1, 2, 3, 4) in columns 1:5, comments = “conjoined”
  2. Otherwise, if a case has any TWO of the following defects (5, 6, 7, 8) in columns 1:5, comments = “spina bifida”
  3. Otherwise, if a case has any one of the following defects (5, 6, 7, 8) AND any one of the following defects (9, 10, 11, 12, 13) in columns 1:5, comments = “heterodaxy”
  4. Otherwise, if a case has any THREE of the following defects (14, 15, 16, 17, 18) in columns 1:5, comments = “vacterl”
       Defect_A Defect_B Defect_C Defect_D Defect_E
case1        12        3       13       17        9
case2        20       13        6        7        3
case3        11       10        4       20       12
case4        13        7        2       18        3
case5         5        2       15       11       13
case6         8        1       15       19        4
case7        11        7       19       10        1
case8         9       14       15       11       16
case9        18       10       14       16        8
case10       19        7        8       10        2

How would I go about doing this? I’ve included sample code below.


# Sample data set 
set.seed(99)
case1 = sample(1:20, 5, replace=FALSE)  
case2 = sample(1:20, 5, replace=FALSE)  
case3 = sample(1:20, 5, replace=FALSE)  
case4 = sample(1:20, 5, replace=FALSE)  
case5 = sample(1:20, 5, replace=FALSE)  
case6 = sample(1:20, 5, replace=FALSE)  
case7 = sample(1:20, 5, replace=FALSE)  
case8 = sample(1:20, 5, replace=FALSE)  
case9 = sample(1:20, 5, replace=FALSE)  
case10 = sample(1:20, 5, replace=FALSE) 
test<-data.frame(rbind(case1, case2, case3, case4, case5, case6, case7, case8, case9, case10))
colnames(test)<- c("Defect_A", "Defect_B", "Defect_C", "Defect_D", "Defect_E")
test

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4  
Comment (akrun): Please specify a seed with set.seed.

1 Answer

With this data frame:

# Sample data set
df = data.frame(Defect_A = sample(1:30, 10, replace=TRUE),
                Defect_B = sample(1:30, 10, replace=TRUE),
                Defect_C = sample(1:30, 10, replace=TRUE), 
                Defect_D = sample(1:30, 10, replace=TRUE),
                Defect_E = sample(1:30, 10, replace=TRUE))

# Conditions
any <- c(1,2,3,4) # for condition 1  
any_2 <- c(5,6,7,8) # for conditions 2 and 3  
any_2_plus <- c(9,10,11,12,13) # for condition 3  
any_3 <- c(14,15,16,17,18) # for condition 4  

You can nest several ifelse() calls inside apply(), which evaluates the conditions row by row in priority order:

df$comments = apply(df[, 1:5], 1, function(x) {
   # sum(x %in% set) counts how many of the row's defects belong to the set
   ifelse(sum(x %in% any) >= 1, 'conjoined', ifelse(
     sum(x %in% any_2) >= 2, 'spina bifida', ifelse(
       sum(x %in% any_2) >= 1 && sum(x %in% any_2_plus) >= 1, 'heterodaxy', ifelse(
         sum(x %in% any_3) >= 3, 'vacterl', NA_character_))))  # NA when no condition matches
})

Adapt the conditions as needed.
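
As a possible alternative, here is a minimal vectorized sketch that counts matches per row with rowSums() instead of calling a function on every row. The data frame is copied from the table printed in the question. The names count_in, set_conjoined, set_spina, set_hetero and set_vacterl are made up for this example (they also avoid reusing the name of the base function any()).

```r
# Sample data copied from the table printed in the question
test <- data.frame(
  Defect_A = c(12, 20, 11, 13, 5, 8, 11, 9, 18, 19),
  Defect_B = c(3, 13, 10, 7, 2, 1, 7, 14, 10, 7),
  Defect_C = c(13, 6, 4, 2, 15, 15, 19, 15, 14, 8),
  Defect_D = c(17, 7, 20, 18, 11, 19, 10, 11, 16, 10),
  Defect_E = c(9, 3, 12, 3, 13, 4, 1, 16, 8, 2),
  row.names = paste0("case", 1:10)
)

# Defect sets (hypothetical names, same values as in the question)
set_conjoined <- 1:4     # condition 1
set_spina     <- 5:8     # conditions 2 and 3
set_hetero    <- 9:13    # condition 3
set_vacterl   <- 14:18   # condition 4

# Hypothetical helper: number of values in each row of m that belong to set.
# %in% drops the matrix dimensions, so they are restored before rowSums().
count_in <- function(m, set) {
  rowSums(matrix(m %in% set, nrow = nrow(m)))
}

defects <- as.matrix(test[, 1:5])
n_conjoined <- count_in(defects, set_conjoined)
n_spina     <- count_in(defects, set_spina)
n_hetero    <- count_in(defects, set_hetero)
n_vacterl   <- count_in(defects, set_vacterl)

# ifelse() is vectorized here, so the conditions are applied to all rows at once,
# in the same priority order as the list in the question
test$comments <- ifelse(n_conjoined >= 1, "conjoined",
                 ifelse(n_spina >= 2, "spina bifida",
                 ifelse(n_spina >= 1 & n_hetero >= 1, "heterodaxy",
                 ifelse(n_vacterl >= 3, "vacterl", NA_character_))))
test
```

On this sample, case8 comes out as "vacterl", case9 as "heterodaxy" (it also has three defects from 14:18, but condition 3 is checked first), and every other case as "conjoined" because each contains at least one of the defects 1 to 4.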