create new categorical variable based on min value for each patient

Question

I have a dataset of repeated measures (hb) over time (day) for different patients (record_id). I would like to find the nadir (minimum) value of hb for each patient, and then use it to create a categorical variable that divides the patients into "low nadirhb" (<70), "middle nadirhb" (70 to <90) and "high nadirhb" (>=90). I would be very grateful for your help, as I am utterly stuck...

record_id  Day  hb
1          0    122
1          1    90
1          2    71
1          3    71
2          0    139
2          1    130
2          2    119
2          3    106
3          0    89
3          1    126
3          2    127
3          3    110
4          0    90
4          1    86
4          2    82
4          3    78
5          0    118
5          1    108
5          2    95
5          3    94

I have tried the code below, but I can't merge df and x1:

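x1 <- aggregate(hb~record_id, data=df, FUN=function(df) c(min=min(df), count=length(df)))   #this successfully finds the min hb for each patient
# (FUN returns two values, so x1$hb is a two-column matrix with columns "min" and "count")
x1 <- rename(x1, c("hb" = "nadirhb"))   # plyr-style rename: c(old = new)
x1 <- as.data.frame(x1)
m <- merge(df, x1, by = "record_id")    # the merged data is stored in m, not in df
summary(df$nadirhb)                     # so df still has no nadirhb column here
#create hb categorical variable
df$hbcat[df$nadirhb >=90] <- 2
df$hbcat[df$nadirhb >=70 & df$hb <90] <- 1   # note: the upper bound tests df$hb, not df$nadirhb
df$hbcat[df$nadirhb <70] <- 0
table(df$hbcat)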

JohnSG · Accepted Answer · 2016-04-15T13:58:54

Using dplyr makes this intuitive.

library(dplyr)

# get min value for each record 
df <- df %>%
  group_by(record_id) %>%
  mutate(min_hb = min(hb))

# create categorical variable dividing patients into segments 
df <- df %>%
  mutate(hb_segment = ifelse(min_hb < 70, "low",
                      ifelse(min_hb < 90, "middle", "high")))
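The nested ifelse() splits at 70 and 90 with a strict <, so a nadir of exactly 70 counts as "middle" and exactly 90 as "high". As an alternative sketch (not part of the original answer), base R's cut() with right = FALSE builds the same half-open intervals. The min_hb values below are hypothetical, picked only to probe the boundaries.

```r
# Hypothetical nadir values chosen to sit on and around the cut-offs
min_hb <- c(69, 70, 89, 90, 106)

# Nested ifelse(), as in the answer: < 70 is low, < 90 is middle, otherwise high
seg_ifelse <- ifelse(min_hb < 70, "low",
                     ifelse(min_hb < 90, "middle", "high"))

# cut() with right = FALSE gives the intervals [-Inf, 70), [70, 90), [90, Inf)
seg_cut <- cut(min_hb,
               breaks = c(-Inf, 70, 90, Inf),
               labels = c("low", "middle", "high"),
               right  = FALSE)

print(seg_ifelse)
print(seg_cut)
```

Both classify these values as low, middle, middle, high, high. The difference is the type: ifelse() returns a character vector, while cut() returns a factor whose levels keep the low/middle/high order, which table() then respects.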

Then select the columns and filter to a single row per patient:

# filter to single row per patient
df_patient <- df %>%
    select(record_id, min_hb, hb_segment) %>%
    distinct()

Result

  record_id min_hb hb_segment
      (int)  (int)      (chr)
1         1     71     middle
2         2    106       high
3         3     89     middle
4         4     78     middle
5         5     94       high
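If dplyr is not available, the same per-patient table can be sketched in base R (an alternative, not the answer's method): ave() plays the role of group_by() plus mutate() by putting each patient's minimum on every row, and unique() stands in for select() plus distinct(). The data frame is rebuilt from the question's table.

```r
# Data from the question
df <- data.frame(
  record_id = rep(1:5, each = 4),
  Day       = rep(0:3, times = 5),
  hb        = c(122,  90,  71,  71,
                139, 130, 119, 106,
                 89, 126, 127, 110,
                 90,  86,  82,  78,
                118, 108,  95,  94)
)

# Each row gets its patient's minimum hb (like group_by() + mutate(min(hb)))
df$min_hb <- ave(df$hb, df$record_id, FUN = min)

# Same thresholds as the answer
df$hb_segment <- ifelse(df$min_hb < 70, "low",
                        ifelse(df$min_hb < 90, "middle", "high"))

# Keep one row per patient (like select() + distinct())
df_patient <- unique(df[, c("record_id", "min_hb", "hb_segment")])
rownames(df_patient) <- NULL
print(df_patient)
```

This gives the same five record_id / min_hb / hb_segment rows as the dplyr version, with hb_segment as a plain character column.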

EDIT: as Steven Beaupre pointed out in the comments, you can also do this:

df %>% group_by(record_id) %>% 
    summarise(min_hb = min(hb)) %>% 
    mutate(hb_segment = ifelse(min_hb < 70, "low", ifelse(min_hb < 90, "middle", "high")))

which is a bit shorter, since summarise() already returns one row per patient.
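The summarise() step corresponds to base R's aggregate(). A minimal sketch of the asker's original aggregate-then-merge route, assuming base R only: FUN = min keeps hb a plain column instead of a min/count matrix, and the merge result is assigned back to df so that nadirhb exists when the categories are built. Here cut() replaces the three indexed assignments, using the question's low/middle/high wording as labels.

```r
# Data from the question
df <- data.frame(
  record_id = rep(1:5, each = 4),
  Day       = rep(0:3, times = 5),
  hb        = c(122,  90,  71,  71,
                139, 130, 119, 106,
                 89, 126, 127, 110,
                 90,  86,  82,  78,
                118, 108,  95,  94)
)

# One row per patient holding the minimum hb
x1 <- aggregate(hb ~ record_id, data = df, FUN = min)
names(x1)[names(x1) == "hb"] <- "nadirhb"

# Assign the merge back to df so nadirhb is available below
df <- merge(df, x1, by = "record_id")

# Categorise the nadir: [-Inf, 70) low, [70, 90) middle, [90, Inf) high
df$hbcat <- cut(df$nadirhb,
                breaks = c(-Inf, 70, 90, Inf),
                labels = c("low", "middle", "high"),
                right  = FALSE)

# Count patients rather than rows by keeping the first row of each record_id
table(df$hbcat[!duplicated(df$record_id)])
```

Every row of a patient carries the same hbcat, so table(df$hbcat) on its own would count measurements (four per patient) rather than patients.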
