Elegant way to do nested if else statements for multiple groups

Question

Here is what I'm trying to do:

Create a new column that assigns a sample rank to multiple subsets of rows based on how many rows there are in each subset. The grouping variable is the 'stratum' column.

I usually randomly assign rank using nested ifelse statements as shown below. Sometimes this suffices, but lately, I've been dealing with more and more groupings. 40 nested ifelse statements can start to look a little excessive.

Is there a more elegant/quicker/minimal code way to do this using dplyr or data.table, maybe in conjunction with apply, lapply, sapply etc.?

I have tried to use data.table, but I do not know how to combine the sample function with the number of rows in each group.

Reproducible data:

dta <- data.frame(
  uniqueID = c(950513, 951634, 951640, 951641, 951646, 952732, 952895,
               952909, 952910, 952911, 952912, 952923, 952924, 952925,
               952926, 952927, 952928, 952933, 952934, 952935),
  stratum = c("group9", "group6", "group15", "group13", "group9", "group8",
              "group9", "group15", "group15", "group15", "group15",
              "group13", "group13", "group1", "group1", "group1", "group1",
              "group1", "group1", "group1")
)

Here is how I usually assign a random rank, using nested ifelse statements:

# Sort by stratum so that the rows of each group are contiguous
dta <- dta[order(dta$stratum), ]
set.seed(7265)

dta$rank <- ifelse(dta$stratum== "group1",sample(1:nrow(dta[dta$stratum== "group1",])),
               ifelse(dta$stratum=="group6",sample(1:nrow(dta[dta$stratum== "group6",])),
                      ifelse(dta$stratum=="group8",sample(1:nrow(dta[dta$stratum== "group8",])),
                             ifelse(dta$stratum=="group9",sample(1:nrow(dta[dta$stratum== "group9",])),
                                    ifelse(dta$stratum=="group13",sample(1:nrow(dta[dta$stratum== "group13",])),
                                           ifelse(dta$stratum=="group15",sample(1:nrow(dta[dta$stratum== "group15",])),
                                                  0))))))

MrFlick · Accepted Answer · 2017-12-27T19:17:44

Using dplyr, you can do:

library(dplyr)
dta %>%
    group_by(stratum) %>%
    # n() is the size of the current group; sample.int(n()) shuffles 1..n()
    mutate(rank = sample.int(n()))

group_by() lets you operate on one subset of rows at a time, and dplyr's built-in n() function returns the number of rows in the current group. I chose sample.int() rather than sample() because it is more efficient, but here it does essentially the same thing: sample.int(n) returns a random permutation of 1..n.
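Since the question also mentions data.table, here is a minimal sketch of the same idea in that package. It is not part of the original answer, and it uses only the first six rows of the question's data to stay short. In data.table, .N plays the role of n(): it is the number of rows in the current group.

```r
library(data.table)

# First six rows of the question's data, kept short for illustration
dt <- data.table(
  uniqueID = c(950513, 951634, 951640, 951641, 951646, 952732),
  stratum  = c("group9", "group6", "group15", "group13", "group9", "group8")
)

set.seed(7265)

# .N is the number of rows in the current group, so sample.int(.N)
# draws a random permutation of 1..N within each stratum.
# := adds the rank column by reference, without copying dt.
dt[, rank := sample.int(.N), by = stratum]

# Sanity check: within each stratum the ranks are exactly 1..N
check <- dt[, .(ok = setequal(rank, seq_len(.N))), by = stratum]
```

Unlike the nested ifelse approach, this does not require sorting the rows by stratum first.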

In general, nested if-else statements are better handled with case_when() in dplyr, but what you are doing in this case is better handled with group_by().
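For the general case, a small sketch of how case_when() can replace a chain of nested ifelse() calls. The scores data frame, the value column, and the band labels are hypothetical and only serve as illustration; conditions are checked top to bottom and the first match wins.

```r
library(dplyr)

# Hypothetical data, not from the question
scores <- data.frame(value = c(5, 42, 87, NA))

scores <- scores %>%
  mutate(band = case_when(
    is.na(value) ~ "missing",  # handle NA first so later comparisons are safe
    value < 10   ~ "low",
    value < 50   ~ "medium",
    TRUE         ~ "high"      # fallback, like the final "else"
  ))
```

Each condition sits on its own line, so adding another case means adding one line rather than another level of nesting.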
