Here is what I'm trying to do:
Create a new column that assigns a random rank within each subset of rows, i.e. a random permutation of 1 to n, where n is the number of rows in that subset. The grouping variable is the 'stratum' column.
I usually randomly assign rank using nested ifelse statements as shown below. Sometimes this suffices, but lately, I've been dealing with more and more groupings. 40 nested ifelse statements can start to look a little excessive.
Is there a more elegant, quicker, or more concise way to do this with dplyr or data.table, perhaps in conjunction with apply, lapply, sapply, etc.?
I have tried data.table, but I do not know how to call sample() on the number of rows in each group.
Reproducible data:
dta <- data.frame(
  uniqueID = c(950513, 951634, 951640, 951641, 951646, 952732, 952895,
               952909, 952910, 952911, 952912, 952923, 952924, 952925,
               952926, 952927, 952928, 952933, 952934, 952935),
  stratum = c("group9", "group6", "group15", "group13", "group9", "group8",
              "group9", "group15", "group15", "group15", "group15",
              "group13", "group13", "group1", "group1", "group1", "group1",
              "group1", "group1", "group1")
)
Here is how I usually assign a random rank, using nested ifelse statements:
dta <- dta[order(dta$stratum), ]
set.seed(7265)
dta$rank <- ifelse(dta$stratum == "group1", sample(1:nrow(dta[dta$stratum == "group1", ])),
            ifelse(dta$stratum == "group6", sample(1:nrow(dta[dta$stratum == "group6", ])),
            ifelse(dta$stratum == "group8", sample(1:nrow(dta[dta$stratum == "group8", ])),
            ifelse(dta$stratum == "group9", sample(1:nrow(dta[dta$stratum == "group9", ])),
            ifelse(dta$stratum == "group13", sample(1:nrow(dta[dta$stratum == "group13", ])),
            ifelse(dta$stratum == "group15", sample(1:nrow(dta[dta$stratum == "group15", ])),
            0))))))
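For comparison, here is a minimal sketch of one possible direction that avoids the nesting. It assumes base R's ave() is acceptable. The data frame `demo` and its values are made up for illustration and are not the real data. The idea is to draw a random permutation of 1..n separately within each stratum, without naming the strata one by one.

```r
# Hypothetical toy data (not the real data set): three strata of different sizes.
demo <- data.frame(
  uniqueID = 1:6,
  stratum  = c("a", "b", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

set.seed(7265)

# ave() splits the row indices by stratum, applies FUN to each piece, and
# puts the results back in the original row positions.
# sample(length(i)) returns a random permutation of 1..n for a group of n rows
# (for n = 1 it simply returns 1).
demo$rank <- ave(
  seq_len(nrow(demo)),
  demo$stratum,
  FUN = function(i) sample(length(i))
)

demo
```

Unlike the nested ifelse version, this sketch should not depend on the rows being sorted by stratum first. The same idea can presumably be written in data.table as `dt[, rank := sample(.N), by = stratum]` or in dplyr as `group_by(stratum) %>% mutate(rank = sample(n()))`. Both are offered as assumptions rather than tested solutions for the data above.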