Randomly separate one column into two groups based on ID in R

Question

I have a data frame that looks like the one below. For each ID, I want to randomly assign subjects to two groups of roughly equal size, and add a new column that indicates which group each subject is in. For example, for ID 1, subjects 101 and 103 are assigned to Group A and 102 and 104 to Group B; for ID 2, subjects 105 and 106 are in Group A and 107 is in Group B. I have thousands of IDs and subjects. How can I do this?

   ID subject
1  1     101
2  1     102
3  1     103
4  1     104
5  2     105
6  2     106
7  2     107

Ronak Shah · Accepted Answer · 2021-01-06T04:21:50

For each ID, you can sample the group labels with replace = TRUE, so that every row independently receives either label with equal probability.

library(dplyr)

# Labels to assign
groups <- c('Group A', 'Group B')

# Within each ID, draw one label per row, independently of the other rows
result <- df %>%
  group_by(ID) %>%
  mutate(group = sample(groups, n(), replace = TRUE))
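To see what this produces on the question's data, here is a minimal self-contained sketch. The construction of df, the seed value, and the count() check are assumptions added for illustration, not part of the original answer.

```r
library(dplyr)

# Sample data from the question (the data frame name `df` is assumed)
df <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 2),
  subject = 101:107
)
groups <- c('Group A', 'Group B')

set.seed(42)  # arbitrary seed, only so the draw can be repeated

result <- df %>%
  group_by(ID) %>%
  mutate(group = sample(groups, n(), replace = TRUE)) %>%
  ungroup()

# Number of subjects per ID and group; the sizes are not guaranteed to match
result %>% count(ID, group)
```

Because each row is drawn independently, rerunning with a different seed can give noticeably different group sizes within the same ID.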

Note that the above is completely random, so an ID with 4 rows may end up with 3 rows in Group A and 1 in Group B. If you want the two groups to always be as equal in size as possible, build the labels with rep and then shuffle them with sample.

# rep() recycles the labels to the size of each ID; sample() shuffles them
result <- df %>%
  group_by(ID) %>%
  mutate(group = sample(rep(groups, length.out = n())))
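One detail of the rep approach: when an ID has an odd number of rows, rep(groups, length.out = n()) always gives the extra row to the first label, Group A. If that matters, a possible variation (an assumption on my part, not the answer's original code) is to shuffle the label order first with an inner sample(groups). The sketch below also checks that group sizes within each ID differ by at most one; size_gap is a hypothetical column name.

```r
library(dplyr)

# Sample data from the question (the data frame name `df` is assumed)
df <- data.frame(
  ID      = c(1, 1, 1, 1, 2, 2, 2),
  subject = 101:107
)
groups <- c('Group A', 'Group B')

result <- df %>%
  group_by(ID) %>%
  # Inner sample(groups) randomizes which label gets the extra row in an
  # odd-sized ID; outer sample() shuffles the labels across the rows
  mutate(group = sample(rep(sample(groups), length.out = n()))) %>%
  ungroup()

# Difference between the largest and smallest group within each ID
balance <- result %>%
  count(ID, group) %>%
  group_by(ID) %>%
  summarise(size_gap = max(n) - min(n))

balance
```

On this data, ID 1 (4 rows) always splits 2 and 2, while ID 2 (3 rows) splits 2 and 1, with the larger group chosen at random.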
