2
votes

I would be grateful for help with this little easter egg that I can solve with a function but would prefer a prettier and faster data.table-based solution.

Some generated data:

library(data.table)

set.seed(123)  
my.df <- data.table(
    ID = rep(letters[1:2], each = 6),
    factor1 = rep(letters[9:11], each = 2), 
    col1 = sample(1:30, 12, replace = TRUE))

# ID factor1 col1
# 1:  a       i    8
# 2:  a       i    9
# 3:  a       j   15
# 4:  a       j   26
# 5:  a       k   13
# 6:  a       k    2
# 7:  b       i    7
# 8:  b       i    9
# 9:  b       j    7
# 10:  b       j    4
# 11:  b       k   29
# 12:  b       k   23

My problem: I would like to create a new column, col2, with the first two rows of data in col1 where factor1 == i and repeat the numbers as long as ID==a and then the same for ID==b. Any help would be appreciated, data.table solutions are preferred

The new col2 should look like this:

set.seed(123)  
my.df2 <- data.table(
    ID = rep(letters[1:2], each = 6),
    factor1 = rep(letters[9:11], each = 2), 
    col1 = sample(1:30, 12, replace = TRUE),
    col2 = c(12,13,12,13,12,13,4,10,4,10,4,10))

# ID factor1 col1 col2
# 1:  a       i   23   15
# 2:  a       i   21   19
# 3:  a       j   14   15
# 4:  a       j   17   19
# 5:  a       k   17   15
# 6:  a       k   13   19
# 7:  b       i    8   22
# 8:  b       i    1   11
# 9:  b       j   19   22
# 10:  b       j   14   11
# 11:  b       k   19   22
# 12:  b       k   20   11

This was solved with the below comment code:

my.df[, col2 := rep(head(col1[factor1 == 'i'], 2), length.out = .N), ID]

Another solution, as I always have the same number of groups in factor1:

my.df[, col2 := rep(head(col1[factor1 == 'i'], 2), times = 3), ID]

As I have 2 factors to group by I didn't show in this example I changed the ending:

my.df[, col2 := rep(head(col1[factor1 == 'i'], 2), times = 3), by=list(ID1, ID2)]
1
Please set the seed to make your data reproducible: set.seed(1) - zx8754

1 Answers

3
votes

You should have used set.seed while creating data but based on your description I think this is what you meant.

library(data.table)
my.df[, col2 := rep(col1[factor1 == 'i'], length.out = .N), ID]

If there are more than 2 values in a particular ID where factor1 == 'i' but you are interested only in first 2 we can use head.

my.df[, col2 := rep(head(col1[factor1 == 'i'], 2), length.out = .N), ID]


my.df
#    ID factor1 col1 col2
# 1:  a       i   15   15
# 2:  a       i   19   19
# 3:  a       j   14   15
# 4:  a       j    3   19
# 5:  a       k   10   15
# 6:  a       k   18   19
# 7:  b       i   22   22
# 8:  b       i   11   11
# 9:  b       j    5   22
#10:  b       j   20   11
#11:  b       k   14   22
#12:  b       k   22   11

data

set.seed(123)
my.df <- data.table(
        ID = rep(letters[1:2], each = 6),
        factor1 = rep(letters[9:11], each = 2), 
        col1 = sample(1:30, 12, replace = TRUE))