3
votes

Let's say I have a data.frame/table which I want to "expand", such that each row transforms into a sequence of repeat rows (which can be described as a group), except for a new column which will have the same values in all groups.

So, taking the following dataframe, we want to expand it by column id, such that each row is now repeated, but with a new column q which repeats from 1-4 for each group.

The way I've found to do this is to build a second dataframe based on the first one's column id and with the new column values, and then joining it to the original:

library(dplyr)

set.seed(42)

a <- data.frame(id = 1:5,
                value = runif(5))
a
#>   id     value
#> 1  1 0.9148060
#> 2  2 0.9370754
#> 3  3 0.2861395
#> 4  4 0.8304476
#> 5  5 0.6417455

b <- data.frame(id = rep(a$id, each = 4),
                q = 1:4)

left_join(a, b, by = "id")
#>    id     value q
#> 1   1 0.9148060 1
#> 2   1 0.9148060 2
#> 3   1 0.9148060 3
#> 4   1 0.9148060 4
#> 5   2 0.9370754 1
#> 6   2 0.9370754 2
#> 7   2 0.9370754 3
#> 8   2 0.9370754 4
#> 9   3 0.2861395 1
#> 10  3 0.2861395 2
#> 11  3 0.2861395 3
#> 12  3 0.2861395 4
#> 13  4 0.8304476 1
#> 14  4 0.8304476 2
#> 15  4 0.8304476 3
#> 16  4 0.8304476 4
#> 17  5 0.6417455 1
#> 18  5 0.6417455 2
#> 19  5 0.6417455 3
#> 20  5 0.6417455 4

Created on 2020-01-27 by the reprex package (v0.3.0)

Is there a more direct way of doing this? In my example above I've used dplyr, but I can just as easily adopt data.table syntax if that's easier.

3
If any of the three answers to your question could be used, then you should consider marking one of them as accepted :) - Peter H.

3 Answers

3
votes

If we need to replicate the rows, it can be easily done with uncount

library(dplyr)
library(tidyr)
library(data.table)
uncount(a, 4) %>% 
     mutate(q = rowid(id))

Or another option is create a list column of 'q' and then unnest

a %>% 
   mutate(q = list(1:4)) %>%
   unnest(q)

Or in base R

transform(a[rep(seq_len(nrow(a)), 4), ], q = sequence(table(id)))

Or using data.table

library(data.table)
setDT(a)[, .(q = 1:4),.(id, value)]

Or replicate the rows first

setDT(a)[rep(seq_len(.N), .N)][, q := rep(1:4, length.out = .N)][]
3
votes

This is easily accomplished with the function crossing from the tidyr package.

library(tidyr)

a <- tibble(
  id = 1:5, value = runif(5)
)

crossing(a, q = 1:4)
#> # A tibble: 20 x 3
#>       id value     q
#>    <int> <dbl> <int>
#>  1     1 0.222     1
#>  2     1 0.222     2
#>  3     1 0.222     3
#>  4     1 0.222     4
#>  5     2 0.262     1
#>  6     2 0.262     2
#>  7     2 0.262     3
#>  8     2 0.262     4
#>  9     3 0.284     1
#> # … with 10 more rows

Created on 2020-01-27 by the reprex package (v0.3.0)

2
votes

A data.table alternative:

setDT(a)
a[CJ(id = id, q = 1:4), on = .(id)]

Base R:

expand_grid(a, q = 1:4)