Here is a solution using dplyr, tidyr, and purrr.
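library(dplyr)
library(tidyr)
library(purrr)

# Replace each count with the sequence 1:Freq (a list column), then expand it
ucb_admit2 <- ucb_admit %>%
  mutate(Freq = map2(1, Freq, `:`)) %>%
  unnest(Freq) %>%    # name the column; recent tidyr versions warn on a bare unnest()
  select(-Freq)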
Or use this similar approach, which only needs functions from dplyr and tidyr.
# Same idea without purrr: build the list column one row at a time
ucb_admit2 <- ucb_admit %>%
  rowwise() %>%
  mutate(Freq = list(seq(1, Freq))) %>%
  ungroup() %>%
  unnest(Freq) %>%
  select(-Freq)
Both of them adopt the same strategy: create a list column and then unnest it.
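To make the list-column strategy concrete, here is a minimal sketch on a small made-up data frame. The names toy, toy_list, and toy_long are hypothetical and only serve the illustration; they are not part of the original data.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data: two groups with counts 2 and 3
toy <- tibble(Group = c("a", "b"), Freq = c(2L, 3L))

# Step 1: each count becomes a list element holding 1:Freq
toy_list <- toy %>%
  mutate(Freq = map2(1, Freq, `:`))

# Step 2: unnest() yields one row per list element; Freq is then dropped
toy_long <- toy_list %>%
  unnest(Freq) %>%
  select(-Freq)

nrow(toy_long)  # 5 rows: 2 for "a" and 3 for "b"
```

The intermediate toy_list still has two rows; only the unnest step multiplies them.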
We can also use the separate_rows function from tidyr to achieve this.
# Collapse 1:Freq into a comma-separated string, then split it into rows
ucb_admit2 <- ucb_admit %>%
  rowwise() %>%
  mutate(Freq = paste(seq(1, Freq), collapse = ",")) %>%
  ungroup() %>%
  separate_rows(Freq) %>%
  select(-Freq)
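As a rough illustration of what the intermediate string looks like, the sketch below runs the same steps on a made-up data frame. The names toy, toy_str, and toy_long are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two groups with counts 2 and 3
toy <- tibble(Group = c("a", "b"), Freq = c(2L, 3L))

# Collapse 1:Freq into one comma-separated string per row
toy_str <- toy %>%
  rowwise() %>%
  mutate(Freq = paste(seq(1, Freq), collapse = ",")) %>%
  ungroup()

toy_str$Freq  # "1,2" and "1,2,3"

# Split each string back into one row per element, then drop the helper column
toy_long <- toy_str %>%
  separate_rows(Freq) %>%
  select(-Freq)
```

Here the counts take a detour through character data, which is extra work compared with the list-column versions.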
Benchmarking
I compared the two methods proposed by eipi10 with my three methods using the microbenchmark package, as shown below. The results show that the base R approach (m1) is the fastest, followed by the dplyr approach that repeats row indices with rep and slice (m2). So unless there are other considerations, such as code readability, there is no need to use tidyr or purrr for this question.
library(microbenchmark)
microbenchmark(
  # m1: base R, repeat each row index Freq times
  m1 = (ucb_admit[rep(1:nrow(ucb_admit), ucb_admit$Freq),
                  -grep("Freq", names(ucb_admit))]),
  # m2: dplyr, same idea with slice()
  m2 = (ucb_admit %>%
          slice(rep(1:n(), Freq)) %>%
          select(-Freq)),
  # m3: list column via purrr::map2(), then unnest
  m3 = (ucb_admit %>%
          mutate(Freq = map2(1, Freq, `:`)) %>%
          unnest(Freq) %>%
          select(-Freq)),
  # m4: list column via rowwise(), then unnest
  m4 = (ucb_admit %>%
          rowwise() %>%
          mutate(Freq = list(seq(1, Freq))) %>%
          ungroup() %>%
          unnest(Freq) %>%
          select(-Freq)),
  # m5: comma-separated string, then separate_rows()
  m5 = (ucb_admit %>%
          rowwise() %>%
          mutate(Freq = paste(seq(1, Freq), collapse = ",")) %>%
          ungroup() %>%
          separate_rows(Freq) %>%
          select(-Freq)))
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
   m1  3.455026  3.585888  4.295322  3.845367  4.147506   8.60228   100
   m2  6.888881  7.541269  8.849527  8.031040  9.428189  15.53991   100
   m3 23.252458 24.959122 29.706875 27.414396 32.506805  61.00691   100
   m4 20.033499 21.914645 25.888155 23.611688 27.310155 101.15088   100
   m5 28.972557 31.127297 35.976468 32.652422 37.669135  64.43884   100
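For reference, the index-repetition idea behind the fastest entry (m1) can be sketched in base R on a made-up data frame. The names toy, idx, and toy_long are hypothetical, and the column selection is written with a logical test and drop = FALSE rather than the grep call used in the benchmark, so that the result stays a data frame even when only one column remains.

```r
# Hypothetical toy data: three groups, one of them with a zero count
toy <- data.frame(Group = c("a", "b", "c"), Freq = c(2L, 3L, 0L))

# Repeat each row index Freq times: 1 1 2 2 2 (row 3 is repeated zero times)
idx <- rep(seq_len(nrow(toy)), toy$Freq)

# Index the rows and keep every column except Freq
toy_long <- toy[idx, names(toy) != "Freq", drop = FALSE]

# The expanded data should have exactly sum(Freq) rows
stopifnot(nrow(toy_long) == sum(toy$Freq))
```

One design point worth noting: rep() simply omits a row whose count is 0, whereas seq(1, 0) and 1:0 both evaluate to c(1, 0), so the list-column and separate_rows variants would emit two rows for such a count unless zeros are filtered out first.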