Given that your column names all share a prefix, tidyr::pivot_longer
makes this easy. First I'll create some sample data based on your description:
## semi-realistic sample data
nr = 3
n_group = 4
n_per_group = 3
nc = n_group * n_per_group
df = as.data.frame(matrix(rnorm(nr * nc), nrow = nr))
names(df) = outer(letters[1:n_group], 1:n_per_group, paste, sep = ".")
df = cbind(id = 1:nr, df)
# id a.1 b.1 c.1 d.1 a.2 b.2 c.2 d.2
# 1 1 0.7898394 -2.34853091 -0.15575050 -1.18205111 -0.6348410 2.545300 0.8265652 0.8817947
# 2 2 0.5717691 -1.46586503 0.01206218 0.83525523 1.6629318 1.602706 -1.5781837 1.1245715
# 3 3 -1.5727813 0.04925075 -0.82081511 -0.06905755 -0.4137177 -1.734805 -0.3742163 -1.0701552
# a.3 b.3 c.3 d.3
# 1 0.3238154 2.2401514 -0.70958661 -0.09062723
# 2 0.7016303 -0.9547929 0.23150225 0.39295502
# 3 0.7229709 -1.7137527 0.09044422 -0.59431548
With tidyr
:
## convert to long format
library(tidyr)
pivot_longer(df, cols = -id, names_sep = "\\.", names_to = c(".value", "iteration"))
# # A tibble: 9 x 6
# id iteration a b c d
# <int> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 0.790 -2.35 -0.156 -1.18
# 2 1 2 -0.635 2.55 0.827 0.882
# 3 1 3 0.324 2.24 -0.710 -0.0906
# 4 2 1 0.572 -1.47 0.0121 0.835
# 5 2 2 1.66 1.60 -1.58 1.12
# 6 2 3 0.702 -0.955 0.232 0.393
# 7 3 1 -1.57 0.0493 -0.821 -0.0691
# 8 3 2 -0.414 -1.73 -0.374 -1.07
# 9 3 3 0.723 -1.71 0.0904 -0.594
Using the same sample data, here's a data.table::melt
approach based on column numbers:
## note that `nc` is the number of non-ID columns.
## The `1:nc + 1` is a way to get the column numbers of the stack columns
## (+ 1 because there is 1 id column in my example)
dt = as.data.table(df)
melt(dt, id.vars = "id", measure.vars = split(1:nc + 1, rep(1:n_group, times = n_per_group)))
# id variable 1 2 3 4
# 1: 1 1 0.7898394 -2.34853091 -0.15575050 -1.18205111
# 2: 2 1 0.5717691 -1.46586503 0.01206218 0.83525523
# 3: 3 1 -1.5727813 0.04925075 -0.82081511 -0.06905755
# 4: 1 2 -0.6348410 2.54530008 0.82656523 0.88179474
# 5: 2 2 1.6629318 1.60270621 -1.57818365 1.12457150
# 6: 3 2 -0.4137177 -1.73480524 -0.37421630 -1.07015525
# 7: 1 3 0.3238154 2.24015140 -0.70958661 -0.09062723
# 8: 2 3 0.7016303 -0.95479285 0.23150225 0.39295502
# 9: 3 3 0.7229709 -1.71375273 0.09044422 -0.59431548
(The results look different at a glance, but it's just a different order of the same rows.)
x
get stacked together? Or is it purely positional? Using themtcars
example, cols1:5
as you say are identifiers, then for6:ncol(mtcars)
it's is sets of 3? – Gregor Thomasdata.table::melt
andtidyr::pivot_longer
are very powerful as to which columns get stacked and could handle a naming pattern.data.table::melt
will work well with column numbers too. – Gregor Thomasdata.table::melt
.reshape2::melt
doesn't have that feature, butdata.table::melt
lets you pass alist
tomeasure.vars
. For every 3 columns between 2 and 1000 you would usemeasure.vars = split(2:1000, (seq_along(2:1000) - 1) %/% 3)
. But with a names patterntidyr::pivot_longer
is also easy. – Gregor Thomas