8
votes

I have the following that I want to perform a multiple column melt upon:

# Load library
library(data.table)

# Dummy data set
dt <- data.table(a = 1:10,
                 t_alpha = runif(10),
                 t_beta = runif(10),
                 t_gamma = runif(10),
                 s_alpha = runif(10),
                 s_beta = runif(10),
                 s_gamma = runif(10))

# Look at data table
dt
#>      a    t_alpha    t_beta    t_gamma    s_alpha     s_beta   s_gamma
#>  1:  1 0.73315322 0.7777962 0.13258401 0.12841353 0.20865995 0.8967482
#>  2:  2 0.10161042 0.9691244 0.23506277 0.35130324 0.36672933 0.9247026
#>  3:  3 0.95234142 0.7002526 0.07104051 0.85558314 0.39467530 0.8982107
#>  4:  4 0.48967202 0.3951288 0.72376459 0.58978334 0.47201394 0.8508710
#>  5:  5 0.22997537 0.3487354 0.58831312 0.63917860 0.12685815 0.5616682
#>  6:  6 0.01138277 0.0449675 0.79384717 0.28983485 0.98793013 0.6574457
#>  7:  7 0.32324941 0.3446745 0.66394161 0.16357679 0.14828451 0.2019661
#>  8:  8 0.78644407 0.4460397 0.76992857 0.78703793 0.46623055 0.3081812
#>  9:  9 0.90938120 0.4526063 0.47948163 0.03182248 0.63600496 0.3598600
#> 10: 10 0.11345349 0.1732653 0.59240218 0.95541944 0.06640378 0.9475107

So, I go ahead and melt, like so:

# Groups of columns to melt
m <- list(names(dt)[grepl("t_", names(dt))],
          names(dt)[grepl("s_", names(dt))])

# Perform melt
dt2 <- melt(dt, measure = m, value.name = c("t", "s"))

which gives,

# Examine results
dt2
#>      a variable          t          s
#>  1:  1        1 0.73315322 0.12841353
#>  2:  2        1 0.10161042 0.35130324
#>  3:  3        1 0.95234142 0.85558314
#>  4:  4        1 0.48967202 0.58978334
#>  5:  5        1 0.22997537 0.63917860
#>  6:  6        1 0.01138277 0.28983485
#>  7:  7        1 0.32324941 0.16357679
#>  8:  8        1 0.78644407 0.78703793
#>  9:  9        1 0.90938120 0.03182248
#> 10: 10        1 0.11345349 0.95541944
#> 11:  1        2 0.77779618 0.20865995
#> 12:  2        2 0.96912444 0.36672933
#> 13:  3        2 0.70025264 0.39467530
#> 14:  4        2 0.39512877 0.47201394
#> 15:  5        2 0.34873538 0.12685815
#> 16:  6        2 0.04496750 0.98793013
#> 17:  7        2 0.34467455 0.14828451
#> 18:  8        2 0.44603969 0.46623055
#> 19:  9        2 0.45260634 0.63600496
#> 20: 10        2 0.17326532 0.06640378
#> 21:  1        3 0.13258401 0.89674825
#> 22:  2        3 0.23506277 0.92470265
#> 23:  3        3 0.07104051 0.89821074
#> 24:  4        3 0.72376459 0.85087097
#> 25:  5        3 0.58831312 0.56166816
#> 26:  6        3 0.79384717 0.65744574
#> 27:  7        3 0.66394161 0.20196612
#> 28:  8        3 0.76992857 0.30818121
#> 29:  9        3 0.47948163 0.35986003
#> 30: 10        3 0.59240218 0.94751075
#>      a variable          t          s

That's great!

Now, notice an extra column, variable. That corresponds to each of my columns for t and s. Specifically, 1 = alpha, 2 = beta, and 3 = gamma. I can fix this variable as follows:

# Fix variable names
v_names <- c("alpha", "beta", "gamma")
dt2$variable <- v_names[dt2$variable]

producing,

# Examine results
dt2
#>      a variable          t          s
#>  1:  1    alpha 0.73315322 0.12841353
#>  2:  2    alpha 0.10161042 0.35130324
#>  3:  3    alpha 0.95234142 0.85558314
#>  4:  4    alpha 0.48967202 0.58978334
#>  5:  5    alpha 0.22997537 0.63917860
#>  6:  6    alpha 0.01138277 0.28983485
#>  7:  7    alpha 0.32324941 0.16357679
#>  8:  8    alpha 0.78644407 0.78703793
#>  9:  9    alpha 0.90938120 0.03182248
#> 10: 10    alpha 0.11345349 0.95541944
#> 11:  1     beta 0.77779618 0.20865995
#> 12:  2     beta 0.96912444 0.36672933
#> 13:  3     beta 0.70025264 0.39467530
#> 14:  4     beta 0.39512877 0.47201394
#> 15:  5     beta 0.34873538 0.12685815
#> 16:  6     beta 0.04496750 0.98793013
#> 17:  7     beta 0.34467455 0.14828451
#> 18:  8     beta 0.44603969 0.46623055
#> 19:  9     beta 0.45260634 0.63600496
#> 20: 10     beta 0.17326532 0.06640378
#> 21:  1    gamma 0.13258401 0.89674825
#> 22:  2    gamma 0.23506277 0.92470265
#> 23:  3    gamma 0.07104051 0.89821074
#> 24:  4    gamma 0.72376459 0.85087097
#> 25:  5    gamma 0.58831312 0.56166816
#> 26:  6    gamma 0.79384717 0.65744574
#> 27:  7    gamma 0.66394161 0.20196612
#> 28:  8    gamma 0.76992857 0.30818121
#> 29:  9    gamma 0.47948163 0.35986003
#> 30: 10    gamma 0.59240218 0.94751075
#>      a variable          t          s

Created on 2019-08-09 by the reprex package (v0.3.0)

This is my desired result. My question is, can I perform this operation (i.e., renaming of the values in the variable column) during the melt or do I just have to patch it up afterwards?

1
This is not your question, but FYI you don't need to make m, you can use the patterns function with this code melt(dt, measure.vars = patterns(t = 't_', s = 's_'))IceCreamToucan
See also stackoverflow.com/questions/41883573/… (no real answer for how to do it within melt, I guess it's not possible)IceCreamToucan
There is an open issue requesting this functionality on data.table github: github.com/Rdatatable/data.table/issues/2551IceCreamToucan
Not what you were looking for I guess but reshape can do this: reshape(dt, idvar = "a", varying = 2:ncol(dt), sep = "_", direction = "long", timevar = "variable")markus
One option is patching up on the same line: melt(dt, measure.vars = patterns(t = 't_', s = 's_'))[, variable := factor(variable, labels = c("alpha", "beta", "gamma"))]sindri_baldur

1 Answers

5
votes

Here is an option in pivot_longer from the dev version of tidyr

library(tidyr)# ‘0.8.3.9000’
library(dplyr)
dt %>% 
   pivot_longer(-a, names_to= c(".value", "variable"), names_sep = "_")
# A tibble: 30 x 4
#       a variable      t       s
#   <int> <chr>     <dbl>   <dbl>
# 1     1 alpha    0.592  0.0634 
# 2     1 beta     0.316  0.339  
# 3     1 gamma    0.760  0.751  
# 4     2 alpha    0.936  0.445  
# 5     2 beta     0.232  0.00250
# 6     2 gamma    0.0111 0.406  
# 7     3 alpha    0.0950 0.102  
# 8     3 beta     0.426  0.775  
# 9     3 gamma    0.321  0.811  
#10     4 alpha    0.884  0.0526 
# … with 20 more rows

NOTE: The values are different as there was no set.seed