2
votes

I'm confused by the behavior of data.table::melt().

I would like to reshape a data.table from wide to long, similar to this question

Here's my data.table:

dt <- data.table(id=c(1,2,3), varA1=c(2,6,1), varA2=c(1,1,1),varA3=c(1,2,3),  
                              varB1=c(1,0,1), varB2=c(1,1,1),varB3=c(0,0,0))

Here's what I want:

  id index varA
1:  1     1    2
2:  2     1    6
3:  3     1    1 
4:  1     2    1
5:  2     2    1
6:  3     2    1
7:  1     3    1
8:  2     3    2
9:  3     3    3

If I include both varA and varB as my measure vars, it works fine, using:

dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA","^varB"), value.name = c("varA","varB"),variable.name = "index")

The output is as expected:

   id index varA varB
1:  1     1    2    1
2:  2     1    6    0
3:  3     1    1    1
4:  1     2    1    1
5:  2     2    1    1
6:  3     2    1    1
7:  1     3    1    0
8:  2     3    2    0
9:  3     3    3    0

However, I only want varA. With only a single measure variable, it no longer works. Using

dt_long <- melt(dt, id.vars ="id", measure=patterns("^varA"), value.name = "varA",variable.name = "index")

I get:

   id index varA
1:  1 varA1    2
2:  2 varA1    6
3:  3 varA1    1
4:  1 varA2    1
5:  2 varA2    1
6:  3 varA2    1
7:  1 varA3    1
8:  2 varA3    2
9:  3 varA3    3

Why is the index variable here not listing 1,2,3, as before?

1
See Ronak's comment below, if you agree, edit your question, then we can re-open it.zx8754

1 Answers

2
votes

Using multiple columns with melt measure.vars is tricky.

From melt help regarding measure.vars:

multiple patterns will produce multiple columns

If we apply data.table:::patterns source code to dt with "^varA","^varB", we get :

cols <- colnames(dt)
p = unlist(list("^varA","^varB"), use.names = any(nzchar(list("^varA","^varB"))))
lapply(p, grep, cols)

[[1]]
[1] 2 3 4

[[2]]
[1] 5 6 7

Leading to :

  • VarA1 associated with VarB1, factor index = 1
  • VarA2 associated with VarB2, factor index = 2
  • VarA3 associated with VarB3, factor index = 3

Note that index number has nothing to do with VarXi index, it's simply an auto-numbering of VarAi-VarBi factor.

For example, if we remove VarA1 we get following result:

   id index varA varB
1:  1     1    1    1
2:  2     1    1    0
3:  3     1    1    1
4:  1     2    1    1
5:  2     2    2    1
6:  3     2    3    1
7:  1     3   NA    0
8:  2     3   NA    0
9:  3     3   NA    0
  • VarA2 associated with VarB1, factor index = 1
  • VarA3 associated with VarB2, factor index = 2
  • VarB3 alone, factor index = 3

In both cases, as this factor is a composite factor, data.table returns its numeric index.

When you only use one pattern, you directly get a factor with levels corresponding to this single pattern:

dt_long <- melt(dt, id.vars = "id", measure=patterns("^varA"), value.name = c("varA"),variable.name = "index")[]
dt_long$index
[1] varA1 varA1 varA1 varA2 varA2 varA2 varA3 varA3 varA3
Levels: varA1 varA2 varA3

You could convert it to numeric to get the expected result:

dt_long[,index :=as.numeric(index)][]
   id index varA
1:  1     1    2
2:  2     1    6
3:  3     1    1
4:  1     2    1
5:  2     2    1
6:  3     2    1
7:  1     3    1
8:  2     3    2
9:  3     3    3