3
votes

I have a question about the cast and melt function of reshape2 (I think).

I have a dataset with one patient per row and all observation in columns twice (for observer x and y seperate).

Edit: - I changed the example data a little bit to resemble my real data better because variables have been tested on multiple levels in the patients -

It looks like this:

data <- data.frame(letters[1:2], 1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16 )
colnames(data) <- c("pat_id", "var1_1_x", "var1_1_y", "var1_2_x", "var1_2_y", "var2_1_x", "var2_1_y", "var2_2_x", "var2_2_y")
data

For analysis, I would like to have one column per observer with all the values for all variables in it. So it would look like this (sorry for the crappy table):

pat.id    variable   level   obs_x  obs_y      
a          var1        1      1     3
b          var1        1      2     4
a          var1        2      5     7
b          var1        2      6     8
a          var2        1      9     11
b          var2        1      10    12
a          var2        2      13    15
b          var2        2      14    16

When I use

dataM <- melt(data, id="pat_id")
dataM

I end up with a data frame with dim 3 x 16 with all values in one column. I can't seem to seperate the values from the two observers.

I also tried to use the 'grep' function to 'rbind' all the columns from obs_x and obs_y seperately into a new data frame and to 'cbind' from the molten data set rows with values for x and y, but some variable names contain x's and y's in their names.

Edit: - I hope this doesn't overcomplicate my question. I tried using the reshape2 function to melt the data and than colsplit based on the underscore and it works perfect. When I dcast the data frame with sample data it works as well, but when I use it on my real data (original dimensions 600x250) it doens't: the dimensions and colnames work fine, but the values for the observers x and y are changed to '1' and I don't understand why. -

Edit 2: - I renamed the real data to dummy

str(dummmy_melt)
'data.frame':   299008 obs. of  3 variables:
$ DesirNr : Factor w/ 584 levels "1001","1002",..: 10 20 31 41 43 44 45 46 47 57 ...
$ variable: Factor w/ 512 levels "X.1_F","ReaderNr.1_F",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value   : Factor w/ 1024 levels "01 01 MRI.pdf",..: 1 2 3 4 5 6 7 8 9 10 ...

str(dummy_split)
'data.frame':   299008 obs. of  6 variables:
$ DesirNr : Factor w/ 584 levels "1001","1002",..: 10 20 31 41 43 44 45 46 47 57 ...
$ variable: Factor w/ 512 levels "X.1_F","ReaderNr.1_F",..: 1 1 1 1 1 1 1 1 1 1 ...
$ value   : Factor w/ 1024 levels "01 01 MRI.pdf",..: 1 2 3 4 5 6 7 8 9 10 ...
$ observer: chr  "F" "F" "F" "F" ...
$ afwijk  : chr  "X" "X" "X" "X" ...
$ level   : int  1 1 1 1 1 1 1 1 1 1 ...
  • End of edit 2.

Maybe I am totally overlooking the obvious solution, if someone has some help I would really appreciate it!

1
Thanks for the answer. I thought my problem was "special" (we all do, don't we?) because I wanted to split based on the colnames/variable names. - debruinf

1 Answers

4
votes

This is more of a job for base R's reshape function:

reshape(data, direction = "long", idvar="pat_id", 
        varying = 2:ncol(data), v.names=c("x", "y"), 
        times=c("var_1", "var_2"))
#         pat_id  time x y
# a.var_1      a var_1 1 4
# b.var_1      b var_1 2 3
# c.var_1      c var_1 3 2
# d.var_1      d var_1 4 1
# a.var_2      a var_2 5 8
# b.var_2      b var_2 6 7
# c.var_2      c var_2 7 6
# d.var_2      d var_2 8 5

For the record, here's what you would have to do with "reshape2":

library(reshape2)
dataM <- melt(data, id="pat_id")
dataM <- cbind(dataM, colsplit(dataM$variable, "_", c("V", "var", "obs")))
dcast(dataM, pat_id + var ~ obs, value.var="value")
#   pat_id var x y
# 1      a   1 1 4
# 2      a   2 5 8
# 3      b   1 2 3
# 4      b   2 6 7
# 5      c   1 3 2
# 6      c   2 7 6
# 7      d   1 4 1
# 8      d   2 8 5

There's also Reshape from my "splitstackshape" package that is mostly a wrapper around reshape to allow for unbalanced datasets.

library(splitstackshape)
Reshape(data, id.vars="pat_id", var.stubs=c("x", "y"), sep="_")