3
votes

I have the below data-set:

data <- data.frame(id = c(6,7,96,216,216,384),
               date = c("2003-03-27", "2003-03-31", "2001-08-01", "2009-05-26", "2009-05-26", "2001-04-05"),
               description = rep("code"),
               variable = rep("schedule"),
               value = c(388,45,95,390,12,10))

And I am attempting to translate some old code that used plyr and reshape2 and instead use dplyr and tidyr (the old code takes forever to run).

Here is the old code:

data$num <- as.numeric(0)
data1 <- ddply(data, .(id, date), transform, num = cumsum(num+1))
data2 <- within(data1, id2 <- interaction(variable, num))
data3 <- dcast(data2, id + date + description ~ id2)
data3

And the output looks like:

   id       date description schedule.1 schedule.2
1   6 2003-03-27        code        388         NA
2   7 2003-03-31        code         45         NA
3  96 2001-08-01        code         95         NA
4 216 2009-05-26        code        390         12
5 384 2001-04-05        code         10         NA

I did not make it far in my attempt, I used group_by and then attempted to mutate a new variable with num = cumsum(num+1) however it just numbered each row 1, 2, 3, 4, etc.

My second guess was to use unite for the id2 variable, but that also did not work.

Any help would be greatly appreciated!

1
Your output doesn't make much sense given your input data frame. Are you sure that's exactly what you get, not simply an example of the general structure? - joran
@joran it should be correct now... i made an edit about 2 minutes after the original post to fix an error. - b222

1 Answers

4
votes

Using dplyr/tidyr, we group by 'id', 'date', create the 'num' column as row_number() with mutate, unite the two columns ('variable' and 'num) to a single column and use spread to reshape from 'long' to 'wide' format.

library(dplyr)
library(tidyr)
data %>% 
   group_by(id, date) %>% 
   mutate(num=row_number()) %>% 
   unite(variable1, variable, num, sep=".") %>%
   spread(variable1, value)
#    id       date description schedule.1 schedule.2
#1   6 2003-03-27        code        388         NA
#2   7 2003-03-31        code         45         NA
#3  96 2001-08-01        code         95         NA
#4 216 2009-05-26        code        390         12
#5 384 2001-04-05        code         10         NA

Or we can use dcast from the devel version of data.table i.e. v1.9.5. A convenient function to get the sequence column by grouping columns is getanID from splitstackshape. The output will be a 'data.table'. Then use dcast to convert from 'long' to 'wide' format.

library(data.table)#v1.9.5+
library(splitstackshape)
dcast(getanID(data, c('id', 'date')), id+date+description~ 
            paste('schedule', .id, sep="."), value.var='value')

NOTE: Instructions to install the devel version are here