1
votes

I have the following data.frame:

t1 = data.frame(
  t_id = c("61","61","61","62","62","63"), 
  u_id = c("84","85","86","84","87","88"), 
  type = c("d","d","d","s","s","d"),
  v1 = c(0.25, 0.25, 0.25, 0.35, 0.35, 0.45),v2 = c(0.30, 0.30, 0.40, 0.50, 0.50, 1.00),
  tdate = as.Date(c("2015-11-01","2015-11-02","2015-11-03","2015-10-01","2015-10-02","2015-09-01"))
 );

This is what it looks like:

  t_id u_id type   v1  v2      tdate
1   61   84    d 0.25 0.3 2015-11-01
2   61   85    d 0.25 0.3 2015-11-02
3   61   86    d 0.25 0.4 2015-11-03
4   62   84    s 0.35 0.5 2015-10-01
5   62   87    s 0.35 0.5 2015-10-02
6   63   88    d 0.45 1.0 2015-09-01

t_id is transaction_id and u_id is user_id. I want the output to be grouped by t_id with u_id and associating value in the column with the prefix u1_ for first user, u2_ for second user, and so on. Assume that there will be no more than 3 users per transactions.

Here's what the output should look like:

t_id u1_id u1_type u1_v1 u1_v2 u1_tdate    u2_id u2_type u2_v1 u2_v2 u2_tdate    u3_id u3_type u3_v1 u3_v2 u3_tdate
 61    84     d    0.25  0.3   2015-11-01    85     d    0.25  0.3   2015-11-02    86     d    0.25  0.4   2015-11-03    
 62    84     s    0.35  0.5   2015-10-01    87     s    0.35  0.5   2015-10-02   
 63    88     d    0.45  1.0   2015-09-01    

I tried playing around with reshape but to no avail. Any ideas as to how I should go about doing that?

2
You should read tidy data by Hadley Wickham. Your desired output is not coherent with those (good) principles; maybe that's why it is not so easy to achieve.SabDeM

2 Answers

2
votes

You cannot create such a data.frame (i.e., a different number of columns for each row). But you can create a list. This gets quite close:

lapply(split(t1,t1$t_id),function(x) {
  prefixes_counter=0
  if (nrow(x)>1) {
    Reduce(function(x1,x2) {
      prefixes_counter<<-prefixes_counter+1
      cn1=colnames(x1)
      cn2=colnames(x2)
      if (prefixes_counter==1) cn1[-1]=paste0("u",prefixes_counter,"_",cn1[-1])
      cn2[-1]=paste0("u",prefixes_counter+1,"_",cn2[-1])

      merge(`colnames<-`(x1,cn1),`colnames<-`(x2,cn2),by="t_id")
    },split(x,1:nrow(x)))
  } else {
    colnames(x)[-1]=paste0("u1_",colnames(x)[-1])
    x
  }
})

Oh yes, an BTW: I strongly second @SabDeM 's comment. ;)

0
votes

I think the best you can do is a series of casts:

library(reshape2)

t1 = data.frame(
  t_id = c("61","61","61","62","62","63"), 
  u_id = c("84","85","86","84","87","88"), 
  type = c("d","d","d","s","s","d"),
  v1 = c(0.25, 0.25, 0.25, 0.35, 0.35, 0.45),v2 = c(0.30, 0.30, 0.40, 0.50, 0.50, 1.00),
  tdate = as.Date(c("2015-11-01","2015-11-02","2015-11-03","2015-10-01","2015-10-02","2015-09-01")),
  stringsAsFactors=FALSE
)

vars <- names(t1)[-1]
t1$seq <- ave(t1$t_id,t1$t_id,FUN=function(x) paste0("u",seq(along=x),"_"))
out <- data.frame(t_id=unique(t1$t_id))

for(i in vars) {
  temp <- dcast(t1,t_id~seq,value.var=i)
  names(temp)[-1] <- paste0(names(temp)[-1],i)
  if(i=="tdate") temp[,-1] <- lapply(temp[,-1],as.Date)
  out <- merge(out,temp)
}

out <- out[,sort(names(out))]