> log_df[1:10, ]
      tagid            happened status
1  03B2ACE7 2016-06-28 18:07:36   open
2  03B2ACE7 2016-06-28 18:36:15 closed
3  03B2ACE7 2016-06-29 07:29:59   open
4  03B2ACE7 2016-06-29 08:06:23 closed
5  03B2ACE7 2016-06-30 16:10:48   open
6  03B2ACE7 2016-06-30 17:23:55   open
7  03B2ACE7 2016-07-01 10:12:06 closed
8  03B2ACE7 2016-07-01 13:39:58 closed
9  03B2ACE7 2016-07-02 10:08:40   open
10 03B2ACE7 2016-07-02 13:33:01 closed
...

Above is my data. What I'd like to produce is:

      tagid                open               closed
1  03B2ACE7 2016-06-28 18:07:36  2016-06-28 18:36:15
2  03B2ACE7 2016-06-29 07:29:59  2016-06-29 08:06:23
3  03B2ACE7 2016-06-30 16:10:48  2016-07-01 13:39:58
...

I was trying to make this work with dcast from the reshape2 package. However, I have to be selective: I only want to pick up an "open" that is the very first one or that comes right after a "closed", and a "closed" that comes right before an "open".

So in log_df, rows 6 and 7 will be ignored.

I am really stuck and not sure how to go about this. Maybe dcast is not the best approach?

Please help! Thank you so much!

1 Answer


Using dplyr and tidyr (both from the tidyverse; tidyr is the successor to reshape2):

library(dplyr)
library(tidyr)

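df %>% 
    # keep an "open" only if it does not follow another "open",
    # and a "closed" only if it is not followed by another "closed"
    filter((status == 'open' & lag(status, default = "") != 'open') |
           (status == 'closed' & lead(status, default = "") != 'closed')) %>% 
    # number the surviving rows in pairs: 1, 1, 2, 2, ...
    mutate(r = ceiling(row_number() / 2)) %>% 
    # one column per status, one row per pair
    spread(status, happened)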

#>      tagid r              closed                open
#> 1 03B2ACE7 1 2016-06-28 18:36:15 2016-06-28 18:07:36
#> 2 03B2ACE7 2 2016-06-29 08:06:23 2016-06-29 07:29:59
#> 3 03B2ACE7 3 2016-07-01 13:39:58 2016-06-30 16:10:48
#> 4 03B2ACE7 4 2016-07-02 13:33:01 2016-07-02 10:08:40

This:

  1. Filters the data.frame, keeping only the first "open" and the last "closed" of each run
  2. Adds a column r that stores the 'group' (the number of each open/closed pair)
  3. Spreads the values into columns (equivalent to dcast)

Data:

df <- read.table(text = '      tagid            happened status
1  03B2ACE7 "2016-06-28 18:07:36"   open
2  03B2ACE7 "2016-06-28 18:36:15" closed
3  03B2ACE7 "2016-06-29 07:29:59"   open
4  03B2ACE7 "2016-06-29 08:06:23" closed
5  03B2ACE7 "2016-06-30 16:10:48"   open
6  03B2ACE7 "2016-06-30 17:23:55"   open
7  03B2ACE7 "2016-07-01 10:12:06" closed
8  03B2ACE7 "2016-07-01 13:39:58" closed
9  03B2ACE7 "2016-07-02 10:08:40"   open
10 03B2ACE7 "2016-07-02 13:33:01" closed', header = TRUE)
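
If the real log holds more than one tagid, the lag()/lead() comparison should probably not look across tags. Below is a minimal sketch of the same three steps done per tag, assuming tidyr >= 1.0.0 so that pivot_wider() can stand in for spread(). The second tag ("0A1B2C3D") and its timestamps are made up for illustration and are not from the question.

```r
library(dplyr)
library(tidyr)

# Rows 5-10 of the question's data plus a hypothetical second tag
# ("0A1B2C3D" and its three timestamps are invented for this sketch).
log_df <- data.frame(
  tagid = c(rep("03B2ACE7", 6), rep("0A1B2C3D", 3)),
  happened = c("2016-06-30 16:10:48", "2016-06-30 17:23:55",
               "2016-07-01 10:12:06", "2016-07-01 13:39:58",
               "2016-07-02 10:08:40", "2016-07-02 13:33:01",
               "2016-07-03 09:00:00", "2016-07-03 09:30:00",
               "2016-07-03 10:00:00"),
  status = c("open", "open", "closed", "closed", "open", "closed",
             "open", "open", "closed"),
  stringsAsFactors = FALSE
)

sessions <- log_df %>%
  group_by(tagid) %>%
  # ISO-formatted timestamps sort correctly even as character strings
  arrange(happened, .by_group = TRUE) %>%
  # within each tag: first "open" of a run, last "closed" of a run
  filter(
    (status == "open" & lag(status, default = "") != "open") |
      (status == "closed" & lead(status, default = "") != "closed")
  ) %>%
  # pair up the surviving rows within each tag: 1, 1, 2, 2, ...
  mutate(r = ceiling(row_number() / 2)) %>%
  ungroup() %>%
  pivot_wider(names_from = status, values_from = happened) %>%
  select(tagid, open, closed)

print(sessions)
```

Like the original, this pairs rows purely by position, so it assumes each tag's log starts with an "open" and ends with a "closed"; a log that starts with "closed" or ends on an unmatched "open" would need extra handling.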