I have objects that have varying numbers of events at varying times. These are currently stored in long format (as a tibble, via library(tidyverse)):

timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
                     event_time = c(0,4,8,0,6,0,4,9,12))

The real data has thousands of objects, each with up to 50 or so events, so I want to make this process as efficient as possible.

I would like to convert this to a pseudo-wide format, where the first column is the object ID and the second column is a list of the event times for that object. I can do this with a second column that is a list of tibbles, in the following way:

tmp <- lapply(unique(timing_tbl$ID),
               function(x) timing_tbl[timing_tbl$ID == x, "event_time"])

timing_tbl2 <- tibble(ID = unique(timing_tbl$ID), tmp)

> timing_tbl2[1,2]
# A tibble: 1 x 1
  tmp             
  <list>          
1 <tibble [3 × 1]>
> timing_tbl2[[1,2]]
# A tibble: 3 x 1
  event_time
       <dbl>
1       0   
2       4.00
3       8.00

I would prefer to store the event times as plain vectors rather than tibbles, because I then want to find the "distance" between each pair of objects using the following function, and I worry that extracting the vector from a tibble each time adds unnecessary processing and slows down the calculation.

lap_exp2 <- function(x,y,tau) {
  exp(-abs(x - y)/tau)
}

distance_lap2 <- function(vec1, vec2, tau) {
  ## vec1 is the first vector of event times
  ## vec2 is the second vector of event times
  ## tau is the decay parameter
  0.5 * (sum(outer(vec1, vec1, FUN = lap_exp2, tau = tau)) +
         sum(outer(vec2, vec2, FUN = lap_exp2, tau = tau))) -
    sum(outer(vec1, vec2, FUN = lap_exp2, tau = tau))
}

distance_lap2(timing_tbl2[[1,2]]$event_time,timing_tbl2[[2,2]]$event_time,2)
[1] 0.8995764

If I try extracting the vector instead of the tibble by using [[ in place of [:

tmp <- lapply(unique(timing_tbl$ID),
               function(x) timing_tbl[[timing_tbl$ID == x, "event_time"]])

I get the following error, which makes sense because [[ can only select a single element:

Error in col[[i, exact = exact]] : attempt to select more than one element in vectorIndex

Is there a reasonably simple way I can extract the column from the long tibble as a list and store it in the new tibble? Is this even the right way to go about this?

1 Answer

I've found tidyr::nest a good way to generate the 'list columns' I think you may be after (especially for holding time-series-like data). Hope the following helps!

library(dplyr)
library(tidyr)
library(purrr)

timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
                     event_time = c(0,4,8,0,6,0,4,9,12))

ID_times <-
    timing_tbl %>%
    group_by(ID) %>%
    nest(.key = "times_df") %>%
    split(.$ID) %>%
    map(~ .$times_df %>% unlist(use.names = FALSE))  # flatten each nested tibble to a plain numeric vector

# > ID_times
# $`101`
# [1] 0 4 8

# $`102`
# [1] 0 6

# $`103`
# [1]  0  4  9 12
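
If the named list of vectors is all that is needed, base R's split() may get there more directly. The following is only a sketch: timing_df, ID_times_base and wide_df are names made up for illustration, and a plain data.frame stands in for the tibble so that the snippet runs without extra packages.

```r
# Same example data as above, as a plain data.frame (assumption: a tibble
# would behave the same way for these steps).
timing_df <- data.frame(
  ID = c(101, 101, 101, 102, 102, 103, 103, 103, 103),
  event_time = c(0, 4, 8, 0, 6, 0, 4, 9, 12)
)

# split() groups the event times by ID and returns a named list of
# plain numeric vectors, one element per ID.
ID_times_base <- split(timing_df$event_time, timing_df$ID)

# Optional: one row per ID, with the vectors stored in a list column.
wide_df <- data.frame(ID = as.numeric(names(ID_times_base)))
wide_df$times <- unname(ID_times_base)

# Each cell of the list column is a numeric vector, not a tibble.
wide_df$times[[1]]
```

Because each element is already a numeric vector, it can be passed straight to distance_lap2() without pulling a column out of a nested tibble first.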

dists_long <-
    names(ID_times) %>% 
    expand.grid(IDx = ., IDy = .) %>%
    filter(IDx != IDy) %>%
    rowwise() %>% 
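    # expand.grid() returns factors by default; index the list by name, not by factor code
    mutate(dist = distance_lap2(vec1 = ID_times[[as.character(IDx)]], vec2 = ID_times[[as.character(IDy)]], tau = 2))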

# # A tibble: 6 x 3
#   IDx   IDy    dist
#   <fct> <fct> <dbl>
# 1 102   101   0.900
# 2 103   101   0.981
# 3 101   102   0.900
# 4 103   102   1.68 
# 5 101   103   0.981
# 6 102   103   1.68
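
Since the distance is symmetric in its two arguments, each pair shows up twice in the table above. With thousands of IDs it may be worth computing every unordered pair only once and reusing the two "self" terms. The following is a rough sketch of that idea, not a benchmarked solution; self_term, pairs and dists_unique are hypothetical names, and ID_times is rebuilt by hand so the snippet runs on its own.

```r
lap_exp2 <- function(x, y, tau) exp(-abs(x - y) / tau)

# Same named list of event-time vectors as ID_times above.
ID_times <- list(`101` = c(0, 4, 8), `102` = c(0, 6), `103` = c(0, 4, 9, 12))
tau <- 2

# The sum(outer(v, v, ...)) term depends on one ID only, so compute it once per ID.
self_term <- vapply(
  ID_times,
  function(v) sum(outer(v, v, FUN = lap_exp2, tau = tau)),
  numeric(1)
)

# All unordered pairs of IDs: a 2 x choose(n, 2) character matrix.
pairs <- combn(names(ID_times), 2)

# Only the cross term has to be evaluated per pair.
dist_vals <- apply(pairs, 2, function(p) {
  cross <- sum(outer(ID_times[[p[1]]], ID_times[[p[2]]], FUN = lap_exp2, tau = tau))
  0.5 * (self_term[[p[1]]] + self_term[[p[2]]]) - cross
})

dists_unique <- data.frame(IDx = pairs[1, ], IDy = pairs[2, ], dist = dist_vals)
```

This gives three rows for the example data instead of six, with the same values as the corresponding rows of dists_long.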