I have objects that have varying numbers of events at varying times. These are currently stored in long format (using tibbles from library(tidyverse)):
timing_tbl <- tibble(ID = c(101,101,101,102,102,103,103,103,103),
event_time = c(0,4,8,0,6,0,4,9,12))
The real data has thousands of objects, with up to 50 or so events, so I want to make this process as efficient as possible.
I would like to convert this to a pseudo-wide format, where the first column is the object (patient) ID and the second column is a list of the event times for that object. I can do that with the second column as a column of tibbles, as follows:
tmp <- lapply(unique(timing_tbl$ID),
function(x) timing_tbl[timing_tbl$ID == x, "event_time"])
timing_tbl2 <- tibble(unique(timing_tbl$ID),tmp)
> timing_tbl2[1,2]
# A tibble: 1 x 1
1 <tibble [3 × 1]>
> timing_tbl2[[1,2]]
# A tibble: 3 x 1
1 0
2 4.00
3 8.00
I would prefer to store these objects as lists, as I then want to find the “distance” between each pair of objects using the following function, and I worry that extracting the vector from the list adds unnecessary processing, slowing down the calculation.
lap_exp2 <- function(x,y,tau) {
  exp(-abs(x - y)/tau)
}
distance_lap2 <- function(vec1,vec2,tau) {
  ## vec1 is first list of event times
  ## vec2 is second list of event times
  ## tau is the decay parameter
  0.5*(sum(outer(vec1,vec1,FUN=lap_exp2, tau = tau)) +
         sum(outer(vec2,vec2,FUN=lap_exp2, tau = tau))
  ) -
    sum(outer(vec1,vec2,FUN=lap_exp2, tau = tau))
}
[1] 0.8995764
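As a self-contained check, the sketch below reproduces that value. The call that produced it is not shown here, so the inputs are an assumption: the event times for IDs 101 and 102 as plain numeric vectors, with tau = 2. They do give the same number.

```r
# Kernel: exponential decay in the absolute time difference
lap_exp2 <- function(x, y, tau) {
  exp(-abs(x - y) / tau)
}

# Distance between two numeric vectors of event times
distance_lap2 <- function(vec1, vec2, tau) {
  0.5 * (sum(outer(vec1, vec1, FUN = lap_exp2, tau = tau)) +
           sum(outer(vec2, vec2, FUN = lap_exp2, tau = tau))) -
    sum(outer(vec1, vec2, FUN = lap_exp2, tau = tau))
}

# ASSUMED inputs: event times for IDs 101 and 102, and tau = 2
d <- distance_lap2(c(0, 4, 8), c(0, 6), tau = 2)
print(d)
# [1] 0.8995764
```

Note that outer() needs plain vectors here, so the event times have to be pulled out of the tibble before the function is called.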
If I try extracting the list instead of the tibble using [[:
tmp <- lapply(unique(timing_tbl$ID),
function(x) timing_tbl[[timing_tbl$ID == x, "event_time"]])
I get the following error, which makes sense:
Error in col[[i, exact = exact]] : attempt to select more than one element in vectorIndex
Is there a reasonably simple way I can extract the column from the long tibble as a list and store it in the new tibble? Is this even the right way to go about this?
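One possible way to get a list-column of plain numeric vectors is sketched below, using base R's split() to group the event times by ID. This is only a sketch on the toy data: the column names in timing_tbl2 are my own choice, and I have not checked how it performs on thousands of IDs.

```r
library(tibble)

timing_tbl <- tibble(ID = c(101, 101, 101, 102, 102, 103, 103, 103, 103),
                     event_time = c(0, 4, 8, 0, 6, 0, 4, 9, 12))

# split() returns a named list with one numeric vector of event times per ID
times <- split(timing_tbl$event_time, timing_tbl$ID)

# Store the vectors as a list-column next to the IDs
# (column names here are illustrative, not from the original code)
timing_tbl2 <- tibble(ID = as.numeric(names(times)),
                      event_time = unname(times))

# Each entry of the list-column is a plain numeric vector
timing_tbl2$event_time[[1]]
```

With this layout, each element of timing_tbl2$event_time can be passed to distance_lap2() without first unwrapping a one-column tibble.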