I currently have a motif search working in a series of for loops and would like to move to a nested tibble to improve speed and simplicity (ish). However, I cannot figure out how to store a tibble within a tibble so I can then unnest it. If that's not possible, tips on how to pass the lists (and an id column) so I could later join it to the original table would be appreciated.
Input: set of coordinates and the corresponding DNA sequence
Goals:
1) Find instances of the motif I care about
2) Combine those with the start or end of the range to create all pairs of starts and ends (where the found position can be either)
3) Determine the type of the pairing
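For concreteness, goals 1 and 2 can be sketched in base R for a single row. This is only an illustration under stated assumptions: `motif_end_positions` is a hypothetical helper (a stand-in for the match-end lookup, not a function from any package), and keeping only pairs where the start is smaller than the end is my guess at the "extra math" omitted from the example.

```r
# Hypothetical helper: 1-based end position of every non-overlapping match of
# `motif` in `sequence`; returns an empty integer vector when nothing matches.
motif_end_positions <- function(sequence, motif) {
  m <- gregexpr(motif, sequence, fixed = TRUE)[[1]]
  if (m[1] == -1) return(integer(0))
  as.integer(m + attr(m, "match.length") - 1L)
}

start <- 1
end <- 9
match_ends <- motif_end_positions("GAGAGAGTC", "AG")

# Shift match positions into the coordinate system of the range
true_positions <- start + match_ends - 1

# A found position may serve as either a start or an end
pairs <- expand.grid(
  new.start = c(start, true_positions),
  new.end = c(end, true_positions)
)

# Assumption: only pairs with new.start < new.end are wanted
pairs <- pairs[pairs$new.start < pairs$new.end, ]
```

For the first example sequence this leaves 10 pairs, which matches the number of rows I expect for that range in the final table.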
I cannot figure out how to get mutate to accept a tibble (Error in mutate_impl(.data, dots) : Column `pairs` is of unsupported class data.frame). I can't call rowwise here because I need to send the whole list of positions to the function, as well as values from other columns.
library(tidyverse)

test_input = tibble(
  start = c(1, 10, 15),
  end = c(9, 14, 25),
  sequence = c("GAGAGAGTC", "CATTT", "TCACAGTTTCC")
)
custom_function = function(start, end, list.of.positions) {
  ## Doesn't include extra math, case specifications, and error handling here for simplicity
  starts = c(start, list.of.positions)
  ends = c(end, list.of.positions)
  pairs = expand.grid(starts, ends) %>% as_tibble %>%
    mutate(type = case_when(TRUE ~ "a_type")) # Simplified for example to one case
  return(pairs)
}
test_input %>%
  # for each set of coordinates/string
  rowwise() %>%
  # find the positions of a given motif
  mutate(match.positions = regexp.match.ends(gregexpr("AG", sequence))) %>%
  mutate(num.matches = case_when(
    is_logical(match.positions) ~ NA_integer_,
    TRUE ~ length(match.positions)
  )) %>%
  # expand and convert to real positions
  unnest %>% rowwise %>%
  mutate(true.positions = case_when(
    is.na(match.positions) ~ NA_real_, # must be a double-compatible NA
    TRUE ~ start + match.positions - 1)) %>%
  select(-match.positions) %>%
  ungroup() %>%
  # re-"nest" into a list of real positions
  group_by_at(vars(-true.positions)) %>%
  summarise(true.positions = list(true.positions)) %>%
  # pass list of real positions to a function that creates pairs of coordinates and determines the type of pair
  mutate(pairs = custom_function(start, end, true.positions))
My final tibble should look like this (after unnesting pairs):
   start   end sequence  new.start new.end type
   <dbl> <dbl> <chr>         <dbl>   <dbl> <chr>
 1     1     9 GAGAGAGTC         1       3 a_type
 2     1     9 GAGAGAGTC         1       5 a_type
 3     1     9 GAGAGAGTC         1       7 a_type
 4     1     9 GAGAGAGTC         1       9 a_type
 5     1     9 GAGAGAGTC         3       5 a_type
...
10     1     9 GAGAGAGTC         7       9 a_type
11    10    14 CATTT            10      14 a_type
...
One workaround I thought of was to paste the output values into a string, pass it back as a list (which the tibble tolerates), unnest, and then separate it, but surely there's a less hacky way to go about this. Many thanks for your help/ideas!
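
One possibly less hacky direction, as a sketch rather than a confirmed solution: have the function return a tibble for each row and collect those tibbles in a list-column with `purrr::pmap()`, then expand with `tidyr::unnest()`. Here `make_pairs` is a hypothetical stand-in for `custom_function`, the `new.start < new.end` filter is an assumption about the omitted logic, and the `true.positions` list-column is typed in by hand instead of being computed from the sequences.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for custom_function: returns one tibble of pairs per call
make_pairs <- function(start, end, positions) {
  expand.grid(
    new.start = c(start, positions),
    new.end = c(end, positions)
  ) %>%
    as_tibble() %>%
    filter(new.start < new.end) %>% # assumption: keep only start < end
    mutate(type = "a_type")
}

# Positions entered by hand for the three example rows (no match in "CATTT")
nested <- tibble(
  start = c(1, 10, 15),
  end = c(9, 14, 25),
  sequence = c("GAGAGAGTC", "CATTT", "TCACAGTTTCC"),
  true.positions = list(c(3, 5, 7), numeric(0), 20)
)

result <- nested %>%
  # pmap() calls make_pairs once per row and returns a list of tibbles,
  # so `pairs` is a list-column rather than a data.frame column
  mutate(pairs = pmap(list(start, end, true.positions), make_pairs)) %>%
  select(-true.positions) %>%
  unnest(pairs)
```

Because `pmap()` walks the rows itself, this would not need `rowwise()`, and each call still sees the whole vector of positions for its row along with the values from the other columns.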