StackOverflow question
Hello fellows,
I am trying to "cross" multiple data frames in R.
My data frames come from high-throughput sequencing experiments and look like the following:
df1:
chr pos orient weight in_nucleosome in_subtelo
1 NC_001133 999 + 1 TRUE TRUE
2 NC_001133 1505 - 14 FALSE TRUE
3 NC_001133 1525 - 2 TRUE TRUE
4 NC_001134 480 + 1 TRUE TRUE
5 NC_001134 509 + 2 FALSE TRUE
6 NC_001134 539 + 3 FALSE TRUE
7 NC_001135 1218 + 1 TRUE TRUE
8 NC_001135 1228 + 2 TRUE TRUE
9 NC_001135 1273 + 1 TRUE TRUE
10 NC_001136 362 + 1 TRUE TRUE
and
df2:
chr feature start end orient
1 NC_001133 ARS 707 776 .
2 NC_001133 ARS 7997 8547 .
3 NC_001133 ARS 30946 31183 .
4 NC_001133 ARS_consensus_sequence 31002 31018 +
5 NC_001133 ARS_consensus_sequence 70418 70434 -
6 NC_001133 ARS_consensus_sequence 124463 124479 -
7 NC_001136 blocked_reading_frame 721071 721481 -
8 NC_001137 blocked_reading_frame 375215 377614 -
9 NC_001141 blocked_reading_frame 29032 30048 +
10 NC_001133 CDS 335 649 +
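The two printed tables can be rebuilt as reproducible data frames. The sketch below is a transcription of the output above. Storing chr, feature and orient as character columns (stringsAsFactors = FALSE) is an assumption about the original column types.

```r
# Reproducible copies of the two tables printed above (values transcribed).
df1 <- data.frame(
  chr = c(rep("NC_001133", 3), rep("NC_001134", 3),
          rep("NC_001135", 3), "NC_001136"),
  pos = c(999, 1505, 1525, 480, 509, 539, 1218, 1228, 1273, 362),
  orient = c("+", "-", "-", "+", "+", "+", "+", "+", "+", "+"),
  weight = c(1, 14, 2, 1, 2, 3, 1, 2, 1, 1),
  in_nucleosome = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE,
                    TRUE, TRUE, TRUE, TRUE),
  in_subtelo = rep(TRUE, 10),
  stringsAsFactors = FALSE  # assumption: chr/orient are character, not factor
)

df2 <- data.frame(
  chr = c(rep("NC_001133", 6), "NC_001136", "NC_001137",
          "NC_001141", "NC_001133"),
  feature = c(rep("ARS", 3), rep("ARS_consensus_sequence", 3),
              rep("blocked_reading_frame", 3), "CDS"),
  start = c(707, 7997, 30946, 31002, 70418, 124463,
            721071, 375215, 29032, 335),
  end = c(776, 8547, 31183, 31018, 70434, 124479,
          721481, 377614, 30048, 649),
  orient = c(".", ".", ".", "+", "-", "-", "-", "-", "+", "+"),
  stringsAsFactors = FALSE  # assumption: chr/feature/orient are character
)
```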
For a given chromosome ("chr" here) and for each df2$feature, I want to know whether df2$start < df1$pos < df2$end, i.e. whether the position falls inside at least one interval of that feature on the same chromosome. I would then like to add a column to df1, named after the df2$feature being considered, filled with TRUE or FALSE according to that condition.
I am fairly sure the apply family of functions should be used, perhaps nested in one another, but after hours of trying I can't get it to work.
I managed it with nested for loops, but that version is inelegant, long and error-prone, and I am convinced there is a better, simpler and perhaps faster solution.
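An apply-family version of this check might look like the sketch below. The helper name in_feature is hypothetical, and the small data frames are made-up toy values (not the question's data) chosen so that some positions fall inside a feature. The strict inequalities follow the question; if the coordinates are meant to be inclusive, the two `<` comparisons would become `<=`.

```r
# Hypothetical helper: for each row of df1, TRUE if df1$pos lies strictly
# inside (start < pos < end) at least one interval of `feature` in df2
# on the same chromosome.
in_feature <- function(df1, df2, feature) {
  iv <- df2[df2$feature == feature, ]
  vapply(seq_len(nrow(df1)), function(i) {
    any(iv$chr == df1$chr[i] & iv$start < df1$pos[i] & df1$pos[i] < iv$end)
  }, logical(1))
}

# Toy data (made-up values, for illustration only).
df1 <- data.frame(chr = c("chrA", "chrA", "chrB"),
                  pos = c(150, 500, 150),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chr = c("chrA", "chrA", "chrB"),
                  feature = c("ARS", "CDS", "CDS"),
                  start = c(100, 400, 600),
                  end = c(200, 900, 700),
                  stringsAsFactors = FALSE)

# One new logical column per distinct feature, named after the feature.
features <- unique(df2$feature)
df1[features] <- lapply(features, function(f) in_feature(df1, df2, f))
print(df1)
```

Here position 150 on chrB stays FALSE for CDS because the only chrB CDS interval is 600 to 700, which is why the chromosome has to be part of the comparison.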
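A version without nested loops, sketched here on made-up toy data with the same strict inequalities, joins the two tables on chr with base R's merge(), keeps the pairs where start < pos < end, and then tests row membership per feature with %in%. The row_id column is a hypothetical helper. Note that merge() materializes every position/interval pair on a chromosome, so memory grows with that product; for large sequencing data an interval join such as data.table::foverlaps is likely to scale better.

```r
# Toy data (made-up values, for illustration only).
df1 <- data.frame(chr = c("chrA", "chrA", "chrB"),
                  pos = c(150, 500, 150),
                  stringsAsFactors = FALSE)
df2 <- data.frame(chr = c("chrA", "chrA", "chrB"),
                  feature = c("ARS", "CDS", "CDS"),
                  start = c(100, 400, 600),
                  end = c(200, 900, 700),
                  stringsAsFactors = FALSE)

df1$row_id <- seq_len(nrow(df1))       # remember which df1 row each pair came from
pairs <- merge(df1, df2, by = "chr")   # every position/interval pair per chromosome
hits <- pairs[pairs$start < pairs$pos & pairs$pos < pairs$end, ]

# One logical column per feature: TRUE if the row has at least one hit.
for (f in unique(df2$feature)) {
  df1[[f]] <- df1$row_id %in% hits$row_id[hits$feature == f]
}
df1$row_id <- NULL
print(df1)
```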
Thank you for reading this,
Antoine.
foverlaps from data.table or findOverlaps from library(GenomicRanges) – akrun
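Following the foverlaps suggestion, a data.table sketch might look like this, again on made-up toy data. Each position is turned into a zero-width interval [pos, pos2] (pos2 and row_id are hypothetical helper columns), and type = "within" keeps the feature intervals that contain it. foverlaps() treats interval bounds as inclusive, so the extra filter restores the question's strict start < pos < end; drop that line if inclusive bounds are wanted.

```r
library(data.table)

# Toy data (made-up values, for illustration only).
dt1 <- data.table(chr = c("chrA", "chrA", "chrB"), pos = c(150L, 500L, 150L))
dt2 <- data.table(chr = c("chrA", "chrA", "chrB"),
                  feature = c("ARS", "CDS", "CDS"),
                  start = c(100L, 400L, 600L),
                  end = c(200L, 900L, 700L))

dt1[, `:=`(row_id = .I, pos2 = pos)]   # zero-width interval [pos, pos2]
setkey(dt2, chr, start, end)           # y must be keyed, interval columns last

# Pairs where the position interval lies within a feature interval.
ov <- foverlaps(dt1, dt2, by.x = c("chr", "pos", "pos2"),
                type = "within", nomatch = 0L)
ov <- ov[start < pos & pos < end]      # strict bounds, as in the question

# One logical column per feature, added by reference.
for (f in unique(dt2$feature)) {
  set(dt1, j = f, value = dt1$row_id %in% ov[feature == f, row_id])
}
dt1[, c("row_id", "pos2") := NULL]
print(dt1)
```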