I have a data.table (A) that is over 100,000 rows long. There are 3 columns.
chrom start end
1: chr1 6484847 6484896
2: chr1 6484896 6484945
3: chr1 6484945 6484994
4: chr1 6484994 6485043
5: chr1 6485043 6485092
---
183569: chrX 106893605 106893654
183570: chrX 106893654 106893703
183571: chrX 106893703 106893752
183572: chrX 106893752 106893801
183573: chrX 106893801 106894256
I'd like to generate a new column named "gene" that provides a label for each row based annotations from another data.table which has ~90 rows (B). Seen below:
chrom start end gene
1: chr1 6484847 6521004 ESPN
2: chr1 41249683 41306124 KCNQ4
3: chr1 55464616 55474465 BSND
42: chrX 82763268 82764775 POU3F4
43: chrX 100600643 100603957 TIMM8A
44: chrX 106871653 106894256 PRPS1
If the row start value in data.table A is within the row start and end values of data.table B I need the row in A to be labeled with the correct gene accordingly.
For example the resulting complete data.table A would be
chrom start end gene
1: chr1 6484847 6484896 ESPN
2: chr1 6484896 6484945 ESPN
3: chr1 6484945 6484994 ESPN
4: chr1 6484994 6485043 ESPN
5: chr1 6485043 6485092 ESPN
---
183569: chrX 106893605 106893654 TIMM8A
183570: chrX 106893654 106893703 TIMM8A
183571: chrX 106893703 106893752 TIMM8A
183572: chrX 106893752 106893801 TIMM8A
183573: chrX 106893801 106894256 TIMM8A
I've attempted some nested loops to do this but that seems like it would take WAY too long. I think there must be a way to do this with the data.table package but I can't seem to figure it out.
Any and all suggestions would be greatly appreciated.