How can compare two data.frame (df1 and df2) and extract the rows with common gene names
df1 =
logp chr start end CNA Genes No.of.genes
25.714.697 1 90100868 90212160 gain Iqca,Ackr3 2
2.213.423 1 175422136 176019087 loss Rgs7,Fh1,Kmo,Opn3,Chml,Wdr64,Gm25560,Exo1,Gm23805,Pld5,B020018G12Rik 11
5.607.005 2 145619035 147312698 gain Slc24a3,Rin2,Naa20,Crnkl1,4930529M08Rik,Insm1,Ralgapa2,Xrn2,Nkx2-4,Nkx2-2,Gm22261 11
3.756.075 2 141246149 141653989 loss Macrod2 1
4.852.608 2 41586450 41739605 loss Lrp1b 1
590.684 2 86729423 86860061 loss Olfr1089,Olfr1090,Olfr1093,Olfr1093,Olfr141,Olfr1094,Olfr1094,Olfr1095 8
5.721.239 3 25408115 25519319 gain Nlgn1 1
4.295.527 3 92005564 92134972 gain Pglyrp3,Prr9 2
4.257.749 3 15244004 15897870 gain Gm9733,Gm9733,Gm9733,Gm9733,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1c,Sirpb1c,Sirpb1c,Sirpb1c 16
418.259 3 154861710 155490219 loss Tnni3k,Tnni3k,Fpgt,Gm26456,Lrriq3 5
2.284.327 4 134885344 137474898 gain Rhd,Rhd,Tmem50a,D4Wsu53e,Syf2,Runx3,Clic4,Srrm1,Ncmap,Rcan3,Nipal3,Stpg1,Gm25317,Grhl3,Gm23106,Ifnlr1,Il22ra1,Myom3,Srsf10,Pnrc2,Pnrc2,Cnr2,Fuca1,Hmgcl,Gale,Lypla2,Pithd1,Tceb3,Rpl11,Gm26001,Id3,E2f2,Asap3,Tcea3,Zfp46,Hnrnpr,Htr1d,Luzp1,Kdm1a,4930549C01Rik,Lactbl1,Ephb2,C1qb,C1qc,C1qa,Epha8,Zbtb40,Gm23834,Gm23834,Wnt4,Cdc42,Gm13011,Gm13011,Cela3b,Cela3b,Hspg2 56
1.017.899 4 108176679 108417038 gain Echdc2,Zyg11a,Zyg11b,Selrc1,Fam159a,Gpx7 6
2.229.929 4 80406963 83998058 gain Tyrp1,Lurap1l,Mpdz,n-R5s187,Nfib,Zdhhc21,Cer1,Frem1,Ttc39b,Gm23412,Snapc3,Psip1,Ccdc171,Gm25899,Gm25899 15
279.458 4 110534756 110628705 gain Agbl4 1
1.103.167 4 121565222 124833802 gain Ppt1,Cap1,Mfsd2a,Mycl,Trit1,Bmp8b,Bmp8b,Oxct2b,Ppie,Hpcal4,Nt5c1a,Heyl,Pabpc4,Gm25788,Gm22154,Bmp8a,Bmp8a,Oxct2a,Macf1,Ndufs5,Akirin1,Rhbdl2,Mycbp,Rragc,Gm22983,Pou3f1,Utp11l,Gm24480,Fhl3,Sf3a3,Inpp5b,Mtf1,n-R5s192 33
1.781.441 4 139917291 140083763 loss Klhdc7a,Igsf21 2
6.829.744 6 147086557 147179673 gain Mansc4,Klhl42 2
1.070.905 6 63350920 64077379 loss Grid2 1
3.132.886 7 17188025 18205037 gain Psg29,Ceacam5,Ceacam14,Gm5155,Ceacam11,Ceacam13,Ceacam12,Igfl3,Igfl3 9
591.926 7 26773232 26976928 gain Cyp2a5,Cyp2a5,Cyp2a5,Cyp2a22,Cyp2a22,Cyp2a22 6
4.170.656 7 20654493 24128503 gain Nlrp4e,Nlrp5,Gm10175,Zfp180,Zfp112 5
2.494.001 7 38898625 38991306 loss Gm21142,Gm25671 2
13.222.294 7 67330026 67943164 loss Mef2a,Lrrc28,Gm23233,Ttc23,Synm 5
1.330.269 7 7171339 10865583 loss Zfp418,Clcn4-2,Zik1,Nlrp4b 4
3.414.431 8 49942996 51497632 loss Gm23986 1
3.059.542 9 21959210 22072123 gain Epor,Rgl3,Ccdc151,Prkcsh,Elavl3,Zfp653 6
5.277.845 10 80335500 80575991 gain Reep6,Adamtsl5,Plk5,Mex3d,Mbd3,Uqcr11,Uqcr11,Tcf3,Gm25044,Gm25044,Gm25044,Gm25044,Onecut3,Atp8b3,Rexo1,Klf16 16
26.812.338 10 100597718 100692256 loss 1700017N19Rik 1
6.998.267 11 60393963 60504695 gain Lrrc48,Atpaf2,Gid4,Drg2,Myo15 5
2.624.723 11 75676344 76212635 gain Crk,Ywhae,Doc2b,Rph3al,1700016K19Rik,Fam101b,Vps53,Glod4,Fam57a,Gemin4 10
11.851.916 11 97742687 97853778 gain Pip4k2b,Cwc25,1700001P01Rik,Rpl23,SNORA21,Snora21,Lasp1 7
3.553.325 11 74899198 75121318 loss Tsr1,Srr,Smg6,Gm22733 4
309.751 11 105624215 107309569 loss Tanc2,Cyb561,Ace,Ace,Kcnh6,Dcaf7,Taco1,Map3k3,Limd2,Strada,Ccdc47,Ddx42,Ftsj3,Psmc5,Gm23645,Smarcd2,Tcam1,Gh,Gh,Gh,Gh,Gh,Cd79b,Scn4a,2310007L24Rik,Icam2,Ern1,Snord104,Gm22711,Tex2,Milr1,Gm25889,Polg2,Ddx5,Cep95,Smurf2,Bptf,Nol11,Pitpnc1 39
2.642.471 11 30118384 30155192 loss Sptbn1 1
10.304.184 12 114641806 116183315 gain Ighv1-73,Ighv1-83,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Zfp386,Vipr2 11
1.414.343 12 116239354 117192837 loss Wdr60,Esyt2,Ncapg2,Gm25112,Gm24354,Ptprn2 6
2.875.469 14 10676764 10768859 loss Fhit 1
7.743.121 14 52237972 52331429 loss Rab2b,Gm23758,Tox4,Mettl3,Sall2 5
2.689.596 14 43932587 45325020 loss Ang5,Ang6,Ear2,Ear2,Ptgdr,Ptger2,Txndc16,Gpr137c,Ero1l 9
1.912.962 14 119385279 119496386 loss Hs6st3 1
950.029 14 118589508 118681878 loss Abcc4 1
4.105.345 14 3004822 8437757 loss Flnb,Dnase1l3,Abhd6,Rpp14,Rpp14,Pxk,Pdhb,Kctd6,Acox2,Fam107a,Oit1,4930452B06Rik 12
1.870.555 16 33446020 33668062 loss Zfp148,Slc12a8 2
3.148.258 17 5087550 8333690 gain Arid1b,Tmem242,Zdhhc14,Snx9,Synj2,Serac1,Gtf2h5,Tulp4,n-R5s26,Tmem181a,Dynlt1a,Dynlt1b,Tmem181b-ps,Tmem181b-ps,Dynlt1c,Tmem181c-ps,Dynlt1f,Sytl3,Ezr,Rsph3b,Tagap1,Rnaset2b,Rnaset2b,Gm25119,Rps6ka2,Ttll2,Gm9992,Gm26057,Fndc1,Tagap,Rsph3a,Gm22416,Rnaset2a,Rnaset2a,Fgfr1op,Ccr6,Mpc1,Sft2d1 38
50.819.398 17 40052632 40331607 gain Gm7148,Pgk2,Crisp3,Crisp1 4
4.099.936 17 14074943 15508274 loss Dact2,Smoc2,Thbs2,Gm23352,Wdr27,1600012H06Rik,Phf10,Gm3417,9030025P20Rik,Gm3448,Gm3435,Tcte3,Ermard,Dll1,Fam120b,Psmb1,Tbp 17
12.022.555 17 30590875 31053645 loss Glo1,Dnah8,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Gm24661,Glp1r,Umodl1 12
5.135.466 17 36160573 36277761 loss Gm22453,Rpp21,Trim39 3
4.254.769 17 27372278 27593833 loss Grm4,Hmga1,Nudt3 3
5.565.997 18 87905985 87999255 loss Gm24987,Gm24987 2
df2 =
Recursive_level logp chr start end CNA Genes No.of.Gene
1 1.416.541 1 68580000 68640000 loss Erbb4 1
1 7.876.897 1 173840000 174010000 loss Mndal,Mnda,Ifi203,Ifi202b 4
1 6.280.751 1 173500000 173660000 loss BC094916,Pydc4,Pyhin1 3
1 7.369.317 1 115900000 116280000 loss Cntnap5a 1
2 128.766 2 146170000 146660000 gain 4930529M08Rik,Insm1,Ralgapa2 3
1 5.777.222 2 76720000 76800000 loss Ttn 1
2 1.448.913 3 15360000 16000000 loss Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1a,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1b,Sirpb1c,Sirpb1c,Sirpb1c,Sirpb1c 12
1 3.845.977 4 119500000 125160000 gain AA415398,AA415398,AA415398,AA415398,AA415398,Foxj3,Guca2a,Guca2b,Hivep3,Edn2,Foxo6,Scmh1,Slfnl1,Ctps,Cited4,Kcnq4,Nfyc,Mir30c-1,Mir30e,Rims3,Exo5,Zfp69,Smap2,Col9a2,Zmpste24,Tmco2,Rlf,Ppt1,Cap1,Mfsd2a,Mycl,Trit1,Bmp8b,Bmp8b,Oxct2b,Ppie,Hpcal4,Nt5c1a,Heyl,Pabpc4,Bmp8a,Bmp8a,Oxct2a,Macf1,Ndufs5,Akirin1,Rhbdl2,Mycbp,Rragc,Pou3f1,Utp11l,Fhl3,Sf3a3,Inpp5b,Mtf1,n-R5s192,1110065P20Rik,Yrdc,Maneal,Cdca8,Rspo1,Gnl2,Dnali1,Snip1,Meaf6,Zc3h12a 66
1 1.446.699 4 73900000 74180000 gain Frmd3 1
1 2.262.305 4 72740000 72880000 gain Aldoart1 1
1 1.234.215 4 80820000 84340000 gain Tyrp1,Lurap1l,Mpdz,n-R5s187,Nfib,Zdhhc21,Cer1,Frem1,Ttc39b,Snapc3,Psip1,Ccdc171,Bnc2 13
1 123.671 4 108480000 108760000 gain Zcchc11,Prpf38a,Orc1,Cc2d1b,Zfyve9 5
1 1.418.261 4 139400000 147600000 loss Ubr4,Iffo2,Aldh4a1,Tas1r2,Pax7,Klhdc7a,Igsf21,Arhgef10l,Rcc2,Padi4,Padi3,Padi1,Padi2,Sdhb,Atp13a2,Mfap2,Crocc,Necap2,Spata21,Szrd1,Fbxo42,Rsg1,Arhgef19,Epha2,Fam131c,Clcnka,Clcnka,Clcnkb,Clcnkb,Hspb7,Zbtb17,Spen,Fblim1,Tmem82,Slc25a34,Plekhm2,Ddi2,Rsc1a1,Agmat,Dnajc16,Casp9,Cela2a,Cela2a,Ctrc,Efhd2,Fhad1,Tmem51,Kazn,Prdm2,Pdpn,Lrrc38,1700012P22Rik,Aadacl3,9430007A20Rik,Dhrs3,Vps13d,Tnfrsf1b,Tnfrsf8,Zfp600,Zfp600,Rex2 61
1 8.113.817 6 129740000 129800000 gain Klri2 1
1 15.569.108 6 41360000 41480000 loss Prss3,Prss3,Prss1,Prss1 4
1 2.037.683 6 63480000 63700000 loss Grid2 1
2 14.694 7 38260000 38280000 gain Pop4 1
1 14.946 7 35780000 38280000 gain Zfp507,Tshz3,Zfp536,Uri1,Ccne1,1600014C10Rik,Plekhf1,Pop4 8
1 7.192.011 7 47500000 47620000 loss Mrgpra2b,Mrgpra3 2
1 1.722.108 7 26000000 26200000 loss Cyp2b13,Cyp2b9 2
1 12.683.495 7 11350000 11680000 loss Zscan4f 1
1 1.360.954 10 80900000 81100000 gain Timm13,Lmnb2,Gadd45b,Gng7,Diras1,Slc39a3,Sgta,Thop1,Creb3l3 9
1 267.959 11 97880000 98000000 gain Fbxo47,Plxdc1,Arl5c 3
1 1.872.174 11 75860000 76420000 gain Rph3al,1700016K19Rik,Fam101b,Vps53,Glod4,Fam57a,Gemin4,Rnmtl1,Nxn,Timm22,Abr 11
1 2.811.352 12 113560000 114920000 gain Ighv14-3,Ighv13-1,Ighv13-1,Ighv13-1,Ighv13-1,Ighv13-1,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-4,Ighv6-5,Ighv6-5,Ighv6-5,Ighv6-5,Ighv6-5 16
1 1.979.667 12 115860000 115980000 loss Ighv1-83 1
1 2.098.521 12 17420000 21160000 loss Nol10,Odc1,Hpcal1,5730507C01Rik,Asap2 5
1 21.864.853 13 12580000 12650000 loss Ero1lb 1
1 3.233.185 13 61500000 62780000 loss Ctsm,Cts3,Zfp808 3
1 5.640.895 14 53540000 53780000 gain Trav12-2,Trav12-3,Trav13-2,Trav14-2,Trav15-2-dv6-2,Trav3-3,Trav9-4,Trav9-4,Trav9-4,Trav9-4,Trav4-4-dv10,Trav5-4,Trav6-7-dv9,Trav7-6,Trav7-6,Trav7-6,Trav16,Trav13-4-dv7,Trav14-3,Trav3-4 20
1 2.942.081 14 86300000 97240000 gain Diap3,Tdrd3,Rps3a2,Pcdh20,Pcdh9,Klhl1 6
1 4.662.806 14 9840000 9880000 loss Fhit 1
1 3.638.346 14 43740000 44640000 loss Ear1,Ear1,Ear10,Ear10,Ang5,Ang6,Ear2,Ear2 8
1 1.709.546 14 35320000 37400000 loss Grid1,n-R5s46,Ccser2,Rgr,Lrit1,Lrit2,Cdhr1,2610528A11Rik,Ghitm 9
2 3.387.282 14 84060000 85740000 loss Pcdh17 1
1 2.140.909 14 68280000 86300000 loss Adam7,Adamdec1,Adam28,Stc1,Nkx2-6,Nkx3-1,Slc25a37,Synb,Entpd4,SYNB,Loxl2,R3hcc1,Chmp7,Tnfrsf10b,Tnfrsf10b,Tnfrsf10b,Tnfrsf10b,Rhobtb2,Pebp4,Egr3,Bin3,Ccar2,9930012K11Rik,9930012K11Rik,Pdlim2,Sorbs3,Ppp3cc,Slc39a14,Piwil2,Polr3d,Mir320,Phyhip,Bmp1,Sftpc,Lgi3,Reep4,Hr,Nudt18,Fam160b2,Dmtn,Fgf17,Npm2,Xpo7,Dok2,Gfra2,Fndc3a,Cysltr2,Rcbtb2,Rb1,Lpar6,Itm2b,Med4,Nudt15,Sucla2,Htr2a,Esd,Lrch1,5031414D18Rik,Lrrc63,Lcp1,Cpb2,Zc3h13,Siah3,Spert,Cog3,Slc25a30,Tpt1,Snora31,Gtf2f2,Kctd4,Gpalpp1,Nufip1,Rps2-ps6,Tsc22d1,Serp2,Lacc1,Ccdc122,Enox1,n-R5s48,Dnajc15,Epsti1,Fam216b,Tnfsf11,Akap11,Dgkh,Vwa8,Zfp957,Rgcc,Naa16,Mtrf1,Kbtbd7,Zbtbd6,Wbp4,Elf1,Sugt1,Lect1,Pcdh8,Olfm4,Pcdh17 99
1 3.810.267 14 109680000 111240000 loss n-R5s50,Slitrk6 2
1 3.924.724 15 77460000 77560000 loss Apol10a,Apol10a,Apol10a,Apol10a,Apol11a,Apol11a,Apol11a,Apol11a,Apol7c 9
1 7.728.161 16 44780000 44920000 gain Cd200r1,Cd200r1,Cd200r4,Cd200r4,Cd200r2,Cd200r2 6
1 348.511 17 73500000 76640000 gain Galnt14,Ehd3,Xdh,Memo1,Dpy30,Spast,Slc30a6,Nlrc4,Yipf4,Birc6,Ttc27,Ltbp1,Rasgrp3,Fam98a 14
1 1.052.043 17 36120000 36540000 gain Rpp21,Trim39 2
1 1.325.386 17 90420000 90540000 loss Nrxn1 1
1 4.438.061 17 38300000 38360000 loss Olfr137,Olfr137 2
1 125.062 17 30380000 30920000 loss Btbd9,Glo1,Dnah8,Glp1r 4
1 2.998.359 19 13860000 13900000 gain Olfr1502 1
2 3.307.524 19 30910000 30970000 loss Prkg1 1
When i tried df2[mapply(function(x, y) length(intersect(x,y))>0, strsplit(df1$Gene, ','), strsplit(df2$Gene, ',')),]
i got out
logp chr start end CNA Genes No.of.genes
39 2.689.596 14 43932587 45325020 loss Ang5,Ang6,Ear2,Ear2,Ptgdr,Ptger2,Txndc16,Gpr137c,Ero1l 9
But i can find many rows with at least one common Gene
Pekg
is common for bothdf1
anddf2
. So, I guess that would be included. But, in the 3rd and 4th rows, in addition toSox1
, there are other genes also indf1
– akrunstrsplit
to split up theGene
column` and then do the match. Perhapsdf2[mapply(function(x,y) any(x %in% y), strsplit(df1$Gene, ','), strsplit(df2$Gene, ",")),]
– akrun