0
votes

I have two large data.frames:

   DF1
   AB2        CF34      FGH23     P53T
   a          b         c         d
   e          bv        sd        we
   sa         s         qw        fd
   fg         df        lk        po

   DF2
   AB2        CF34      FGH23     P53T
   a          b         c         m
   n          m         sd        we
   sa         s         py        fd
   fgq        df        lk        pq

I would "simply" like to compare the two data.frames column by column, pairing the columns that share the same name, and return the number of matching items from each pairwise comparison. In other words, something like:

merge(DF1, DF2, by = "AB2")
merge(DF1, DF2, by = "CF34")

and so on. The problem is that the two data.frames are too large for me to run each comparison manually with merge, as shown above.

Any ideas?

Thanks a lot!

E.

1
Do you mean sapply(names(DF1), function(n) nrow(merge(DF1, DF2, by = n))) ? - Ben Bolker
Yeah! Fantastic! It works! Thanks a lot! - Elb

1 Answer

2
votes

(Upgraded from a comment.)

It sounds like

sapply(names(DF1), function(n) nrow(merge(DF1, DF2, by = n)))

solves your problem.
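
For reference, here is a minimal sketch of that one-liner run against the sample data from the question. The data.frames are typed in by hand, and `stringsAsFactors = FALSE` is my assumption so that the columns are plain character vectors. The second count, based on `intersect`, is not part of the original suggestion: I include it only because `nrow(merge(...))` counts matched row pairs, so a value that is repeated in both columns is counted more than once, whereas `intersect` counts each shared value once.

```r
# Sample data copied from the question
DF1 <- data.frame(
  AB2   = c("a", "e", "sa", "fg"),
  CF34  = c("b", "bv", "s", "df"),
  FGH23 = c("c", "sd", "qw", "lk"),
  P53T  = c("d", "we", "fd", "po"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  AB2   = c("a", "n", "sa", "fgq"),
  CF34  = c("b", "m", "s", "df"),
  FGH23 = c("c", "sd", "py", "lk"),
  P53T  = c("m", "we", "fd", "pq"),
  stringsAsFactors = FALSE
)

# Number of rows in the inner join on each shared column name
merge_counts <- sapply(names(DF1), function(n) nrow(merge(DF1, DF2, by = n)))

# Assumed alternative: number of distinct values present in both columns
shared_counts <- sapply(names(DF1), function(n) {
  length(intersect(DF1[[n]], DF2[[n]]))
})

print(merge_counts)   # AB2 = 2, CF34 = 3, FGH23 = 3, P53T = 2
print(shared_counts)  # same here, because no value is repeated within a column
```

If DF1 and DF2 do not have exactly the same column names, looping over `intersect(names(DF1), names(DF2))` instead of `names(DF1)` should avoid an error from `merge`.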