I have a very large dataframe that I need to join to another dataframe on two columns. I've been using merge to accomplish it, but R runs out of memory as the tables get larger. Is there a similar solution using dplyr or plyr? I hear they require substantially less memory. I know how to use the join function in plyr in general; what I am struggling with is joining by two columns. The merge syntax I've been using is below:
Correlation_Table <- merge(Correlation_Table, inter, by.x = c(1,2), by.y = c(1,2), all.x = TRUE, all.y = TRUE)
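A possible dplyr translation of that call is sketched below, assuming (as `by.x = c(1,2), by.y = c(1,2)` implies) that the first two columns of each table are the keys. The tables here are small hypothetical stand-ins for `Correlation_Table` and `inter`, and the helper vector `key` is an illustrative name. It builds dplyr's named `by` vector from the column positions. `full_join` keeps unmatched rows from both sides, which corresponds to `all.x = TRUE, all.y = TRUE`. This sketch makes no claim about memory use compared with `merge`.

```r
library(dplyr)

# Hypothetical stand-ins for Correlation_Table and inter; in both, the first
# two columns are the join keys (by.x = c(1, 2), by.y = c(1, 2)).
Correlation_Table <- data.frame(var1 = c("A", "A", "B"),
                                var2 = c("B", "C", "C"),
                                cor  = c(0.5, -0.2, 0.9))
inter <- data.frame(v1 = c("A", "B"),
                    v2 = c("B", "D"),
                    n  = c(10, 4))

# Named key vector built from column positions:
# names = key columns of the left table, values = key columns of the right table.
key <- setNames(names(inter)[1:2], names(Correlation_Table)[1:2])

# full_join keeps unmatched rows from both tables, like all.x = TRUE, all.y = TRUE.
Correlation_Table <- full_join(Correlation_Table, inter, by = key)
```

The output keeps the left table's key column names (`var1`, `var2`). Rows present in only one table get `NA` in the other table's measure columns.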
So, for example, suppose I have the following two dataframes:
> head(df1)
  x y         z          a
1 1 2 429.57410  43.746670
2 2 3 717.98184 524.288886
3 3 4 601.66938 640.245469
4 4 5  87.41476 318.964765
5 5 6 586.22234 196.759991
6 6 7 619.82194   3.308136
> head(df2)
   b  c        d
1  5  8 152.2855
2  6  9 191.5406
3  7 10 197.0520
4  8 11 175.4209
5  9 12 157.6239
6 10 13 136.3286
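To make the example reproducible, the six rows printed by `head()` can be rebuilt as shown below. Only those rows are known, so this is an excerpt of the real tables, not the full data. The sketch then runs base R `merge` with the key columns given by name instead of position. The `all = TRUE` argument is shorthand for `all.x = TRUE, all.y = TRUE`.

```r
# Rebuild only the six rows shown by head(df1) and head(df2);
# the real tables are much larger.
df1 <- data.frame(
  x = 1:6,
  y = 2:7,
  z = c(429.57410, 717.98184, 601.66938, 87.41476, 586.22234, 619.82194),
  a = c(43.746670, 524.288886, 640.245469, 318.964765, 196.759991, 3.308136)
)
df2 <- data.frame(
  b = 5:10,
  c = 8:13,
  d = c(152.2855, 191.5406, 197.0520, 175.4209, 157.6239, 136.3286)
)

# Full outer join on two key columns, named rather than positional.
# all = TRUE is equivalent to all.x = TRUE, all.y = TRUE.
merged <- merge(df1, df2, by.x = c("x", "y"), by.y = c("b", "c"), all = TRUE)
```

None of the six (x, y) pairs in this excerpt appears among the (b, c) pairs. The merged excerpt therefore has 12 rows: `d` is `NA` for the df1 rows, and `z` and `a` are `NA` for the df2 rows.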
Here, columns x and y of df1 are dimensions, as are columns b and c of df2; the remaining columns are measures. My goal is to create a new dataframe containing all three measures (z, a and d), joining records where df1.x and df1.y match df2.b and df2.c.
Is this possible using plyr?
full_join from dplyr - akrun
Use the by argument - akrun
by = c('a' = 'b', 'd' = 'e'), etc. - akrun
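Below is a minimal sketch of the approach in the comments above, applied to the column names in the question. The data are small hypothetical values, not the question's data, chosen so that one key pair matches. In the named `by` vector, the names are df1's key columns and the values are df2's key columns.

```r
library(dplyr)

# Small hypothetical tables (not the question's data): the key pair
# (x, y) = (5, 8) in df1 matches (b, c) = (5, 8) in df2.
df1 <- data.frame(x = c(1, 5), y = c(2, 8), z = c(10, 20), a = c(0.1, 0.2))
df2 <- data.frame(b = c(5, 9), c = c(8, 12), d = c(100, 200))

# Match df1$x to df2$b and df1$y to df2$c; unmatched rows from both sides are kept.
result <- full_join(df1, df2, by = c("x" = "b", "y" = "c"))
```

The matched row carries all three measures (z, a and d). The unmatched rows have `NA` in the measures that come from the other table. In dplyr 1.1.0 and later, `by = join_by(x == b, y == c)` should be an equivalent way to write the same key mapping.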