I am trying to join two dataframes. However unlike the normal join I want to match a series of columns from the first with the second.
Basically I have a site list with a reference to the nearest surrounding sites. I need to look up the nearest sites full Gauge and LTA IDs which I have in a separate dataframe. I have provided some example dataframes, including an example output, but the real thing isn't nearly this neat (And has a lot more columns and rows) which is why I need to look up the Surrogate gauges in TestRefList, instead of creating in the method below.
library(plyr)
library(tidyverse)
TestRefList <- data.frame(Site = paste0("sl",1:10,".1"), Gauge = paste0(1:10,".1","/110.00/1"), LTA = paste0(1:10,".1","/110.99/1"), stringsAsFactors = F)
Surrogates <- data.frame(Primary = paste0("sl",c(2,4,6),".1"), nearest1=paste0("sl",1:3,".1"), nearest2=paste0("sl",7:9,".1"), stringsAsFactors = F)
HopefulOutput <- data.frame(Primary = paste0("sl",c(2,4,6),".1"), nearest1=paste0("sl",1:3,".1"), nearest2=paste0("sl",7:9,".1"),
nearest1Gauge = paste0(1:3,".1","/110.00/1"), nearest1LTA = paste0(1:3,".1","/110.99/1"),
nearest2Gauge = paste0(7:9,".1","/110.00/1"), nearest2LTA = paste0(7:9,".1","/110.99/1"), stringsAsFactors = F)
I thought I could use some combination of plyr::ldply and dplyr::left_join such as: Out <- ldply(names(Surrogates)[2:3], function(x) left_join(Surrogates,TestRefList, by = c(paste0(x, '="Site"'))))
however I can't get the joining to working using a name from the list. I've tried outside the list, with a few arrangements of " and ' around the equals eg:
left_join(Surrogates,TestRefList, by = c(paste0('"',names(Surrogates)[2],'"' , '="Site"')))
Even if I could get this part to work, I'm not sure how it'll work within the ldply.
Any ideas? I'm happy to come at this an entirely different way if necessary although I'm much more comfortable with data.frames and tidyverse than data.table options