I have a large dataset in which i wanted to perform a VLOOKUP function on in Excel. The problem is that there are too many rows and Excel can't handle the dimensions. Even subsets are too long for Excel. This is why i want to use R to perform the same function.
What i have: A very large dataset and a short 'Lookup-dataset.'
The large dataset has a column (Lookup) that has the same unique identifier as found in the Lookup-dataset.
Example of large dataset (the original dataset has 14 columns with other relevant data that is incorporated in the Lookup-column, but here i tried to make it simple):
Species Site Present Lookup
A A1 1 Aa1
A A2 0 Ab2
A A3 1 Aa3
A A4 1 Aa4
A A4.2 1 Aa4
B B1 0 Bb1
B B2 0 Bb2
B B3 0 Bb3
B B4 1 Bb4
B B1.1 1 Bb1
B B2.1 0 Bb2
Example of lookup table:
Lookup Val
Aa1 12
Ab2 15
Aa3 18
Aa4 101
Bb1 60
Bb2 75
Bb3 89
Bb4 3
Since there are more columns in the dataset compared to the lookup-dataset, i can't get the dplyr::full_join
function to do the job.
In Excel i would use the VLOOKUP
function and fill down so all cells in the new column have the proper value.
My question: How can i achieve in R that my dataset has a new column, containing the Val from the lookup-dataset?
dplyr::left_join
should get the desired result (left_joind means all values from the first (left) argument (i.e. large_dataset) and only matching rows from the right i.e. lookup_dataset. – dario