0
votes

I have a large dataset in which i wanted to perform a VLOOKUP function on in Excel. The problem is that there are too many rows and Excel can't handle the dimensions. Even subsets are too long for Excel. This is why i want to use R to perform the same function.

What i have: A very large dataset and a short 'Lookup-dataset.'

The large dataset has a column (Lookup) that has the same unique identifier as found in the Lookup-dataset.

Example of large dataset (the original dataset has 14 columns with other relevant data that is incorporated in the Lookup-column, but here i tried to make it simple):

Species    Site   Present    Lookup
A           A1    1          Aa1 
A           A2    0          Ab2
A           A3    1          Aa3
A           A4    1          Aa4
A           A4.2  1          Aa4
B           B1    0          Bb1
B           B2    0          Bb2
B           B3    0          Bb3
B           B4    1          Bb4
B           B1.1  1          Bb1
B           B2.1  0          Bb2

Example of lookup table:

Lookup  Val
Aa1     12 
Ab2     15
Aa3     18
Aa4     101
Bb1     60
Bb2     75
Bb3     89
Bb4     3

Since there are more columns in the dataset compared to the lookup-dataset, i can't get the dplyr::full_join function to do the job.

In Excel i would use the VLOOKUP function and fill down so all cells in the new column have the proper value.

My question: How can i achieve in R that my dataset has a new column, containing the Val from the lookup-dataset?

2
Assumig the lookup table has only unique value, dplyr::left_join should get the desired result (left_joind means all values from the first (left) argument (i.e. large_dataset) and only matching rows from the right i.e. lookup_dataset.dario

2 Answers

2
votes

No need to use dplyr, simple base-R is enough, with the match-function.

largedataset$Val <- lookuptable$Val[match(largedataset$Lookup, lookuptable$Lookup)]

If you need to do the lookup more often and have a really large lookuptable, there may be some benefit in using the fastmatch-package with the similar fmatch-function, but only if regular match is too slow.

1
votes

If you prefer to use dplyr package, use left_join function:

library(dplyr)
bigDF %>% left_join(lookupDF)

# Joining, by = "Lookup"
# A tibble: 11 x 5
#    Species Site  Present Lookup   Val
#    <chr>   <chr>   <dbl> <chr>  <dbl>
#  1 A       A1          1 Aa1       12
#  2 A       A2          0 Ab2       15
#  3 A       A3          1 Aa3       18
#  4 A       A4          1 Aa4      101
#  5 A       A4.2        1 Aa4      101
#  6 B       B1          0 Bb1       60
#  7 B       B2          0 Bb2       75
#  8 B       B3          0 Bb3       89
#  9 B       B4          1 Bb4        3
# 10 B       B1.1        1 Bb1       60
# 11 B       B2.1        0 Bb2       75

Or using base R

merge(bigDF, lookupDF)

Note. merge does not keep the row order

data


bigDF <- tribble(
            ~Species,    ~Site,   ~Present,    ~Lookup,
            "A",    "A1",   1,  "Aa1",
            "A",    "A2",   0,  "Ab2",
            "A",    "A3",   1,  "Aa3",
            "A",    "A4",   1,  "Aa4",
            "A",    "A4.2", 1,  "Aa4",
            "B",    "B1",   0,  "Bb1",
            "B",    "B2",   0,  "Bb2",
            "B",    "B3",   0,  "Bb3",
            "B",    "B4",   1,  "Bb4",
            "B",    "B1.1", 1,  "Bb1",
            "B",    "B2.1", 0,  "Bb2")


lookupDF <- tribble(
            ~Lookup,  ~Val,
            "Aa1",      12, 
            "Ab2",      15,
            "Aa3",      18,
            "Aa4",      101,
            "Bb1",      60,
            "Bb2",      75,
            "Bb3",      89,
            "Bb4",      3)