2
votes

I have two dataframes:

df1: My main dataset with an address column

df2: A database containing latitudes and longitudes plus and address column

I want to merge two columns over from df2 to my df1.

df1:

ID    VAR1   VAR2   VARX      Address
 1     7      2       x     Road 1, 1234 City
 2     8      0       y     Road 4, 1234 City
 3     6      2       x     Road 5, 1234 City
 4     7      2       x     Road 6, 1234 City
 5     4      1       y     Road 10, 1234 City
 6     1      2       x     Road 11, 1234 City

df2:

    Address            Latitude      Longitude
Road 1, 1234 City        12,67          56,78
Road 2, 1234 City        12,66          55,67
Road 3, 1234 City        12,45          55,10
Road 4, 1234 City        12,10          55,20
Road 5, 1234 City        11,50          55,30
Road 6, 1234 City        12,34          55,32
Road 7, 1234 City        12,89          55,40
Road 8, 1234 City        12,77          55,45
Road 9, 1234 City        11,67          55,67
Road 10, 1234 City       11,90          55,78
Road 11, 1234 City       11,12          56,59

So my new dataframe would look like this:

new dataframe, df3:

ID    VAR1   VAR2   VARX      Address            Latitude   Longitude
 1     7      2       x     Road 1, 1234 City     12,67       56,78
 2     8      0       y     Road 4, 1234 City     12,10       55,20
 3     6      2       x     Road 5, 1234 City     11,50       55,30
 4     7      2       x     Road 6, 1234 City     12,34       55,32
 5     4      1       y     Road 10, 1234 City    11,90       55,78
 6     1      2       x     Road 11, 1234 City    11,12       56,59

I've tried a left_join, but it only returns NA.

df3 <- left_join(df1, df2, by = c("Address"))

EDIT: SOLVED Apprently I had some wrong spaces in one of my address columns. The code above does work.

2
More information if needed: - df1 contains rows on the address column that won't match those in df2 - df2 contains a lot of rows on the address column that won't match those in df1 What I want is really how to do the Excel Vlookup.Louise Sørensen
could you mabe supply your data using dput(df1) and dput(df2)?brettljausn
I still get the sameLouise Sørensen

2 Answers

1
votes

left_join should work fine. Have a look to this and check your data structure.

df3 <- dplyr::left_join(df1, df2, by = "Address")

Output

  ID VAR1 VAR2 VARX            Address Latitude Longitude
1  1    7    2    x  Road 1, 1234 City    12,67     56,78
2  2    8    0    y  Road 4, 1234 City    12,10     55,20
3  3    6    2    x  Road 5, 1234 City    11,50     55,30
4  4    7    2    x  Road 6, 1234 City    12,34     55,32
5  5    4    1    y Road 10, 1234 City    11,90     55,78
6  6    1    2    x Road 11, 1234 City    11,12     56,59

Data

df1

structure(list(ID = 1:6, VAR1 = c(7L, 8L, 6L, 7L, 4L, 1L), VAR2 = c(2L, 
0L, 2L, 2L, 1L, 2L), VARX = structure(c(1L, 2L, 1L, 1L, 2L, 1L
), .Label = c("x", "y"), class = "factor"), Address = structure(c(1L, 
4L, 5L, 6L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City", 
"Road 11, 1234 City", "Road 4, 1234 City", "Road 5, 1234 City", 
"Road 6, 1234 City"), class = "factor")), .Names = c("ID", "VAR1", 
"VAR2", "VARX", "Address"), class = "data.frame", row.names = c(NA, 
-6L))

df2

structure(list(Address = structure(c(1L, 4L, 5L, 6L, 7L, 8L, 
9L, 10L, 11L, 2L, 3L), .Label = c("Road 1, 1234 City", "Road 10, 1234 City", 
"Road 11, 1234 City", "Road 2, 1234 City", "Road 3, 1234 City", 
"Road 4, 1234 City", "Road 5, 1234 City", "Road 6, 1234 City", 
"Road 7, 1234 City", "Road 8, 1234 City", "Road 9, 1234 City"
), class = "factor"), Latitude = structure(c(9L, 8L, 7L, 5L, 
2L, 6L, 11L, 10L, 3L, 4L, 1L), .Label = c("11,12", "11,50", "11,67", 
"11,90", "12,10", "12,34", "12,45", "12,66", "12,67", "12,77", 
"12,89"), class = "factor"), Longitude = structure(c(10L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), .Label = c("55,10", "55,20", 
"55,30", "55,32", "55,40", "55,45", "55,67", "55,78", "56,59", 
"56,78"), class = "factor")), .Names = c("Address", "Latitude", 
"Longitude"), class = "data.frame", row.names = c(NA, -11L))
1
votes

Base R Function

merge(df1,df2,by = "Address")

output

      Address        ID  VAR1 VAR2 VARX Latitude Longitude
1  Road 1, 1234 City  1    7    2    x    12,67     56,78
2 Road 10, 1234 City  5    4    1    y    11,90     55,78
3 Road 11, 1234 City  6    1    2    x    11,12     56,59
4  Road 4, 1234 City  2    8    0    y    12,10     55,20
5  Road 5, 1234 City  3    6    2    x    11,50     55,30
6  Road 6, 1234 City  4    7    2    x    12,34     55,32