1
votes

So I have two data frames, info and towers, with examples in the following:

Info:

ID             Date
1132           01/09/2015
1156           02/09/2015
1132           04/09/2015
1101           04/09/2015

Towers:

Tower   ID1   ID2
    1   1132  1101
    2   1520  1156

The values in the ID column of Info will always match either ID1 or ID2 in Towers. I want to join the frames based on that information, so my joined frame should be:

ID             Date         Tower
1132           01/09/2015       1
1156           02/09/2015       2
1132           04/09/2015       1
1101           04/09/2015       2

I know dplyr's semi_join makes something like what I need, but I understand it requires a match in both value and column name. Given that these columns have different names, I don't know if it will work properly. Is there a method I could use here?

4
melt Towers so your IDs are in the same columnSymbolixAU
You should probably look at as.Date and learn to format those properly. Also, please make your example reproducible next time so it can be copy-pasted in by others.Frank
@Frank Yes I already worked in the format. For learning purposes, what do you mean exactly as a reproducible example?Rono
I'm referring to the extra stuff in Sumedh's answer below, that looks like structure(...) If you copy-paste that into your R session, it will return your example data.frame. This sort of thing should be included with a question. For info on how to go about this, check out stackoverflow.com/questions/5963269/…Frank

4 Answers

3
votes
library(dplyr)

tidyr::gather(df2, Tower2, ID, -Tower) %>% select(-Tower2) %>% right_join(df, "ID")

df

structure(list(ID = c(1132, 1156, 1132, 1101), Date = structure(c(1L, 
2L, 3L, 3L), .Label = c("01/09/2015", "02/09/2015", "04/09/2015"
), class = "factor")), .Names = c("ID", "Date"), row.names = c(NA, 
-4L), class = "data.frame")

df2

structure(list(Tower = 1:2, ID1 = c(1132L, 1520L), ID2 = c(1101L, 
1156L)), .Names = c("Tower", "ID1", "ID2"), class = "data.frame", row.names = c(NA, 
-2L))
2
votes

We can use melt from data.table. Convert the 'data.frame' to 'data.table' (setDT(df2)), melt from 'wide' to 'long' format and join with the original dataset 'df' on 'ID'.

library(data.table)
melt(setDT(df2), id.var="Tower", value.name = "ID")[df, on = "ID"][, variable := NULL][]
#  Tower   ID       Date
#1:     1 1132 01/09/2015
#2:     2 1156 02/09/2015
#3:     1 1132 04/09/2015
#4:     1 1101 04/09/2015

We could also do this without any join and using only base R (no external packages, without any loops (sapply is a loop in disguise)). Here, the idea is to replicate the second dataset 'Tower' column by the number of columns except the 'Tower' i.e. 2, set the names of that vector by unlisting the columns of 'df2' except the 'Tower' (unlist(df2[-1])) and use that to match the 'ID' column in the first dataset (as.character(df$ID)) to return the 'Tower' that corresponds to the 'ID'.

df$Tower <- setNames( rep(df2$Tower, 2), unlist(df2[-1]))[as.character(df$ID)]
df$Tower
#[1] 1 2 1 1
1
votes

You really don't actually need to join; you can just make a new column, as long as you evaluate grouped by row:

Info %>% rowwise() %>% 
    mutate(Tower = Towers[ID == Towers$ID1 | ID == Towers$ID2, 'Tower'])
## Source: local data frame [4 x 3]
## Groups: <by row>
## 
## # A tibble: 4 x 3
##      ID       Date Tower
##   <int>     <fctr> <int>
## 1  1132 01/09/2015     1
## 2  1156 02/09/2015     2
## 3  1132 04/09/2015     1
## 4  1101 04/09/2015     1

or equivalently in full base R,

Info$Tower <- sapply(Info$ID, function(x){Towers[x == Towers$ID1 | x == Towers$ID2, 'Tower']})
1
votes

Another approach using melt(also suggested by @SymbolixAU in comment) from reshape2 package and using df & df2 of @Sumedh's post.

library(reshape2)
library(dplyr)
melt(df2,value.name = "ID",id.vars = "Tower") %>% right_join(df,by = "ID") %>% select(-variable)

We can also do this by using base R reshape function as this:

reshape(data = df2,direction = "long",varying = c("ID1","ID2"),v.names = "ID") %>% right_join(df,by = "ID") %>% select(-c(time,id))