Search value through multiple columns, return id of row

Question

Suppose I have two data frames:

A = data frame consisting of unique phone numbers with an additional factor column. Suppose nrow(A) = 20

B = data frame consisting of rows representing unique households and four columns for listed phone numbers, and fifth column for unique household ID. It is possible that the same number is repeated in multiple of the B columns. Suppose nrow(B) = 100

I want to return a table that has the the "A" unique phone numbers with the household ID from "B" after checking whether the A phone numbers are in either of the four columns.

So for example:

a <- data.frame(phone=c("12345","12346","12456"),
                factor=c("OK","BAD","BAD"))
b <- data.frame(ph1 = c("12345","","12346","12347",""), 
                ph2 = c("","","12346","","12348"), 
                ph3 = c("","","","12456","67890"), 
                hhid = seq(1121,1125))

How can I return C that would look like the following:

c <- data.frame(phone = c("12345","12346","12456"),
                factor = c("OK","BAD","BAD"), 
                hhid = c("1121","1123","1124"))

I'm sure it's possible to do this in a really elegant way or with minimal amount of code. I thought about using for loops or merge but think that is on the wrong track. Open to using whichever packages.

Update - I received a bunch of different suggestions using different packages. This helps me learn about the different packages but also what base can do. My need is filled - but please, do feel free to share anything else that might good to know regarding this question. — Sehj Kashyap

AntoniosK AntoniosK · Accepted Answer · 2017-08-17T07:35:37

library(dplyr)
library(tidyr)

a <- data.frame(phone=c("12345","12346","12456"),
                factor=c("OK","BAD","BAD"))
b <- data.frame(ph1 = c("12345","","12346","12347",""), 
                ph2 = c("","","12346","","12348"), 
                ph3 = c("","","","12456","67890"), 
                hhid = seq(1121,1125))

# reshape data and keep unique combinations
b2 = b %>% 
  gather(ph, phone, -hhid) %>% 
  select(-ph) %>% 
  distinct()

# join data frames
left_join(a, b2, by = "phone")

#   phone factor hhid
# 1 12345     OK 1121
# 2 12346    BAD 1123
# 3 12456    BAD 1124

Search value through multiple columns, return id of row

3 Answers