0
votes

How can one merge two data frames, one column-wise and other one row-wise? For example, I have two data frames like this:

A:    add1 add2 add3 add4  
  1   k    NA   NA   NA 
  2   l    k    NA   NA 
  3   j    NA   NA   NA 
  4   j    l    NA   NA

B:    age  size name  
  1   5    6    x   
  2   8    2    y  
  3   1    3    x
  4   5    4    z 

I want to merge the two data.frames by row.name. However, I want to merge the data.frame A column-wise, instead of row-wise. So, I'm looking for a data.frame like this for result:

C:id  age  size name add  
  1   5    6    x   k
  2   8    2    y   l
  2   8    2    y   k
  3   1    3    x   j
  4   5    4    z   j
  4   5    4    z   l

For example, suppose you have information of people in table B including name, size, etc. These information are unique values, so you have one row per person in B. Then, suppose that in table A, you have up to 5 past addresses of people. First column is the most recent address; second, is the second most recent address; etc. Now, if someone has less than 5 addresses (e.g. 3), you have NA in the 4 and 5 columns for that person.

What I want to achieve is one data frame (C) that includes all of this information together. So, for a person with two addresses, I'll need two rows in table C, repeating the unique values and only different in the column address.

I was thinking of repeat the rows of A data frame by the number of non-NA values while keeping the row.names the same as they were (like data frame D) and then merge the the new data frame with B. But I'm not sure how to do this.

D:    address   
  1   k    
  2   l    
  2   k
  3   j
  4   j
  4   l

Thank you!

1
It's not at all clear (to me at least) what you want to do exactly, and the example you've provided just has me scratching my head further. You seem to be dropping some rows and columns, and renaming others. Can you try to clarify exactly what you want to do?Joe
Thank you Joe and sorry for not being clear. I want to repeat the columns of table A as rows (like in D). But I want to repeat only the columns that have a value in them. After that, I want to merge/join B with A (or D). I'm going to update the question with an example to make it more clear. Thank you!danseuse

1 Answers

2
votes

Change the first data.frame to long format, then it's easy. df1 is A and df2 is B. I also name the numbers id.

require(tidyr)

# wide to long (your example D)
df1tidy <- gather(df1,addname,addval,-id)

# don't need the original add* vars or NA's
df1tidy$addname <- NULL
df1tidy <- df1tidy[!is.na(df1tidy$addval), ]

# merge them into the second data.frame
merge(df2,df1tidy,by = 'id',all.x = T)