2
votes

I have two dataframes which I would like to merge:

df1:

Day   name 
1     A   
1     B     
1     C     
2     A   
2     B      
2     C    
3     A      
3     B   

df2:

Day   spending
2     25
2     30
3     20
3     15
3     10

Desired outcome:

Day   name   spending
1     A      NA
1     B      NA
1     C      NA
2     A      25   
2     B      30
2     C      NA
3     A      20
3     B      15
3     NA     10 

I have tried merge & join_all, but all I got was:

Day   name   spending
1     A      NA
1     B      NA
1     C      NA
2     A      25 
2     A      30
2     B      25      
2     B      30
2     C      NA
3     A      20
3     A      15
3     A      10
3     B      20
3     B      15
3     B      10
3     NA     20
3     NA     15
3     NA     10

Note that I wish to join the two dataframes by day without having the length of rows multiplying at all, despite missing data in some columns (such as day 1 data in df2 & missing name of C in df1). Please advise

2
?You state: without having the length of rows changing at all - but your Desired outcome has 1 row more than 'df1'??dario
@NelsonGon, I have tried merge & join_all but they multiply my rows with repeating 'Day' & 'name' for every different values of 'spending' column in df2Agnes Lee
@dario, my bad let me rephrase the statementAgnes Lee

2 Answers

3
votes

That's not possible without adding a time variable.

library(dplyr) #

full_join(
  df1 %>% group_by(Day) %>% mutate(time=row_number()),
  df2 %>% group_by(Day) %>% mutate(time=row_number()), 
by=c("Day", "time"))

# A tibble: 9 x 4
# Groups:   Day [3]
    Day name   time spending
  <int> <fct> <int>    <int>
1     1 A         1       NA
2     1 B         2       NA
3     1 C         3       NA
4     2 A         1       25
5     2 B         2       30
6     2 C         3       NA
7     3 A         1       20
8     3 B         2       15
9     3 NA        3       10
2
votes

Here is a base R solution, using ave + merge + subset, i.e.,

df1$id <- ave(1:nrow(df1),df1$Day,FUN = seq_along)
df2$id <- ave(1:nrow(df2),df2$Day,FUN = seq_along)
dfout <- subset(merge(df1,df2,by = c("Day","id"),all = TRUE),select = -id)

such that

> dfout
  Day name spending
1   1    A       NA
2   1    B       NA
3   1    C       NA
4   2    A       25
5   2    B       30
6   2    C       NA
7   3    A       20
8   3    B       15
9   3 <NA>       10