Consider 2 dfs:
df1:
Col A | Col B | Col C |
---|---|---|
Tom | Ice Cream | 0.2 |
Tom | Candy | 0.4 |
Tom | Lollipop | 0.6 |
Bob | Sweets | 0.1 |
Abe | Sweets | 0.1 |
df2:
Col A | Col D | Col E |
---|---|---|
Tom | Sweets | 0.5 |
Tom | Chocolate | 0.2 |
Tom | Cola | 0.3 |
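For reference, a minimal reproducible construction of the two tables (a sketch only; I'm assuming plain character/numeric columns, and the backticks are just because the names contain spaces):

```r
library(dplyr)

df1 <- tibble(
  `Col A` = c("Tom", "Tom", "Tom", "Bob", "Abe"),
  `Col B` = c("Ice Cream", "Candy", "Lollipop", "Sweets", "Sweets"),
  `Col C` = c(0.2, 0.4, 0.6, 0.1, 0.1)
)

df2 <- tibble(
  `Col A` = c("Tom", "Tom", "Tom"),
  `Col D` = c("Sweets", "Chocolate", "Cola"),
  `Col E` = c(0.5, 0.2, 0.3)
)
```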
If I use `full_join()`, `left_join()`, or `merge()` with `by = "Col A"` (and `all.x = TRUE`), I get each "permutation" as its own row:
df3:
Col A | Col B | Col C | Col D | Col E |
---|---|---|---|---|
Tom | Ice Cream | 0.2 | Sweets | 0.5 |
Tom | Ice Cream | 0.2 | Chocolate | 0.2 |
Tom | Ice Cream | 0.2 | Cola | 0.3 |
Tom | Candy | 0.4 | Sweets | 0.5 |
Tom | Candy | 0.4 | Chocolate | 0.2 |
Tom | Candy | 0.4 | Cola | 0.3 |
Tom | Lollipop | 0.6 | Sweets | 0.5 |
Tom | Lollipop | 0.6 | Chocolate | 0.2 |
Tom | Lollipop | 0.6 | Cola | 0.3 |
Bob | Sweets | 0.1 | NA | NA |
Abe | Sweets | 0.1 | NA | NA |
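For concreteness, these are the calls I mean, using the df1/df2 tibbles above; all three return the 11-row df3:

```r
# Each Tom row in df1 matches each Tom row in df2 (3 x 3 = 9 rows),
# while Bob and Abe are kept from df1 with NA in Col D / Col E.
full_join(df1, df2, by = "Col A")
left_join(df1, df2, by = "Col A")
merge(df1, df2, by = "Col A", all.x = TRUE)
# (recent dplyr versions may also warn about the many-to-many relationship)
```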
If I use `inner_join()`, `right_join()`, or `merge()` with `by = "Col A"` (the defaults, without `all.x = TRUE`), I get the same as above but restricted to the matched rows only; for example, the Abe and Bob rows are dropped:
df4:
Col A | Col B | Col C | Col D | Col E |
---|---|---|---|---|
Tom | Ice Cream | 0.2 | Sweets | 0.5 |
Tom | Ice Cream | 0.2 | Chocolate | 0.2 |
Tom | Ice Cream | 0.2 | Cola | 0.3 |
Tom | Candy | 0.4 | Sweets | 0.5 |
Tom | Candy | 0.4 | Chocolate | 0.2 |
Tom | Candy | 0.4 | Cola | 0.3 |
Tom | Lollipop | 0.6 | Sweets | 0.5 |
Tom | Lollipop | 0.6 | Chocolate | 0.2 |
Tom | Lollipop | 0.6 | Cola | 0.3 |
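And the corresponding calls for this case (again using the df1/df2 above):

```r
# All three drop the unmatched Bob/Abe rows but still expand Tom to 9 rows.
inner_join(df1, df2, by = "Col A")
right_join(df1, df2, by = "Col A")
merge(df1, df2, by = "Col A")  # all = FALSE by default, i.e. an inner join
```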
I'm trying to do a merge without increasing the number of rows, because the individual permutations don't matter; it also doesn't matter in which order the rows within a group get paired. Intended output:
df5:
Col A | Col B | Col C | Col D | Col E |
---|---|---|---|---|
Tom | Ice Cream | 0.2 | Sweets | 0.5 |
Tom | Candy | 0.4 | Chocolate | 0.2 |
Tom | Lollipop | 0.6 | Cola | 0.3 |
Bob | Sweets | 0.1 | NA | NA |
Abe | Sweets | 0.1 | NA | NA |
Filtering df3 with `unique()` doesn't help, because each permutation row is genuinely unique. In my scenario, I am 100% sure that every subsequent df to be joined to df1 has exactly 3 rows, which matches the original number of "Tom" rows in `` df1$`Col A` ``.
In my real scenario, this join/merge would run inside a loop, where each entry in `Col A` also has exactly 3 occurrences, so later iterations would do the same for "Abe" and "Bob".
`bind_cols()` isn't versatile enough, because the columns may already have been created in a previous iteration (i.e. `Col D` and `Col E` would already exist after Tom's iteration, with `NA`s in Abe's and Bob's rows). In that case I only want the missing values filled in, not new columns created (which would also clash by duplicating the column names).
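To illustrate the clash (hypothetical objects with placeholder values, just to show the shape of the problem): suppose Tom's iteration has already filled `Col D`/`Col E`, and a later iteration computes new columns meant for Bob's rows.

```r
# Hypothetical state after Tom's iteration: Col D / Col E already exist,
# with NA in Bob's and Abe's rows.
after_tom <- tibble(
  `Col A` = c("Tom", "Tom", "Tom", "Bob", "Abe"),
  `Col B` = c("Ice Cream", "Candy", "Lollipop", "Sweets", "Sweets"),
  `Col C` = c(0.2, 0.4, 0.6, 0.1, 0.1),
  `Col D` = c("Sweets", "Chocolate", "Cola", NA, NA),
  `Col E` = c(0.5, 0.2, 0.3, NA, NA)
)

# Placeholder columns computed in a later iteration (3 rows, as in my real data)
bob_new <- tibble(`Col D` = c("d1", "d2", "d3"), `Col E` = c(0.1, 0.2, 0.3))

# bind_cols() can't "fill in" the NA cells: the row counts differ (5 vs 3),
# and even with matching row counts the duplicated names would only be
# renamed (something like `Col D...4` vs `Col D...6`), not merged.
# bind_cols(after_tom, bob_new)
```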
Is there a script/function/package that does what I'm intending? Converting to tibbles is fine as well. Maybe some condition to combine a `_join`/`merge` with `bind_cols()`?
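For what it's worth, this rough, untested sketch is the kind of behavior I'm picturing (the `.idx` helper column is just made up for illustration): number the rows within each `Col A` group on both sides, then join on `Col A` plus that index, so rows pair up one-to-one instead of permuting.

```r
library(dplyr)

df1_idx <- df1 %>%
  group_by(`Col A`) %>%
  mutate(.idx = row_number()) %>%  # 1, 2, 3 within each person
  ungroup()

df2_idx <- df2 %>%
  group_by(`Col A`) %>%
  mutate(.idx = row_number()) %>%
  ungroup()

# One-to-one pairing within each person; Bob/Abe keep NA in Col D / Col E
df1_idx %>%
  left_join(df2_idx, by = c("Col A", ".idx")) %>%
  select(-.idx)
```

With the example df1/df2 this should give the 5-row df5 above, but I don't know whether it generalizes to my looped scenario, or whether an existing function/package already handles this cleanly.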