2
votes

basically I have 2 data frames with this basic structure:

Col1 Col2 Col3 Col4
aaa 12 xxx 32b
tfe 21 xxx 14f
eed 12 xxx 54b
wes 95 xxx 54r
rtf 44 xxx 99q
fvg 87 xxx 55h

So some of the values in column 1 are repeated in both data frames and I wanted to make a new data frame with only the rows with unique values in Col1 for each dataframe, so in psuedo-code:

Unique_to_df1 <- df1[Unique_Function(find rows with Col1 values not in df2)] Unique_to_df2 <- df2[Unique_Function(find rows with Col1 values not in df1)]

I have tried a few functions, but they all either seem to only export the unique values of Col1, not the whole table or only seem to work with removing.extracting duplicate values from one data frame, not comparing between two. Any help/suggestions would be greatly appreciated!

2
Have a look at setdiffGKi
Like an inner join?camille

2 Answers

1
votes

anti_join(x, y) drops all observations in x that have a match in y.

df1 <- data.frame(
  stringsAsFactors = FALSE,
              Col1 = c("aaa", "tfe", "eed", "wes", "rtf", "fvg"),
              Col2 = c(12L, 21L, 12L, 95L, 44L, 87L),
              Col3 = c("xxx", "xxx", "xxx", "xxx", "xxx", "xxx"),
              Col4 = c("32b", "14f", "54b", "54r", "99q", "55h")
)


df2 <- data.frame(
  stringsAsFactors = FALSE,
              Col1 = c("a", "tfe", "ee", "ws", "rt", "fvg"),
              Col2 = c(12L, 21L, 12L, 95L, 44L, 87L),
              Col3 = c("xxx", "xxx", "xxx", "xxx", "xxx", "xxx"),
              Col4 = c("32b", "14f", "54b", "54r", "99q", "55h")
)

library(dplyr)

unique1 <- df1 %>% 
  anti_join(df2)

unique2 <- df2 %>% 
  anti_join(df1)


# Join them together:

rbind(unique1, unique2)

#>   Col1 Col2 Col3 Col4
#> 1  aaa   12  xxx  32b
#> 2  eed   12  xxx  54b
#> 3  wes   95  xxx  54r
#> 4  rtf   44  xxx  99q
#> 5    a   12  xxx  32b
#> 6   ee   12  xxx  54b
#> 7   ws   95  xxx  54r
#> 8   rt   44  xxx  99q

Created on 2021-03-16 by the reprex package (v0.3.0)

1
votes

You can use setdiff to select rows with unique values in Col1 for each dataframe.

x[x$Col1 %in% setdiff(x$Col1, y$Col1),]
#x[!x$Col1 %in% intersect(x$Col1, y$Col1),] #Alternative
#  Col1 Col2 Col3 Col4
#1  aaa   12  xxx  32b

y[y$Col1 %in% setdiff(y$Col1, x$Col1),]
#  Col1 Col2 Col3 Col4
#1  bbb   12  xxx  32b

Data:

x <- read.table(header=TRUE, text="Col1     Col2    Col3    Col4
aaa     12  xxx     32b
tfe     21  xxx     14f
eed     12  xxx     54b
wes     95  xxx     54r
rtf     44  xxx     99q
fvg     87  xxx     55h")
y <- x
y[1,1] <- "bbb"