R: Finding Unique Rows in Merged Dataframe from Outer Join

Question

The iris dataset has 50 entries for each of the three species:

data('iris')
table(iris$Species)

setosa versicolor  virginica 
    50         50         50

With the iris dataset subsetted into two data frames (with overlapping species and asymmetric columns), and merged with an outer join:

# missing Petal.Width
SV <- subset(iris, Species == 'setosa' | Species == 'virginica',
            select = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Species'))
# missing Sepal.Length
VV <- subset(iris, Species == 'versicolor' | Species == 'virginica',
            select = c('Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'))
SV_VV_merge <- merge(SV,VV,all=TRUE)

I find 16 extra entries for virginica:

table(SV_VV_merge$Species)

setosa versicolor  virginica 
    50         50         66

How can I see which rows in the merged dataframe have duplicates for the shared columns 'Sepal.Width' 'Petal.Length' 'Species' for the species 'virginica'?

manotheshark manotheshark · Accepted Answer · 2017-10-16T17:41:26

We can use duplicated

duplicated(SV_VV_merge)
SV_VV_merge[duplicated(SV_VV_merge), ]

Results can be confirmed by counting the unique rows

nrow(unique(SV_VV_merge))
[1] 154

Please note that you are merging different column name subsets and might not be getting the expected results.

intersect(names(VV), names(SV))
[1] "Sepal.Width"  "Petal.Length" "Species"

R: Finding Unique Rows in Merged Dataframe from Outer Join

2 Answers