The iris dataset has 50 entries for each of the three species:
data('iris')
table(iris$Species)
    setosa versicolor  virginica
        50         50         50
I subset the iris dataset into two data frames (with overlapping species and different sets of columns) and merge them with an outer join:
# SV: setosa and virginica rows, without Petal.Width
SV <- subset(iris, Species == 'setosa' | Species == 'virginica',
             select = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Species'))
# VV: versicolor and virginica rows, without Sepal.Length
VV <- subset(iris, Species == 'versicolor' | Species == 'virginica',
             select = c('Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'))
# Outer join on the shared columns (Sepal.Width, Petal.Length, Species)
SV_VV_merge <- merge(SV, VV, all = TRUE)
I find 16 extra entries for virginica:
table(SV_VV_merge$Species)
    setosa versicolor  virginica
        50         50         66
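A likely cause, offered as a sketch rather than a confirmed diagnosis: by default `merge()` joins on every column the two data frames share, and a key combination that occurs n times in both inputs produces n * n merged rows instead of n. The check below counts repeated key combinations among the virginica rows of iris; `virginica`, `key_counts`, and `extra` are illustrative names that do not appear in the code above.

```r
data('iris')
virginica <- subset(iris, Species == 'virginica')

# Count how often each (Sepal.Width, Petal.Length) pair occurs among virginica rows
key_counts <- as.vector(table(paste(virginica$Sepal.Width, virginica$Petal.Length)))

# A key occurring n times on both sides yields n * n merged rows,
# which is n * n - n more than the n original rows
extra <- sum(key_counts^2 - key_counts)
extra
```

If `extra` matches the surplus seen in the merged table, the additional rows come from many-to-many matches on the shared columns rather than from anything specific to the outer join.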
How can I see which rows in the merged data frame are duplicated on the shared columns 'Sepal.Width', 'Petal.Length', and 'Species' for the species 'virginica'?
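One possible approach, shown as a minimal sketch: flag every merged row whose shared-column values occur more than once, then keep the virginica rows. It assumes the merge used the default keys (the columns common to both data frames). The names `key_cols`, `keys`, `is_dup`, and `virginica_dups` are introduced here for illustration only.

```r
data('iris')

# Recreate the two subsets and the outer join
SV <- subset(iris, Species == 'setosa' | Species == 'virginica',
             select = c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Species'))
VV <- subset(iris, Species == 'versicolor' | Species == 'virginica',
             select = c('Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'))
SV_VV_merge <- merge(SV, VV, all = TRUE)

# Columns present in both data frames (the default merge keys)
key_cols <- intersect(names(SV), names(VV))

# duplicated() skips the first occurrence of each key, so scan from
# both ends to flag every row that belongs to a repeated key
keys <- SV_VV_merge[, key_cols]
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Keep the virginica rows and sort so that rows sharing a key sit together
virginica_dups <- SV_VV_merge[is_dup & SV_VV_merge$Species == 'virginica', ]
virginica_dups <- virginica_dups[order(virginica_dups$Sepal.Width,
                                       virginica_dups$Petal.Length), ]
virginica_dups
```

Within each group of rows sharing a key, the Sepal.Length and Petal.Width values show which original rows were paired with each other.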