I'm currently working with R. I have a data frame with three columns named year1, year2 and year3. Each column contains numeric data.
I want a resulting data frame containing the values that are repeated in two different columns. That is, if num.4 appears in both year1 and year2, the new data frame includes num.4; in the same way, if num.5 appears in both year2 and year3, the new data frame includes num.5.
I tried the following code:
newdf1 <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3, c(1)]
newdf2 <- origdf[origdf$year2 == origdf$year3, c(2)]
and then I merged both data frames, but not all the data was included, and the result contained many NA values.
Then I tried the following code:
newdf <- origdf[origdf$year1 == origdf$year2 | origdf$year1 == origdf$year3 & origdf$year2 == origdf$year3, c(1, 2)]
But that didn't work either: it gave me a data frame with many NA values and some correct values, but not all of the repeated numbers were included.
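A likely reason for both symptoms, shown as a minimal sketch: `==` compares the columns element by element (row 1 with row 1, row 2 with row 2, and so on), so a value sitting in different rows of two columns is never matched, and any `NA` in the comparison turns into an `NA` entry when used as a row index. The sample values below are hypothetical, since the real `origdf` is not shown; `%in%` is base R's membership test.

```r
# Hypothetical sample data; the real origdf is not shown in the question.
origdf <- data.frame(
  year1 = c(1, 4, 7, NA),
  year2 = c(4, 5, 9, 2),
  year3 = c(5, 8, 7, 3)
)

# `==` is element-wise: it only compares values that share a row.
same_row <- origdf$year1 == origdf$year2
print(same_row)

# An NA in a logical index yields an NA in the result.
print(origdf[same_row, 1])

# %in% checks membership in the whole column and never returns NA.
in_year2 <- origdf$year1 %in% origdf$year2
print(origdf$year1[in_year2])
```

Here 4 is present in both year1 and year2 but in different rows, so only the `%in%` version finds it.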
How can I build a data frame that contains each value appearing in exactly two of the three columns of the original data frame, listed only once? I don't want values that appear in all three columns.
The expected outcome would be:
>newdf
1 num.4
2 num.5
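One possible way to get this outcome, as a rough sketch rather than a definitive solution: count in how many columns each distinct value occurs and keep those with a count of exactly two. The data below is hypothetical (plain numbers 4 and 5 stand in for num.4 and num.5), and the output column name `value` is an assumption made for illustration.

```r
# Hypothetical sample data; 4 is in year1 and year2, 5 is in year2 and year3,
# and 7 is in all three columns (so it must be excluded).
origdf <- data.frame(
  year1 = c(1, 4, 7, 6),
  year2 = c(4, 5, 7, 2),
  year3 = c(5, 8, 7, 3)
)

# Distinct non-missing values per column, so repeats within one column count once.
per_col <- lapply(origdf, function(col) unique(col[!is.na(col)]))

# Number of columns in which each value appears.
counts <- table(unlist(per_col))

# Keep the values found in exactly two columns.
vals <- names(counts)[as.vector(counts) == 2]
newdf <- data.frame(value = as.numeric(vals))
print(newdf)
```

Taking `unique()` per column first is what makes the count mean "number of columns" rather than "number of occurrences", and dropping `NA` beforehand keeps missing entries out of the result.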
dput(x) where x is the input. – G. Grothendieck