I have a data frame that consists of character vectors.
col1 <- c('ab', 'bc', 'cd', 'de', 'ef', 'fg', 'gh', 'hj', 'jk', 'kl', 'lm', 'mn', 'no', 'op', 'pr', 'xxx')
col2 <- c('ac', 'bd', 'ce', 'df', 'ef', 'fh', 'gj', 'hk', 'jl', 'km', 'ln', 'mo', 'np', 'or', 'ps', 'xyz')
col3 <- c('abc', 'bd', 'cde', 'def', 'efg', 'fgh', 'ghj', 'hjk', 'jkl', 'klm', 'lmm', 'mno', 'nop', 'opr', 'prs', 'aaa')
col4 <- c('abcd', 'bc', 'cdef', 'defg', 'ef', 'fghj', 'ghjk', 'hjkl', 'jklm', 'klmn', 'lmmo', 'mnop', 'nopr', 'oprs', 'prst', 'xxx')
col5 <- c('abcdd', 'bd', 'cdeff', 'defgg', 'ef', 'fghjj', 'ghjkk', 'hjkll', 'jklmm', 'klmnn', 'lmmoo', 'mnopp', 'noprr', 'oprss', 'prstt', 'aaa')
df <- data.frame(col1, col2, col3, col4, col5, stringsAsFactors = FALSE)
What I am trying to do is find out how many times each element is duplicated within the same row, and in which columns.
For example, the second row of the data frame has two bc & three bd elements.
The fifth row has four duplicate values (ef), in col1, col2, col4 & col5.
The last row has two pairs of duplicates: xxx in col1 & col4, and aaa in col3 & col5.
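To make the rule concrete, here is a minimal sketch for a single row (the second one), assuming base R only. The names row2, counts, dups and where are just illustrative and not part of my data.

```r
# Second row of the example data, as a named character vector
row2 <- c(col1 = "bc", col2 = "bd", col3 = "bd", col4 = "bc", col5 = "bd")

# Count how often each value occurs in the row
counts <- table(row2)

# Keep only the values that occur more than once
dups <- counts[counts > 1]

# For each duplicated value, list the columns that hold it
where <- lapply(names(dups), function(v) names(row2)[row2 == v])
names(where) <- names(dups)
```

Here dups holds a count of 2 for bc and 3 for bd, and where maps bc to col1, col4 and bd to col2, col3, col5.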
The output that I would like to see is:
col6 <- c(0, 3, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
col7 <- c(0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
col8 <- c('NA', 'col2,col3,col5', 'NA', 'NA', 'col1,col2,col4,col5', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'col1,col4')
col9 <- c('NA', 'col1,col4', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'NA', 'col3,col5')
df2 <- cbind(df, col6, col7, col8, col9)
Is there any convenient way to achieve this?
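One possible direction, as a rough sketch rather than a definitive answer: group the column names by value within each row, keep the groups with more than one member, and report the two largest. The helper name dup_summary is hypothetical, and the sketch assumes base R, no missing values in the data, and at most two duplicate groups of interest per row. It uses a real NA instead of the string 'NA', and runs on a small sample (rows 1, 2, 5 and 16 of the data above) to stay self-contained.

```r
# Small sample: rows 1, 2, 5 and 16 of the example data
m <- rbind(
  c("ab",  "ac",  "abc", "abcd", "abcdd"),
  c("bc",  "bd",  "bd",  "bc",   "bd"),
  c("ef",  "ef",  "efg", "ef",   "ef"),
  c("xxx", "xyz", "aaa", "xxx",  "aaa")
)
colnames(m) <- paste0("col", 1:5)

# Hypothetical helper: summarise the two largest duplicate groups in one row
dup_summary <- function(x, cols = names(x)) {
  # Group column names by the value they hold; unique() keeps first-seen order
  groups <- lapply(unique(x), function(v) cols[x == v])
  # Keep only values that occur more than once
  groups <- groups[lengths(groups) > 1]
  # Largest group first; ties are broken by first appearance in the row
  groups <- groups[order(-lengths(groups), seq_along(groups))]
  # Return the k-th group, or an empty vector if there is none
  pick <- function(k) if (k <= length(groups)) groups[[k]] else character(0)
  g1 <- pick(1)
  g2 <- pick(2)
  data.frame(
    col6 = length(g1),
    col7 = length(g2),
    col8 = if (length(g1)) paste(g1, collapse = ",") else NA_character_,
    col9 = if (length(g2)) paste(g2, collapse = ",") else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Apply the helper row by row and bind the summaries to the original columns
res <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) dup_summary(m[i, ])))
df2 <- cbind(as.data.frame(m, stringsAsFactors = FALSE), res)
```

The row-by-row lapply is probably not the fastest option for large data, so a vectorised or more convenient alternative would still be welcome.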