count unique values in each row data table

Question

I have a data table like

    col1     col2     col3    col4
1:    a        a        a       2
2:    b        b        a       4.4
3:    w        w        s       6.3

I want to get something like below without running a for loop.

    col1     col2     col3    col4   count
1:     a        a        a        2      1
2:     b        b        a        4.4    2
3:     w        w        s        6.3    2

I am counting unique values of col1, col2, col3 in each row and storing in count column. How do I do this in 1 line?

How about applying length(unique()) to a transposed version of the matrix minus col4? — ulfelder
this is one round about way of doing it, but then it involves converting data table to a matrix and then converting back to a data table. I am sure there must be some way of doing it as data table only. But not able to figure out that. — user3664020

LyzandeR LyzandeR · Accepted Answer · 2015-09-01T13:26:44

This might be of use:

data.frame solution

df <- read.table(header=T, text='   col1     col2     col3    col4
1    a        a        a       2
2    b        b        a       4.4
3    w        w        s       6.3')

#one line using apply    
df$count <- apply(df[1:3], 1, function(x) {length(unique(x))})

Output:

> df
  col1 col2 col3 col4 count
1    a    a    a  2.0     1
2    b    b    a  4.4     2
3    w    w    s  6.3     2

data.table solution

It will be a bit trickier to use the data.table syntax in this example.

First I create an id column by which to group_by:

#convert original df to data.table
df2 <- as.data.table(df)
df2[, id := 1:nrow(df2) ]

and then I use my self-made luna function to calculate the length of the unique elements:

luna <- function(x)   length(unique(unlist(strsplit(x,''))))
df2[, count := luna(paste0(col1, col2, col3)), by=id ]

Output:

> df2
   col1 col2 col3 col4 id count
1:    a    a    a  2.0  1     1
2:    b    b    a  4.4  2     2
3:    w    w    s  6.3  3     2

Or as @Tensibai mentions in the comments, this is much faster:

df2 <- as.data.table(df)
df2[, id := 1:nrow(df2) ]
luna <- function(x)   length(unique(x))

df2[, count2 := luna(c(col1, col2, col3)), by=id ]

> df2
   col1 col2 col3 col4 id count2
1:    a    a    a  2.0  1      1
2:    b    b    a  4.4  2      2
3:    w    w    s  6.3  3      2

And if we combine @Frank's and @Tensibai 's comments this should be the fastest (data.table 1.9.5+):

df2 <- as.data.table(df)
df2[, id := 1:nrow(df2) ]

#not run
#works only in data.table >= 1.9.5
df2[, count2 := uniqueN(c(col1, col2, col3)), by=id ]
#not run

count unique values in each row data table

2 Answers

data.frame solution

data.table solution