1
votes

I have a data frame like

col1 col2  col3
A      2    b1 
A      3    b2
A      2    b2
A      2    b1
A      3    b2

I want to get the count of unique values of col3 for each combination of col1 and col2 as following

col1  col2 count_unique
 A       2         2
 A       3         1

What is the best one line solution to this?

2
aggregate(col3~., df, function(x) length(unique(x)) ) but your desired output is wrong -- there's only one unique value for for col3 for the second row.Frank
@Frank sorry about that. corrected it. and thanks for the solution.user3664020
@maj Sorry, I actually don't know at all; and just got there by trial and error looking at the examples at the bottom of the ?aggregate documentation. I always use data.table for this: setDT(df)[, uniqueN(col3), by=.(col1,col2)]Frank
[I just deleted my previous comment when I found an explanation for the dot in formulas. My comment read along the lines of "Would you please explain the formula you used?"] @Frank: Thanks.maj
@akrun I have no problem with someone taking these variations and putting them into an answer. The q is probably a dupe, though, so I'm not putting up that effort myself.Frank

2 Answers

3
votes

As @Frank and @akrun pointed out in their comments, there are several possible solutions to your question - here are three of the most used ones:

in base R:

aggregate(col3~., df, function(x) length(unique(x)) )

using the data.table package (v1.9.5 and higher):

setDT(df)[, uniqueN(col3), by=.(col1,col2)]

using the dplyr package:

df %>% group_by(col1, col2) %>% summarise(col3=n_distinct(col3))
1
votes

Other two options:

plyr

library(plyr)
count(unique(df), vars = c("col1", "col2"))

Output:

 col1 col2 freq
1    A    2    2
2    A    3    1

sqldf

library(sqldf)
sqldf("SELECT col1, col2, COUNT(DISTINCT(col3)) n 
      FROM df GROUP BY col1, col2")

Output:

  col1 col2 n
1    A    2 2
2    A    3 1