Identify unique rows for a data.frame that is grouped by multiple variables

Question

Hi everyone,

I have been trying to get this to work. Basically, I have a data.frame like the following:

C1    C2   C3   C4
a     aa   aaa  aaaa
a     bb   aaa  bbbb
b     aa   aaa  aaaa
b     aa   aaa  aaaa
b     bb   aaa  aaaa

What I want for output is something like this:

C1    C2   C3   C4
a     aa   aaa  aaaa
a     bb   aaa  bbbb
b     aa   aaa  aaaa
b     bb   aaa  aaaa

Basically, I want the data frame first to be 'grouped' by 'C1', 'C2' and 'C3', and then, for each subgroup, I'd like to compute some summary (as in the dplyr package). In this case, I'd like to identify the unique 'C4' values in each subgroup.

I tried the dplyr package, but it doesn't seem to work:

dataMat1 <- group_by(dataMat, C1, C2, C3)
dataMat2 <- summarise(dataMat1, unique(C4))

dataMat2 contains only one column. How can I get the output I want using dplyr or any other package? Right now, I have written several for loops to get the desired output.

Thanks!

No need for dplyr. Just unique(dataMat) will give you what you need. — David Arenburg
Right, dplyr will come in handy if I have more columns than just these 4 — yuanhangliu1
unique in data.table has a by argument, so you can use unique(df1, by=c('C1', 'C2', 'C3')) — akrun

jalapic · Accepted Answer · 2015-06-07T22:44:01

You could just use unique in this instance:

df %>% group_by(C1,C2,C3) %>% unique

#  C1 C2  C3   C4
#1  a aa aaa aaaa
#2  a bb aaa bbbb
#3  b aa aaa aaaa
#4  b bb aaa aaaa
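
If you would rather stay within dplyr verbs, distinct() is probably the closest fit. The following is only a minimal sketch: it assumes the data frame is called dataMat (as in the question) and rebuilds the example data by hand, and the name result is just for illustration.

```r
library(dplyr)

# Rebuild the example data from the question (assumed to be character columns)
dataMat <- data.frame(
  C1 = c("a", "a", "b", "b", "b"),
  C2 = c("aa", "bb", "aa", "aa", "bb"),
  C3 = c("aaa", "aaa", "aaa", "aaa", "aaa"),
  C4 = c("aaaa", "bbbb", "aaaa", "aaaa", "aaaa"),
  stringsAsFactors = FALSE
)

# Keep one row per unique combination of C1, C2, C3 and C4,
# i.e. the unique C4 values within each (C1, C2, C3) group
result <- distinct(dataMat, C1, C2, C3, C4)

# Base R equivalent when these four columns are the whole data frame
result_base <- unique(dataMat)
```

With extra columns in the data frame, distinct(dataMat, C1, C2, C3, C4) should still return only those four columns, whereas unique(dataMat) would compare entire rows.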
