
dplyr is fast, and I would like to use its %.% piping a lot. I want the equivalent of table() (a frequency count per value) that preserves the column name and returns a data.frame.

How can I achieve the same result as the code below using only dplyr functions? (Imagine a huge data.table, BIGiris, with 6M rows.)

> out<-as.data.frame(table(iris$Species))
> names(out)[1]<-'Species'
> names(out)[2]<-'my_cnt1'
> out

The output is shown below. Notice that I have to rename column 1 back to Species. Also, in a dplyr mutate or other call, I would like to be able to specify the name of my new count column.

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

Imagine joining to a table like this (assume the iris data.frame has 6M rows and Species is more like a "species_ID"):

> habitat<-data.frame(Species=c('setosa'),lives_in='sea')

The final join and its output (for joining, I need the column names to be preserved at all times):

> left_join(out,habitat)
Joining by: "Species"
     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50     <NA>
3  virginica      50     <NA>

2 Answers


For the first part, you can use dplyr like this:

out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n())

Source: local data frame [3 x 2]

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50
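As a shorter variant of the same idea, dplyr's count() does the grouping and tallying in one call. This is only a sketch: it assumes a dplyr version recent enough for count() to accept the `name` argument (0.8.0 or later), and the final as.data.frame() is there in case a plain data.frame is required.

```r
library(dplyr)

# count() groups by Species and tallies the rows in one step;
# `name` sets the name of the count column (assumes dplyr >= 0.8.0)
out <- iris %>% count(Species, name = "my_cnt1")

# Convert to a plain data.frame in case the result is a tibble
out <- as.data.frame(out)
```

The Species column keeps its name, so the result can be joined on it directly.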

To continue in one chain do this:

out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n()) %>% left_join(habitat)

Source: local data frame [3 x 3]

     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50       NA
3  virginica      50       NA
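If the join key should be stated rather than guessed, left_join() accepts a `by` argument. A minimal sketch, reusing the habitat table from the question:

```r
library(dplyr)

# Lookup table from the question
habitat <- data.frame(Species = "setosa", lives_in = "sea")

out <- iris %>%
  group_by(Species) %>%
  summarise(my_cnt1 = n()) %>%
  # Naming the key explicitly avoids the "Joining by" message
  left_join(habitat, by = "Species")
```

Species is a factor in iris, and may be a character column in habitat depending on the R version, so the joined Species column can come back as character.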

By the way, dplyr now uses %>% in place of %.%. It does the same thing and comes from the magrittr package.


Or you can simply attach the data frame and then run the table function. The printed table then shows the column name too.

> attach(iris)
> table(Species)
Species
    setosa versicolor  virginica 
        50         50         50
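If a data.frame with the original column name is the goal, base R can also produce one without attach(). A sketch, not the only way: naming the argument passed to table() sets the dimension name, and the `responseName` argument of as.data.frame() names the count column.

```r
# Naming the table() argument keeps "Species" as the column name;
# responseName sets the count column, so no renaming is needed afterwards
out <- as.data.frame(table(Species = iris$Species), responseName = "my_cnt1")
```

This gives the same two columns as the question's three-line version, Species and my_cnt1.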