3
votes

dplyr is fast, and I would like to use its %.% piping a lot. I want the equivalent of the table function (a frequency count) that preserves the column name and returns a data.frame.

How can I achieve the same result as the code below using only dplyr functions? (Imagine a huge data.table, BIGiris, with 6M rows.)

> out<-as.data.frame(table(iris$Species))
> names(out)[1]<-'Species'
> names(out)[2]<-'my_cnt1'
> out

The output is shown below. Notice that I have to rename column 1 back to 'Species'. Also, in a dplyr mutate or other call, I would like to be able to specify the name of my new count column.

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

Imagine joining to a table like this (assume the iris data.frame has 6M rows and Species is more like a "species_ID"):

> habitat<-data.frame(Species=c('setosa'),lives_in='sea')

The final join and its output (for joining, I need the column names to be preserved at all times):

> left_join(out,habitat)
Joining by: "Species"
     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50     <NA>
3  virginica      50     <NA>
> 

2 Answers

9
votes

For the first part, you can use dplyr like this:

library(dplyr)
out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n())
out

Source: local data frame [3 x 2]

     Species my_cnt1
1     setosa      50
2 versicolor      50
3  virginica      50

To continue in one chain, do this:

out <- iris %>% group_by(Species) %>% summarise(my_cnt1 = n()) %>% left_join(habitat)
out

Source: local data frame [3 x 3]

     Species my_cnt1 lives_in
1     setosa      50      sea
2 versicolor      50       NA
3  virginica      50       NA

By the way, dplyr now uses %>% in place of %.%. It does the same thing; the operator comes from the magrittr package, and dplyr re-exports it.
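
As a shorter variant of the chain above, here is a minimal sketch that assumes a dplyr version recent enough for count() to accept a name argument (older releases may not have it). The habitat table is re-created here only so the snippet runs on its own, and passing by = "Species" is a choice made for this sketch to make the join key explicit.

```r
library(dplyr)

# Lookup table mirroring the `habitat` example from the question
# (re-created here so the snippet is self-contained).
habitat <- data.frame(Species = "setosa", lives_in = "sea")

out <- iris %>%
  count(Species, name = "my_cnt1") %>%  # one row per Species, count column named directly
  left_join(habitat, by = "Species")    # explicit key instead of relying on "Joining by"

out
```

count(Species) is shorthand for group_by(Species) followed by summarise(n = n()), so the grouping column keeps its original name and no renaming step is needed before the join.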

-1
votes

Or you can simply attach the data frame and then run the table function. This will display the column name too.

> attach(iris)
> table(Species)
 Species
    setosa versicolor  virginica 
        50         50         50
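
If a data.frame with named columns is needed rather than a printed table, the following base R sketch may help. It avoids attach() by using with(), and it relies on two base R features: naming the argument passed to table() sets the name of the first column, and the responseName argument of as.data.frame() sets the name of the count column. The names tab and out are chosen here for illustration only.

```r
# Same table as above, without attaching iris to the search path.
tab <- with(iris, table(Species))

# Naming the table() argument keeps "Species" as the column name;
# responseName sets the name of the count column.
out <- as.data.frame(table(Species = iris$Species),
                     responseName = "my_cnt1")
out
```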