For a sample dataframe:

df <- structure(list(area = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"),
                     count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L)),
                .Names = c("area", "count"),
                class = c("tbl_df", "tbl", "data.frame"),
                row.names = c(NA, -11L),
                spec = structure(list(cols = structure(list(area = structure(list(), class = c("collector_character", "collector")),
                                                            count = structure(list(), class = c("collector_integer", "collector"))),
                                                       .Names = c("area", "count")),
                                      default = structure(list(), class = c("collector_guess", "collector"))),
                                 .Names = c("cols", "default"),
                                 class = "col_spec"))

... which lists the number of occurrences of something per area, I wish to produce another summary table showing how many areas have one occurrence, two occurrences, three occurrences, etc. For example, there are three areas with "one occurrence per area", three areas with "two occurrences per area", one area with "three occurrences per area", etc.

What is the best package/code to produce this result? I have tried aggregate and plyr, but so far without success.

3 Answers

I like the data.table syntax for this:

library(data.table)
setDT(df) # convert df from a data.frame to a data.table, in place (by reference)

# .N is the number of rows in each group; by = count makes one group per distinct value of count
df[, .(n_areas = .N), by = count]

   count n_areas
1:     1       3
2:     3       1
3:     4       2
4:     2       3
5:     5       1
6:     6       1
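
The groups above come out in order of first appearance (1, 3, 4, 2, ...). If you would rather have them sorted by count, data.table's keyby argument groups like by but also sorts the result and sets count as its key. A minimal sketch, assuming data.table is installed; it rebuilds the question's data as a plain data.table (same area and count values, without the readr spec attribute):

```r
library(data.table)

# Same area/count values as the question's df, rebuilt without the readr spec attribute
dt <- data.table(area = letters[1:11],
                 count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L))

# keyby groups like by, but also sorts the result by count and sets it as the key
res <- dt[, .(n_areas = .N), keyby = count]
print(res)
```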

For a comparison of the two packages most often used for this kind of operation, dplyr and data.table, see the question "data.table vs dplyr: can one do something well the other can't or does poorly?"

You can also do this with base R functions, using @Jimbou's solution:

table(df$count)
1 2 3 4 5 6 
3 3 1 2 1 1 
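
table() returns a named vector-like object rather than a data frame. If you want a two-column table like the other answers produce, one base R sketch is below. It also shows aggregate(), which the question mentions, counting the areas for each distinct value of count. It rebuilds the question's data as a plain data.frame (same area and count values, without the tibble classes or readr spec attribute); the column name n_areas is just an illustrative choice:

```r
# Same area/count values as the question's df, rebuilt as a plain data.frame
df <- data.frame(area = letters[1:11],
                 count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L))

# Option 1: tabulate count, then turn the table into a data frame.
# Naming the argument makes "count" the first column's name;
# responseName renames the frequency column (default "Freq").
tab_df <- as.data.frame(table(count = df$count), responseName = "n_areas")

# Option 2: aggregate(). length() returns the number of areas
# in each group of rows sharing the same count value.
agg_df <- aggregate(area ~ count, data = df, FUN = length)
names(agg_df)[names(agg_df) == "area"] <- "n_areas"

print(tab_df)
print(agg_df)
```

One difference between the two: as.data.frame() on a table returns count as a factor (its levels are the distinct values as strings), whereas aggregate() keeps count as the original integer column.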

This is quite intuitive using the wonderful dplyr library.

First, we group the data by the unique values of count, then we count the number of occurrences per group using n().

library(dplyr)
df %>%
    group_by(count) %>%
    summarise(number = n())

# A tibble: 6 x 2
  count number
  <int>  <int>
1     1      3
2     2      3
3     3      1
4     4      2
5     5      1
6     6      1
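
dplyr also provides count(), which its documentation describes as roughly equivalent to group_by() followed by summarise(n = n()). A minimal sketch, assuming a recent dplyr (the name argument for the output column is not available in very old versions). It rebuilds the question's data with tibble(), which dplyr re-exports. Inside count(), the bare name count refers to the data column of that name, not to the function:

```r
library(dplyr)

# Same area/count values as the question's df, rebuilt without the readr spec attribute
df <- tibble(area = letters[1:11],
             count = c(1L, 1L, 1L, 3L, 4L, 2L, 2L, 4L, 2L, 5L, 6L))

# count(count) tallies rows per distinct value of the column named count;
# name = "number" matches the column name used with summarise() above
res <- df %>% count(count, name = "number")
print(res)
```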