2
votes

I have used the aggregate function to get a summary of results based on their collection location. The summary returns 3 obs. of 2 variables. One variable is the group name, one is the summary statistic per group.

How do I get R to view each column (Group, min, 1st quartile, median, etc.) as unique in my data frame? Ultimately I'd like this to be 3 obs. of 7 variables, one for each column. OR I'd like to know how to cleanly get min, median, and max by Location. Thanks!

Result <- c(1,1,2,100,50,30,45,20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha")

df <- data.frame(Result, Location)
head(df)

Agg <- aggregate(df$Result, list(df$Location), summary)

head(Agg)
   Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
1   Alpha   1.00      6.25    26.50  38.50     58.75 100.00
2    Beta   1.00     10.50    20.00  23.67     35.00  50.00
3   Gamma   2.00      6.00    10.00  14.00     20.00  30.00
1

1 Answers

3
votes

Since aggregate's simplify parameter defaults to TRUE, it's simplifying the results of calling the function (here, summary) to a matrix. You can reconstruct the data.frame, coercing the column into its own data.frame:

df <- data.frame(Result = c(1,1,2,100,50,30,45,20, 10, 8),
                 Location = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha"))

Agg <- aggregate(df$Result, list(df$Location), summary)

data.frame(Location = Agg$Group.1, Agg$x)
#>   Location Min. X1st.Qu. Median     Mean X3rd.Qu. Max.
#> 1    Alpha    1     6.25   26.5 38.50000    58.75  100
#> 2     Beta    1    10.50   20.0 23.66667    35.00   50
#> 3    Gamma    2     6.00   10.0 14.00000    20.00   30

Alternately, dplyr's summarise family of functions can handle multiple summary statistics well:

library(dplyr)

df %>% group_by(Location) %>% summarise_all(funs(min, median, max))
#> # A tibble: 3 x 4
#>   Location   min median   max
#>   <fct>    <dbl>  <dbl> <dbl>
#> 1 Alpha       1.   26.5  100.
#> 2 Beta        1.   20.0   50.
#> 3 Gamma       2.   10.0   30.

If you really want all of summary, you can use broom::tidy to turn each group's results into a data frame in a list column, which can be unnested:

df %>% 
    group_by(Location) %>% 
    summarise(x = list(broom::tidy(summary(Result)))) %>% 
    tidyr::unnest()
#> # A tibble: 3 x 7
#>   Location minimum    q1 median  mean    q3 maximum
#>   <fct>      <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#> 1 Alpha         1.  6.25   26.5  38.5  58.8    100.
#> 2 Beta          1. 10.5    20.0  23.7  35.0     50.
#> 3 Gamma         2.  6.00   10.0  14.0  20.0     30.