Getting the median for specific categories within a data.frame

Question

Here is a sample of the data I'm working with:

county	urban_continuum	p.shannon	p.simpson
Brunswick	B_Town	3.804079	0.9744810
Accomack	A_Rural	3.830896	0.9771901
Buena Vista	B_Town	3.970617	0.9802289
Amherst	D_City	4.007048	0.9813272
Buckingham	C_Suburb	4.055685	0.9796187
Campbell	D_City	4.161142	0.9837963
Cumberland	A_Rural	4.229130	0.9850256
Danville	C_Suburb	4.631135	0.9888504

Note: "p.simpson" and "p.shannon" refer to Simpson diversity and Shannon diversity.

I'm trying to get the mean and the standard deviation for each category (e.g. the mean for "B_Town" is 3.97235). I first used aggregate. Here's what I have for the mean (the code for the standard deviation is the same but with FUN="sd"):

urbancon_div.mean=aggregate(p.simpson~urban_continuum+p.shannon, data=plant.co, FUN="mean")

Here's what R gives me:

Notice that even though "county" is not in the code, it still gives me means for individual counties. I'm trying to find the mean of each diversity metric for each category across all counties. How do I get the mean and sd for each category across all counties, rather than for individual counties?

First, you have p.shannon on the wrong side of the formula. Also, how do 3.8 and 3.97 have a mean of 3.97? — Onyambu
I was making up a number there as an example. I didn't actually do a calculation. My apologies for the confusion — madam_fledershrew
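To illustrate the comment about the formula: in aggregate(), anything on the right of ~ is treated as a grouping variable. Since every p.shannon value is unique, grouping by it yields one row per county. A minimal base R sketch, assuming the goal is both metrics per category, moves the metrics to the left-hand side with cbind(). The data frame is rebuilt from the sample in the question; the name urbancon_div.sd is only an illustration.

```r
# Sample data from the question; quoting 'Buena Vista' keeps it a single field
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Metrics on the left of ~ (combined with cbind), grouping variable on the right
urbancon_div.mean <- aggregate(cbind(p.shannon, p.simpson) ~ urban_continuum,
                               data = plant.co, FUN = mean)

# Same call with sd for the standard deviation (illustrative name)
urbancon_div.sd <- aggregate(cbind(p.shannon, p.simpson) ~ urban_continuum,
                             data = plant.co, FUN = sd)

print(urbancon_div.mean)
print(urbancon_div.sd)
```

Each result has one row per urban_continuum category, with one column per metric.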

Park Park · Accepted Answer · 2021-12-08T02:22:20

You may try using dplyr:

library(dplyr)

plant.co <- read.table(text = "county   urban_continuum p.shannon   p.simpson
Brunswick   B_Town  3.804079    0.9744810
Accomack    A_Rural 3.830896    0.9771901
'Buena Vista'   B_Town  3.970617    0.9802289
Amherst D_City  4.007048    0.9813272
Buckingham  C_Suburb    4.055685    0.9796187
Campbell    D_City  4.161142    0.9837963
Cumberland  A_Rural 4.229130    0.9850256
Danville    C_Suburb    4.631135    0.9888504", header = TRUE)

plant.co %>%
  group_by(urban_continuum) %>%
  summarize(p.shannon.mean = mean(p.shannon),
            p.shannon.sd = sd(p.shannon),
            p.simpson.mean = mean(p.simpson),
            p.simpson.sd = sd(p.simpson))

  urban_continuum p.shannon.mean p.shannon.sd p.simpson.mean p.simpson.sd
  <chr>                    <dbl>        <dbl>          <dbl>        <dbl>
1 A_Rural                   4.03        0.282          0.981      0.00554
2 B_Town                    3.89        0.118          0.977      0.00406
3 C_Suburb                  4.34        0.407          0.984      0.00653
4 D_City                    4.08        0.109          0.983      0.00175
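If there are more metrics than these two, writing one mean() and one sd() line per column gets repetitive. A possible variant, assuming dplyr 1.0.0 or later is installed, uses across() to apply both functions to each metric in one step. The result name div.summary is hypothetical, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same sample data as above
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Apply mean and sd to each metric; .names builds columns like p.shannon.mean
div.summary <- plant.co %>%
  group_by(urban_continuum) %>%
  summarize(across(c(p.shannon, p.simpson),
                   list(mean = mean, sd = sd),
                   .names = "{.col}.{.fn}"))

print(div.summary)
```

This should produce the same four summary columns as the explicit version. Adding median = median to the list would also give per-category medians, as the title asks.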
