
Here is a sample of the data I'm working with

county        urban_continuum  p.shannon  p.simpson
Brunswick     B_Town           3.804079   0.9744810
Accomack      A_Rural          3.830896   0.9771901
Buena Vista   B_Town           3.970617   0.9802289
Amherst       D_City           4.007048   0.9813272
Buckingham    C_Suburb         4.055685   0.9796187
Campbell      D_City           4.161142   0.9837963
Cumberland    A_Rural          4.229130   0.9850256
Danville      C_Suburb         4.631135   0.9888504

Note: "p.simpson" and "p.shannon" are the Simpson and Shannon diversity indices.

I'm trying to get the mean and the standard deviation for each category (e.g. the mean for "B_Town" is 3.97235). I first used aggregate(). Here's my code for the mean (the code for the standard deviation is the same but with FUN="sd"):

urbancon_div.mean=aggregate(p.simpson~urban_continuum+p.shannon, data=plant.co, FUN="mean")

Here's what R gives me:

[image: aggregated df]

Notice that even though "county" is not in the formula, it still gives me a mean for each individual county. I'm trying to find the mean of each diversity metric for each category across all counties. How do I get the mean and sd for each category across all counties, rather than for each county separately?

First, you have p.shannon on the wrong side of the formula. Also, how do 3.8 and 3.97 have a mean of 3.97? - Onyambu
I was making up a number there as an example. I didn't actually do a calculation. My apologies for the confusion. - madam_fledershrew
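To expand on the first comment: every variable on the right-hand side of an aggregate() formula becomes a grouping variable, so urban_continuum + p.shannon makes one group per distinct p.shannon value, which in this data means one group per county. A minimal sketch of the corrected call, assuming the data frame is named plant.co as in the question, keeps only urban_continuum on the right:

```r
# Rebuild the sample data from the question (quotes keep 'Buena Vista' as one field)
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# Only urban_continuum on the right-hand side: one row per category
urbancon_div.mean <- aggregate(p.simpson ~ urban_continuum, data = plant.co, FUN = mean)
urbancon_div.sd   <- aggregate(p.simpson ~ urban_continuum, data = plant.co, FUN = sd)

print(urbancon_div.mean)
print(urbancon_div.sd)
```

The same pattern with p.shannon on the left-hand side gives the Shannon summaries.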

2 Answers


You could use dplyr:

library(dplyr)

# Sample data from the question; quotes keep 'Buena Vista' as a single field
plant.co <- read.table(text = "county   urban_continuum p.shannon   p.simpson
Brunswick   B_Town  3.804079    0.9744810
Accomack    A_Rural 3.830896    0.9771901
'Buena Vista'   B_Town  3.970617    0.9802289
Amherst D_City  4.007048    0.9813272
Buckingham  C_Suburb    4.055685    0.9796187
Campbell    D_City  4.161142    0.9837963
Cumberland  A_Rural 4.229130    0.9850256
Danville    C_Suburb    4.631135    0.9888504", header = TRUE)

# One row per urban_continuum category, pooling all counties in that category
plant.co %>%
  group_by(urban_continuum) %>%
  summarize(p.shannon.mean = mean(p.shannon),
            p.shannon.sd = sd(p.shannon),
            p.simpson.mean = mean(p.simpson),
            p.simpson.sd = sd(p.simpson))

  urban_continuum p.shannon.mean p.shannon.sd p.simpson.mean p.simpson.sd
  <chr>                    <dbl>        <dbl>          <dbl>        <dbl>
1 A_Rural                   4.03        0.282          0.981      0.00554
2 B_Town                    3.89        0.118          0.977      0.00406
3 C_Suburb                  4.34        0.407          0.984      0.00653
4 D_City                    4.08        0.109          0.983      0.00175
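For a quick check without dplyr, base R's tapply() returns the same per-category summary for a single metric as a named vector. A sketch rebuilding the question's sample data; shannon_mean and shannon_sd are illustrative names:

```r
# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

# tapply() splits p.shannon by category and applies the function to each group
shannon_mean <- tapply(plant.co$p.shannon, plant.co$urban_continuum, mean)
shannon_sd   <- tapply(plant.co$p.shannon, plant.co$urban_continuum, sd)

print(round(shannon_mean, 2))
print(round(shannon_sd, 3))
```

Rounded to two decimals, shannon_mean matches the p.shannon.mean column in the tibble.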

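If you prefer base R's aggregate() (the \(x) lambda shorthand needs R 4.1 or later):

# cbind() on the left summarizes both metrics; FUN returns a named mean/sd pair
aggregate(cbind(p.simpson, p.shannon) ~ urban_continuum, plant.co, \(x) c(mean = mean(x), sd = sd(x)))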

  urban_continuum p.simpson.mean p.simpson.sd p.shannon.mean p.shannon.sd
1         A_Rural    0.981107850  0.005540535      4.0300130    0.2815940
2          B_Town    0.977354950  0.004064379      3.8873480    0.1177601
3        C_Suburb    0.984234550  0.006527798      4.3434100    0.4069046
4          D_City    0.982561750  0.001745917      4.0840950    0.1089609

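Or, more compactly, drop the county column and let . stand for all remaining columns:

# plant.co[-1] drops county, so . means p.shannon and p.simpson
aggregate(. ~ urban_continuum, plant.co[-1], \(x) c(mean = mean(x), sd = sd(x)))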
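One caveat with both aggregate() calls: when FUN returns a vector, each metric is stored as a two-column matrix inside the result, so the data frame has 3 columns rather than the 5 the printout suggests. A sketch, assuming the question's plant.co data, that flattens it with do.call(data.frame, ...) so columns such as p.shannon.mean can be used directly (function(x) is used instead of \(x) so it also runs on R versions before 4.1; res and flat are illustrative names):

```r
# Sample data from the question
plant.co <- read.table(text = "county urban_continuum p.shannon p.simpson
Brunswick B_Town 3.804079 0.9744810
Accomack A_Rural 3.830896 0.9771901
'Buena Vista' B_Town 3.970617 0.9802289
Amherst D_City 4.007048 0.9813272
Buckingham C_Suburb 4.055685 0.9796187
Campbell D_City 4.161142 0.9837963
Cumberland A_Rural 4.229130 0.9850256
Danville C_Suburb 4.631135 0.9888504", header = TRUE)

res <- aggregate(. ~ urban_continuum, plant.co[-1],
                 function(x) c(mean = mean(x), sd = sd(x)))
print(ncol(res))  # matrix columns: urban_continuum, p.shannon, p.simpson

# Expand each matrix column into ordinary columns named metric.stat
flat <- do.call(data.frame, res)
print(names(flat))
print(flat$p.shannon.mean)
```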