2
votes

I have a matrix of plant species occurrence data. The matrix is set up so that every column is a species and every row is a sampling location. I also have identifiers that group sampling locations based on certain environmental variables. I would like to compute column sums for each species, grouped by a specific environmental variable.

An example data set:

library(vegan)
data("dune")
data("dune.env")
dune$plot <- 1:20; dune.env$plot <- 1:20
data <- merge(dune, dune.env)

So there are now 20 plots, with 30 species observed, and 5 associated environmental variables. I would like to generate the sum of the number of individuals observed per species, grouped by "Management". I have tried something like this:

library(tidyverse)
sums <- group_by(data, data$Management) %>% colSums(data[,(2:31)], na.rm = TRUE)

but I always get an error about incorrect dims. I am not sure how I would go about solving my problem. Ideally, the result would be a dataframe with 4 rows (1 for each management type) where all the species (cols 2:31) have been summed.

3
It seems to me that you could benefit from converting your wide table to a long one (see ?reshape). - PavoDive

3 Answers

1
votes

rowsum does what you need:

dat <- merge(dune, dune.env)

> rowsum(dat[,2:31], dat$Management)
   Achimill Agrostol Airaprae Alopgeni Anthodor Bellpere Bromhord Chenalbu   ...
BF        7        0        0        2        4        5        8        0   ...    
HF        6        7        0        8        9        2        4        0   ...     
NM        2       13        5        0        8        2        0        0   ...     
SF        1       28        0       26        0        4        3        1   ...     
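
As the output above shows, rowsum() puts the group labels in the row names rather than in a column. Since the question asks for a data frame with one row per management type, here is a minimal sketch of moving those labels into a regular column. The data frame toy and its columns sp_a and sp_b are made-up stand-ins for the merged dune data, not part of the original example.

```r
# Toy stand-in for the merged dune data (values invented for illustration)
toy <- data.frame(
  plot       = 1:6,
  sp_a       = c(1, 0, 2, 3, 0, 1),
  sp_b       = c(0, 4, 0, 1, 2, 0),
  Management = c("BF", "HF", "BF", "NM", "HF", "NM")
)

# rowsum() adds up the rows of its first argument within each group
sums <- rowsum(toy[, c("sp_a", "sp_b")], group = toy$Management)

# Group labels end up in the row names; copy them into a proper column
result <- data.frame(Management = rownames(sums), sums, row.names = NULL)
print(result)
```

With the real data, the same pattern should apply by replacing toy[, c("sp_a", "sp_b")] with dat[, 2:31] and toy$Management with dat$Management.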
0
votes

Use data.table:

require(data.table)
a <- merge(dune, dune.env)
setDT(a)
a[, lapply(.SD, sum), by = Management, .SDcols = names(a)[2:31]]
0
votes

Well, I was doing something very similar a few days ago: "How to obtain species richness and abundance for sites with multiple samples using dplyr".

To modify the excellent answer given by @akrun:

df <- merge(dune, dune.env)
library(dplyr)
df2 <- df %>%
  group_by(Management) %>%
  summarise_at(vars(Achimill:Callcusp), sum)
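
In dplyr 1.0.0 and later, summarise_at() is superseded by across(), so the same grouped sum can also be written as below. This is a sketch under that version assumption. The data frame toy and its columns sp_a and sp_b are made-up stand-ins for the merged dune data.

```r
library(dplyr)

# Toy stand-in for the merged dune data (values invented for illustration)
toy <- data.frame(
  plot       = 1:6,
  sp_a       = c(1, 0, 2, 3, 0, 1),
  sp_b       = c(0, 4, 0, 1, 2, 0),
  Management = c("BF", "HF", "BF", "NM", "HF", "NM")
)

# Sum every column in the range sp_a:sp_b within each Management group
toy_sums <- toy %>%
  group_by(Management) %>%
  summarise(across(sp_a:sp_b, sum), .groups = "drop")

print(toy_sums)
```

For the real data, the column range would be Achimill:Callcusp, as in the answer above.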