
I'm trying to organize a dataset that contains multiple replicates of the same location so that each unique location appears only once. In addition, I would like to keep only the entry with the maximum abundance reported for each location. Here are the top 6 rows of the dataset. Notice how rows 3 and 4 are the same location, so I would want to discard row 3 and keep row 4, as it has the higher abundance. Rows 5 and 6 both have the highest abundance, but I only need to keep one of them.

X  abun   location     
1   1     L2507550 
2   1     L668283 
3   1     L831877 
4   5     L831877 
5   3     L668283 
6   3     L668283 

Here is the code that I used:

require(dplyr)
require(reshape2)
require(lubridate)

########Load data and clean########
#set working directory to load data from Data folder
setwd("V:/snailData")
getwd()

#Load csv
m <- read.csv("may.csv")

#group data by location and identify the maximum abundance for each location
m_max <- m %>% group_by(location) %>% summarise(m, max(abun))

Here's the error message I get:

> m_max <- m %>% group_by(location) %>% summarise(m, max(abun))
Error: expecting a single value

Am I getting this error because there are multiple records of the highest abundance? Any insight into how this problem can be fixed would be helpful. Thank you.

UPDATE

This answer from @paljenczy got rid of the error message (thank you!): The pipe operator %>% passes the result of the expression on its left as a first argument to the function on the right. Thus you do not need m as the first argument to summarise. Try

m_max <- m %>% group_by(location) %>% summarise(max(abun))

However, the command ended up returning only the locations with the highest overall abundance (only locations with an abundance of 15), not the highest abundance at each location. Does anyone know how to fix this?

1
Perhaps you have also loaded the plyr package, which leads to function name conflicts. Therefore, you can try m %>% group_by(location) %>% dplyr::summarise(max(abun)) and, in future, load plyr first and then dplyr, or don't load plyr at all if it is not required. - talat
@Elizabeth M. see the updated answer. If it solves your problem, please consider accepting it. - paljenczy
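
To make the suggestion about name conflicts concrete, here is a minimal sketch. It assumes only dplyr is attached and rebuilds the six example rows from the question; the variable name `src` is just illustrative. It checks which package the bare name `summarise` currently resolves to and then uses the namespace-qualified call, which is not affected by masking.

```r
library(dplyr)

# The six example rows from the question
m <- data.frame(X = 1:6,
                abun = c(1, 1, 1, 5, 3, 3),
                location = c("L2507550", "L668283", "L831877",
                             "L831877", "L668283", "L668283"),
                stringsAsFactors = FALSE)

# Which package does the bare name `summarise` resolve to right now?
# If this is not "dplyr", another attached package is masking it.
src <- environmentName(environment(summarise))
print(src)

# The namespace-qualified call always uses dplyr's version
m_max <- m %>% group_by(location) %>% dplyr::summarise(abun = max(abun))
print(m_max)
```

If `src` reports a package other than dplyr in your session, that would be consistent with the grouping being ignored.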

1 Answer


The pipe operator %>% passes the result of the expression on its left as the first argument to the function on its right. Thus you do not need m as the first argument to summarise. Using dplyr 0.4.3, try

library(dplyr)

m <- data.frame(X = 1:6,
                abun = c(1, 1, 1, 5, 3, 3),
                location = c("L2507550",
                             "L668283",
                             "L831877",
                             "L831877",
                             "L668283",
                             "L668283"),
                stringsAsFactors = FALSE)

# Name the summary column so it shows up as abun in the result
m_max <- m %>% group_by(location) %>% summarise(abun = max(abun))

> m_max
Source: local data frame [3 x 2]

  location  abun
     (chr) (dbl)
1 L2507550     1
2  L668283     3
3  L831877     5
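
summarise returns only the grouping column and the summary value. If the goal is to keep one complete row per location (including X), a minimal sketch, assuming the same example data, is to pick the first row with the maximum abundance in each group using slice and which.max. The name m_top is just illustrative.

```r
library(dplyr)

m <- data.frame(X = 1:6,
                abun = c(1, 1, 1, 5, 3, 3),
                location = c("L2507550", "L668283", "L831877",
                             "L831877", "L668283", "L668283"),
                stringsAsFactors = FALSE)

# which.max() returns the position of the first maximum within each group,
# so ties (rows 5 and 6 here) are resolved by keeping the earlier row
m_top <- m %>%
  group_by(location) %>%
  slice(which.max(abun)) %>%
  ungroup()

print(m_top)
```

Because which.max stops at the first maximum, exactly one row per location survives even when several rows share the highest abundance.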