I'm trying to reduce a dataset that contains multiple replicates of the same location to one entry per unique location. For each location, I only want to keep the entry with the maximum abundance reported. Here are the first 6 rows of the dataset. Notice that rows 3 and 4 are the same location, so I would want to discard row 3 and keep row 4, as it has the higher abundance. Rows 5 and 6 both have the highest abundance for their location, but I only need to keep one of them.
X abun location
1 1 L2507550
2 1 L668283
3 1 L831877
4 5 L831877
5 3 L668283
6 3 L668283
Here is the code that I used:
require(dplyr)
require(reshape2)
require(lubridate)
########Load data and clean########
#set working directory to load data from Data folder
setwd("V:/snailData")
getwd()
#Load csv
m <- read.csv("may.csv")
#group data by location and identify the maximum abundance for each location
m_max <- m %>% group_by(location) %>% summarise(m, max(abun))
Here's the error message I get:
> m_max <- m %>% group_by(location) %>% summarise(m, max(abun))
Error: expecting a single value
Am I getting this error because there are multiple records of the highest abundance? Any insight into how this problem can be fixed would be helpful. Thank you.
UPDATE
This answer from @paljenczy got rid of the error message (thank you!): The pipe operator %>% passes the result of the expression on its left as the first argument to the function on its right. Thus you do not need m as the first argument to summarise. Try
m_max <- m %>% group_by(location) %>% summarise(max(abun))
However, the command ended up singling out only the locations with the highest overall abundance (those with an abundance of 15), not the highest abundance at each location. Does anyone know how to fix this?
It looks like the plyr package is also loaded, which leads to function name conflicts. Therefore, you can try
m %>% group_by(location) %>% dplyr::summarise(max(abun))
and in future, load plyr first and then dplyr, or don't load plyr at all if not required – talat
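To illustrate the comment above, here is a minimal sketch that rebuilds the six sample rows by hand (the real may.csv is not available here, so the data frame is an assumption based on the rows shown in the question). It calls the dplyr verbs with an explicit dplyr:: prefix so the result does not depend on whether plyr is attached. The names max_abun and m_top are made up for this example, and slice(1) is just one possible way to break ties.

```r
library(dplyr)

# Sample rows copied from the question (stand-in for may.csv)
m <- data.frame(
  X = 1:6,
  abun = c(1, 1, 1, 5, 3, 3),
  location = c("L2507550", "L668283", "L831877",
               "L831877", "L668283", "L668283"),
  stringsAsFactors = FALSE
)

# Maximum abundance per location; dplyr:: avoids a masked summarise()
m_max <- m %>%
  group_by(location) %>%
  dplyr::summarise(max_abun = max(abun))

# Keep the full row with the highest abundance at each location.
# When several rows tie for the maximum, slice(1) keeps the first one.
m_top <- m %>%
  group_by(location) %>%
  dplyr::filter(abun == max(abun)) %>%
  dplyr::slice(1) %>%
  ungroup()

print(m_max)
print(m_top)
```

On the six sample rows, m_top should keep rows 1, 4 and 5: row 4 wins over row 3 for L831877, and row 5 is kept as the first of the two tied rows for L668283.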