
I'll preface this by saying I'm very much a self-taught beginner with R.

I have a very large biological data set. I want to find the mean of the variable "Shoot.Density" for each year, but my dates are entered as "%d/%m/%y". As a result, my usual approach groups by each individual date rather than by year, e.g.

tapply(df$Shoot.Density, list(df$Date), mean)

Any help would be much appreciated. I am also happy to paste in a section of my data, but I'm not sure how.


1 Answer


If your Date column is of class Date, you can use format to turn it into a year variable and group by that:

tapply(df$Shoot.Density, list(format(df$Date, '%Y')), mean)

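If it is instead stored as a character string (or factor) in the format %d/%m/%y, use the substr function to pull out the two-digit year, which sits in characters 7 to 8: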

tapply(df$Shoot.Density, list(substr(df$Date,7,8)), mean)
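If you would rather have proper four-digit years, one option is to convert the strings to Date class first and then use the format approach. The following is only a minimal sketch: the toy data frame and its values are made up for illustration, and only the column names Date and Shoot.Density are taken from the question.

```r
# Toy data standing in for the real data frame (values are invented)
df <- data.frame(
  Date = c("01/06/14", "15/07/14", "03/06/15", "20/08/15"),
  Shoot.Density = c(10, 20, 30, 50),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yy strings into Date class; %y is the two-digit year
df$Date <- as.Date(df$Date, format = "%d/%m/%y")

# Extract the four-digit year and take the mean within each year
yearly_means <- tapply(df$Shoot.Density, format(df$Date, "%Y"), mean)
yearly_means
```

Be aware that with %y, R reads two-digit years 00 to 68 as 20xx and 69 to 99 as 19xx, so check the result if your records go back before 1969.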

You can also do this with dplyr (again assuming Date is of class Date):

library(dplyr)
# Refer to the column as Date, not df$Date, inside group_by
df %>% 
  group_by(years = format(Date, '%Y')) %>% 
  summarise(means = mean(Shoot.Density))

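Another way to do this is with the year function of the data.table package, which also expects Date to be of class Date:

library(data.table)
# setDT converts df to a data.table by reference (no copy is made)
setDT(df)[, mean(Shoot.Density), by = year(Date)]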