I'm attempting to run some statistical analyses on a field trial that was conducted at 2 sites over the same growing season.
At both sites (Site, levels: HF|NW) the experimental design was an RCBD with 4 blocks (n=4) (Block, levels: 1|2|3|4 within each Site).
There were 4 treatments: 3 different forms of nitrogen fertiliser and a control with no nitrogen fertiliser (Treatment, levels: AN, U, IU, C).
During the field trial there were 3 distinct periods, each of which commenced with fertiliser addition and ended with harvesting of the grass. These periods have been given the levels 1|2|3 under the factor N_app.
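To make the layout concrete, the design described above can be sketched as a skeleton data frame. This is only an illustration with no real measurements: the name Plot_id, and the assumption that each plot is one Treatment within one Block at one Site, are hypothetical and not taken from the data file.

```r
# Hypothetical skeleton of the design (no measurements, structure only).
design <- expand.grid(
  Site      = c("HF", "NW"),
  Block     = factor(1:4),
  Treatment = factor(c("AN", "U", "IU", "C"), levels = c("AN", "U", "IU", "C")),
  N_app     = factor(1:3)
)

# Assumption: one physical plot per Treatment within each Block at each Site.
design$Plot_id <- interaction(design$Site, design$Block, design$Treatment,
                              drop = TRUE)

nrow(design)             # 2 sites x 4 blocks x 4 treatments x 3 periods = 96
nlevels(design$Plot_id)  # 2 sites x 4 blocks x 4 treatments = 32 plots
table(design$Site, design$N_app)  # observations per site and period
```

Each of the 32 plots appears once per N_app period, which is why the periods may need to be treated as repeated measures.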
There is a range of measurements on which I would like to test the following null hypothesis (H0):
Treatment had no effect on the measurement.
Two of the measurements I am particularly interested in are: grass yield and ammonia emissions.
Starting with grass yield (Dry_tonnes_ha): as shown here, this is a nice balanced data set.
The data can be downloaded in R using the following code:
library(tidyverse)
download.file('https://www.dropbox.com/s/w5ramntwdgpn0e3/HF_NW_grass_yield_data.csv?raw=1', destfile = "HF_NW_grass_yield_data.csv", method = "auto")
raw_data <- read.csv("HF_NW_grass_yield_data.csv", stringsAsFactors = FALSE)
HF_NW_grass <- raw_data %>%
  mutate_at(vars(Site, N_app, Block, Plot, Treatment), as.factor) %>%
  mutate(Date = as.Date(Date, format = "%d/%m/%Y"),
         Treatment = factor(Treatment, levels = c("AN", "U", "IU", "C")))
I have had a go at running an ANOVA on this using the following approach:
model_1 <- aov(formula = Dry_tonnes_ha ~ Treatment * N_app + Site/Block, data = HF_NW_grass, projections = TRUE)
I have a few concerns with this.
Firstly, what is the best way to test assumptions? For a simple one-way ANOVA I would use shapiro.test() and bartlett.test() on the dependent variable (Dry_tonnes_ha) to assess normality and homogeneity of variance. Can I use the same approach here?
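For context, the normality and equal-variance assumptions of ANOVA concern the model errors rather than the raw response, so one option is to run the same checks on the residuals of the fitted model. The following is a minimal sketch on simulated data with the same structure as the grass yield data. The simulated values and effect sizes are invented for illustration only.

```r
set.seed(1)

# Simulated stand-in for the grass yield data (values are NOT real).
sim <- expand.grid(
  Site      = c("HF", "NW"),
  Block     = factor(1:4),
  Treatment = factor(c("AN", "U", "IU", "C"), levels = c("AN", "U", "IU", "C")),
  N_app     = factor(1:3)
)
sim$Dry_tonnes_ha <- 3 + 0.5 * (sim$Treatment != "C") +
  rnorm(nrow(sim), sd = 0.4)

# Same model formula as in the question.
fit <- aov(Dry_tonnes_ha ~ Treatment * N_app + Site/Block, data = sim)

# Check the residuals, not the raw response.
sim$res <- residuals(fit)
shapiro.test(sim$res)                       # normality of residuals
bartlett.test(res ~ Treatment, data = sim)  # equal variance across treatments

# Graphical checks are often more informative than the formal tests.
plot(fitted(fit), sim$res, xlab = "Fitted", ylab = "Residual")
qqnorm(sim$res)
qqline(sim$res)
```

The residual-versus-fitted plot and the normal Q-Q plot give a visual check of the same two assumptions.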
Secondly, I am concerned that N_app is a repeated measure, as the same measurement is taken from the same plot over 3 different periods. What is the best way to build these repeated measures into the model?
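One classical way to express this in aov() is a split-plot-in-time formulation, where an Error() term declares the plot as the unit that is measured repeatedly. The sketch below uses simulated data, not the real yields. The variable Site_Block and the choice of Error(Site_Block/Treatment) are assumptions for illustration; they rely on each Treatment within a block being a single plot.

```r
set.seed(2)

# Simulated stand-in for the grass yield data (values are NOT real).
sim <- expand.grid(
  Site      = c("HF", "NW"),
  Block     = factor(1:4),
  Treatment = factor(c("AN", "U", "IU", "C"), levels = c("AN", "U", "IU", "C")),
  N_app     = factor(1:3)
)

# Unique block label across sites (hypothetical helper variable).
sim$Site_Block <- interaction(sim$Site, sim$Block, drop = TRUE)

# Plot = one Treatment within one Site_Block; give each plot its own effect.
plot_id  <- interaction(sim$Site_Block, sim$Treatment, drop = TRUE)
plot_eff <- rnorm(nlevels(plot_id), sd = 0.3)
sim$Dry_tonnes_ha <- 3 + plot_eff[as.integer(plot_id)] +
  rnorm(nrow(sim), sd = 0.4)

# Site_Block:Treatment identifies the plot, so Treatment is tested against
# plot-to-plot variation, while N_app and Treatment:N_app are tested in the
# within-plot stratum.
fit_rm <- aov(
  Dry_tonnes_ha ~ Site + Treatment * N_app + Error(Site_Block/Treatment),
  data = sim
)
summary(fit_rm)
```

The summary is split by error stratum, which shows which source of variation each term is tested against.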
Thirdly, I'm not sure of the best way to nest Block within Site. At both sites the levels of Block are 1:4. Do I need to have unique Block levels for each site?
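As a small illustration of the two options, the sketch below builds a uniquely labelled block factor with interaction() and shows how R expands Site/Block in a formula. The name Site_Block is hypothetical.

```r
# Structure only: 2 sites, each with blocks labelled 1 to 4.
sim <- expand.grid(Site = c("HF", "NW"), Block = factor(1:4))

# Option 1: create block labels that are unique across sites.
sim$Site_Block <- interaction(sim$Site, sim$Block, sep = "_", drop = TRUE)

nlevels(sim$Block)       # 4: labels are reused at both sites
nlevels(sim$Site_Block)  # 8: one label per physical block
levels(sim$Site_Block)

# Option 2: keep Block as 1:4 and let the formula do the nesting.
# Site/Block is shorthand for Site + Site:Block.
attr(terms(~ Site/Block), "term.labels")
```

Either route identifies 8 distinct blocks. Unique labels mainly help by removing the risk of Block 1 at HF being treated as the same block as Block 1 at NW when Block is used on its own.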
I have another data set for NH3 emissions here. R code to download:
download.file('https://www.dropbox.com/s/0ax16x95m2z3fb5/HF_NW_NH3_emissions.csv?raw=1', destfile = "HF_NW_NH3_emissions.csv", method = "auto")
raw_data_1 <- read.csv("HF_NW_NH3_emissions.csv", stringsAsFactors = FALSE)
HF_NW_NH3 <- raw_data_1 %>%
  mutate_at(vars(Site, N_app, Block, Plot, Treatment), as.factor) %>%
  mutate(Treatment = factor(Treatment, levels = c("AN", "U", "IU", "C")))
For this I have all the concerns above with the addition that the data set is unbalanced.
At HF, n=3 for N_app 1, but n=4 for N_app 2 & 3.
At NW, n=4 for all N_app levels.
At HF, measurements were only made on the Treatment levels U and IU.
At NW, measurements were made on the Treatment levels AN, U and IU.
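The imbalance described above can be laid out as a table of cell counts. The sketch below rebuilds only the structure, with no measurements. Which block is missing at HF in N_app 1 is not stated here, so dropping Block 4 is an arbitrary assumption.

```r
# HF: only U and IU were measured.
hf <- expand.grid(Site = "HF", Block = 1:4, Treatment = c("U", "IU"),
                  N_app = 1:3, stringsAsFactors = FALSE)
# Assumption: the missing replicate in N_app 1 is Block 4 (arbitrary choice).
hf <- hf[!(hf$N_app == 1 & hf$Block == 4), ]

# NW: AN, U and IU were measured, n = 4 throughout.
nw <- expand.grid(Site = "NW", Block = 1:4, Treatment = c("AN", "U", "IU"),
                  N_app = 1:3, stringsAsFactors = FALSE)

nh3_design <- rbind(hf, nw)

# Replicates per Site x Treatment x N_app cell; zeros are empty cells.
counts <- xtabs(~ Site + Treatment + N_app, data = nh3_design)
ftable(counts)
```

Running the same xtabs() call on HF_NW_NH3 would show the actual cell counts, including the empty AN cells at HF.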
I'm not sure how to deal with this added level of complexity. I am tempted to just analyse the 2 sites separately (the fact that the N_app periods are not the same at each site may encourage this approach).
Can I use a Type III sum of squares ANOVA here?
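For reference, one way to obtain Type III style tests in base R is to fit the model with sum-to-zero contrasts and then drop each term in turn with drop1(). The sketch below uses simulated, unbalanced data for a single site in which every Treatment by N_app cell still has observations. The response name NH3_sim and all of its values are invented for illustration.

```r
set.seed(3)

# Simulated single-site data (values are NOT real).
sim <- expand.grid(
  Block     = factor(1:4),
  Treatment = factor(c("AN", "U", "IU")),
  N_app     = factor(1:3)
)
sim <- sim[-c(1, 2), ]  # remove two rows to make the data unbalanced
sim$NH3_sim <- rnorm(nrow(sim), mean = 10, sd = 2)

# Sum-to-zero contrasts are needed for the marginal tests to be meaningful.
fit <- lm(
  NH3_sim ~ Block + Treatment * N_app,
  data = sim,
  contrasts = list(Block = "contr.sum", Treatment = "contr.sum",
                   N_app = "contr.sum")
)

# scope = . ~ . asks drop1() to test every term, including main effects
# that are involved in the interaction.
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3
```

With completely empty cells, such as AN at HF in a combined-site model, some of these tests may not be estimable. That may be another argument for analysing the sites separately.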
It has been suggested to me that a linear mixed modelling approach may be the way forward but I'm not familiar with using these.
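As a rough idea of what such a model might look like, the sketch below fits a linear mixed model with nlme::lme() to simulated data. It assumes the nlme package is installed. The variables Site_Block and Plot_id, the choice of random effects (plots nested in blocks) and all simulated values are assumptions for illustration, not a recommendation for the real data.

```r
library(nlme)
set.seed(4)

# Simulated stand-in for the grass yield data (values are NOT real).
sim <- expand.grid(
  Site      = c("HF", "NW"),
  Block     = factor(1:4),
  Treatment = factor(c("AN", "U", "IU", "C"), levels = c("AN", "U", "IU", "C")),
  N_app     = factor(1:3)
)
sim$Site_Block <- interaction(sim$Site, sim$Block, drop = TRUE)
sim$Plot_id    <- interaction(sim$Site_Block, sim$Treatment, drop = TRUE)

# Invented block and plot effects plus residual noise.
block_eff <- rnorm(nlevels(sim$Site_Block), sd = 0.8)
plot_eff  <- rnorm(nlevels(sim$Plot_id), sd = 0.6)
sim$Dry_tonnes_ha <- 3 + 0.5 * (sim$Treatment != "C") +
  block_eff[as.integer(sim$Site_Block)] +
  plot_eff[as.integer(sim$Plot_id)] +
  rnorm(nrow(sim), sd = 0.3)

# Fixed effects: Site, Treatment, N_app and the Treatment x N_app interaction.
# Random intercepts: blocks, and plots within blocks (the repeated unit).
fit_lme <- lme(
  Dry_tonnes_ha ~ Site + Treatment * N_app,
  random = ~ 1 | Site_Block/Plot_id,
  data = sim
)

anova(fit_lme)    # F tests for the fixed effects
VarCorr(fit_lme)  # estimated variance components
```

A model of this kind does not require balanced data, which is one reason it is often suggested for data sets like the NH3 emissions.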
I would welcome your thoughts on any of the above. Thanks for your time.
Rory