I'm working with a dataset on conservation and its influence on vegetation biomass. Fifty one-hectare plots of land were sampled at random from a 10,000-hectare area in Northern England.
For each plot of land, the following variables were recorded:
• biomass: an estimate of the biomass of vegetation in kg per square metre.
• alt: the mean altitude of the plot in metres above sea level.
• cons: a categorical variable, which was coded 1 if the plot was part of a conservation area, and 2 otherwise.
• soil: a categorical variable crudely classifying soil type as 1 for chalk, 2 for clay and 3 for loam.
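How cons and soil are coded determines what the fitted coefficients mean. When they are stored as R factors with levels "1", "2" (and "3"), lm() uses treatment contrasts by default. The first level becomes the baseline, so soil2 and soil3 are each a difference from chalk, and cons2 is a difference from conservation areas. The sketch below uses a made-up three-row data frame (`toy` and `soil_loam` are hypothetical names) to show the design-matrix columns this coding produces, and how relevel() changes the baseline:

```r
# Hypothetical toy data: one plot per soil type, using the same
# level labels as the conservation data ("1" chalk, "2" clay, "3" loam).
toy <- data.frame(soil = factor(c("1", "2", "3")))

# Default treatment contrasts: level "1" (chalk) is the baseline,
# so only soil2 (clay) and soil3 (loam) get indicator columns.
mm <- model.matrix(~ soil, data = toy)
print(mm)

# Making loam ("3") the baseline instead: the indicator columns are now
# for chalk (soil_loam1) and clay (soil_loam2), each measured against loam.
toy$soil_loam <- relevel(toy$soil, ref = "3")
mm_loam <- model.matrix(~ soil_loam, data = toy)
print(mm_loam)
```

With loam as the baseline, the clay coefficient directly estimates the clay-minus-loam difference.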
At the moment I am struggling with two things in particular:
• How to calculate the average difference in biomass between clay (soil2) and loam (soil3) soils from my fitted model (model1), and a 95% confidence interval for this estimate.
• How to calculate the mean predicted biomass for a plot in a conservation area (cons = 1) with predominantly clay soil (soil = 2) at an altitude of 300 m.
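Both point estimates can be read off the coefficient table in the summary below. Chalk (soil 1) and conservation areas (cons 1) are the baseline levels, so clay minus loam is soil2 minus soil3. For a conservation-area plot on clay at 300 m, the prediction is the intercept plus 300 times the alt slope plus soil2, and cons2 drops out. Here is a minimal sketch with the coefficients typed in by hand (the names `b` and `x_new` are illustrative). It gives point estimates only; the confidence intervals need the fitted model itself.

```r
# Coefficient estimates copied from the model1 summary in this post.
b <- c("(Intercept)" = 2.2928629, alt = -0.0029068,
       soil2 = -0.0862220, soil3 = -0.2309939, cons2 = 0.0488634)

# Question 1: soil2 and soil3 are both differences from chalk,
# so clay minus loam is their difference.
diff_clay_loam <- unname(b["soil2"] - b["soil3"])

# Question 2: indicator values for intercept, alt = 300 m, clay (soil2 = 1,
# soil3 = 0) and a conservation area (cons = 1 is the baseline, so cons2 = 0).
x_new <- c(1, 300, 1, 0, 0)
pred_300 <- sum(b * x_new)

round(c(diff_clay_loam = diff_clay_loam, pred_300 = pred_300), 4)
```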
This is the summary of the linear model (model1) that I'm working with:
```
Call:
lm(formula = biomass ~ alt + soil + cons, data = conservation)

Residuals:
      Min        1Q    Median        3Q       Max
-0.183105 -0.052926  0.005593  0.061844  0.194402

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.2928629  0.0357850  64.073  < 2e-16 ***
alt         -0.0029068  0.0001302 -22.318  < 2e-16 ***
soil2       -0.0862220  0.0342955  -2.514   0.0156 *
soil3       -0.2309939  0.0354480  -6.516 5.33e-08 ***
cons2        0.0488634  0.0292075   1.673   0.1013
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09428 on 45 degrees of freedom
Multiple R-squared:  0.9459,  Adjusted R-squared:  0.9411
F-statistic: 196.7 on 4 and 45 DF,  p-value: < 2.2e-16
```
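The t distribution with the 45 residual degrees of freedom links the columns of the coefficient table, and the same distribution gives a 95% confidence interval for a single coefficient. The minimal sketch below uses only the printed soil2 row. The same recipe cannot be applied to the clay-minus-loam difference from this table alone. The standard error of that difference is sqrt(Var(soil2) + Var(soil3) − 2·Cov(soil2, soil3)), and summary() does not print the covariance, so it has to come from vcov() on the fitted model.

```r
# Values copied from the soil2 row of the coefficient table.
est <- -0.0862220
se  <- 0.0342955
df_resid <- 45                                      # "on 45 degrees of freedom"

t_val <- est / se                                   # "t value" column
p_val <- 2 * pt(-abs(t_val), df_resid)              # two-sided "Pr(>|t|)"
ci95  <- est + c(-1, 1) * qt(0.975, df_resid) * se  # 95% CI, the formula confint() uses for lm

round(c(t = t_val, p = p_val, lower = ci95[1], upper = ci95[2]), 4)
```

The p-value is below 0.05, so the interval excludes zero.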
And here's the data:
```r
# output of dput(conservation)
conservation <- structure(list(biomass = c(2.01, 2.06, 1.7, 2.07, 1.88, 2.11,
0.98, 2.14, 1.75, 1.81, 2.15, 1.68, 2.23, 2.04, 1.67, 1.77, 1.74,
1.53, 1.79, 2.15, 1.39, 2.19, 2.14, 2.29, 1.91, 1.73, 2.21, 1.96,
2.07, 2.01, 2.2, 2.24, 1.33, 1.05, 1.36, 1.72, 1.44, 1.52, 2.09,
1.42, 1.64, 0.92, 1.65, 1.37, 0.77, 1.57, 2.25, 2.23, 2.03, 1.18
), alt = c(116L, 21L, 130L, 65L, 117L, 82L, 359L, 5L, 86L, 91L,
64L, 178L, 79L, 70L, 209L, 110L, 161L, 248L, 146L, 23L, 237L,
84L, 40L, 7L, 161L, 122L, 25L, 146L, 67L, 118L, 42L, 57L, 277L,
338L, 331L, 153L, 239L, 237L, 67L, 171L, 206L, 371L, 107L, 236L,
482L, 240L, 56L, 42L, 68L, 436L), cons = structure(c(2L, 2L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L,
1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L
), .Label = c("1", "2"), class = "factor"), soil = structure(c(2L,
3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 2L,
3L, 2L, 2L, 3L, 1L, 3L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L,
3L, 2L, 1L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 1L, 1L,
2L), .Label = c("1", "2", "3"), class = "factor"), alt.factor = structure(c(1L,
1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L), .Label = c("below median", "above median"), class = "factor")),
.Names = c("biomass", "alt", "cons", "soil", "alt.factor"),
row.names = c(NA, -50L), class = "data.frame")
```
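Here is a sketch of both calculations on the data above, assuming model1 is the lm() fit shown in the summary. The data are re-entered compactly so the block runs on its own; alt.factor is omitted because model1 does not use it. For the clay-minus-loam difference, the sketch builds a contrast vector and takes its standard error from vcov(). As a cross-check, it refits with loam as the reference level via relevel(), so that confint() reports the same interval directly. For the 300 m clay plot in a conservation area, predict() with interval = "confidence" gives a confidence interval for the mean biomass of such plots. interval = "prediction" would instead give a wider interval for a single new plot. The names contrast_vec, conservation_loam, model1_loam and new_plot are illustrative.

```r
# Data re-entered from the dput() output in this post (alt.factor omitted).
conservation <- data.frame(
  biomass = c(2.01, 2.06, 1.7, 2.07, 1.88, 2.11, 0.98, 2.14, 1.75, 1.81,
              2.15, 1.68, 2.23, 2.04, 1.67, 1.77, 1.74, 1.53, 1.79, 2.15,
              1.39, 2.19, 2.14, 2.29, 1.91, 1.73, 2.21, 1.96, 2.07, 2.01,
              2.2, 2.24, 1.33, 1.05, 1.36, 1.72, 1.44, 1.52, 2.09, 1.42,
              1.64, 0.92, 1.65, 1.37, 0.77, 1.57, 2.25, 2.23, 2.03, 1.18),
  alt = c(116, 21, 130, 65, 117, 82, 359, 5, 86, 91, 64, 178, 79, 70, 209,
          110, 161, 248, 146, 23, 237, 84, 40, 7, 161, 122, 25, 146, 67, 118,
          42, 57, 277, 338, 331, 153, 239, 237, 67, 171, 206, 371, 107, 236,
          482, 240, 56, 42, 68, 436),
  cons = factor(c(2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1,
                  1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1,
                  1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2)),
  soil = factor(c(2, 3, 2, 2, 2, 1, 2, 2, 3, 3, 2, 1, 1, 2, 2, 3, 2,
                  3, 2, 2, 3, 1, 3, 2, 1, 3, 1, 1, 1, 1, 1, 1, 3,
                  3, 2, 1, 3, 2, 1, 3, 3, 3, 3, 3, 2, 2, 1, 1, 1, 2))
)

model1 <- lm(biomass ~ alt + soil + cons, data = conservation)

# Question 1: clay minus loam = soil2 - soil3 (both are differences from chalk).
contrast_vec <- c("(Intercept)" = 0, alt = 0, soil2 = 1, soil3 = -1, cons2 = 0)
contrast_vec <- contrast_vec[names(coef(model1))]  # align with coefficient order
est <- sum(contrast_vec * coef(model1))
se  <- sqrt(drop(t(contrast_vec) %*% vcov(model1) %*% contrast_vec))
ci_diff <- est + c(-1, 1) * qt(0.975, df.residual(model1)) * se
print(round(c(estimate = est, lower = ci_diff[1], upper = ci_diff[2]), 4))

# Cross-check: with loam ("3") as the reference level, the soil2
# coefficient is itself the clay-minus-loam difference.
conservation_loam <- conservation
conservation_loam$soil <- relevel(conservation_loam$soil, ref = "3")
model1_loam <- lm(biomass ~ alt + soil + cons, data = conservation_loam)
ci_check <- confint(model1_loam)["soil2", ]
print(ci_check)

# Question 2: mean biomass for a conservation-area plot (cons = "1")
# on clay (soil = "2") at 300 m, with a 95% confidence interval.
new_plot <- data.frame(alt = 300,
                       soil = factor("2", levels = levels(conservation$soil)),
                       cons = factor("1", levels = levels(conservation$cons)))
pred <- predict(model1, newdata = new_plot, interval = "confidence", level = 0.95)
print(pred)
```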