First, I will create an example dataset (no seed is set, so your random values will differ from the output shown below):
data <- data.frame("col1" = rnorm(500), "col2" = rnorm(500),
                   "col3" = c(rep(TRUE, 250), rep(FALSE, 250)),
                   "col4" = c(rep(FALSE, 250), rep(TRUE, 250)))
If I understand you correctly, your "(col1)/(col2)" corresponds to "(kg crop/hectares)" here.
If this is true, you can create a new column in your dataset (named 'data' here) for 'yield' by:
data$yield <- data$col1 / data$col2
head(data)
        col1         col2 col3  col4       yield
1  0.8976488  0.006764518 TRUE FALSE 132.6996029
2 -0.2829754  0.980092790 TRUE FALSE  -0.2887230
3 -0.2266733  1.285616004 TRUE FALSE  -0.1763149
4  1.4690071 -0.297252879 TRUE FALSE  -4.9419440
5 -0.1438242  0.917662116 TRUE FALSE  -0.1567289
6 -1.3297183 -0.880964698 TRUE FALSE   1.5093889
Then there are multiple ways to look at these means. One 'indexing' way would be:
mean(data$yield[data$col3 & !data$col4])
[1] 1.929354
This asks for the mean yield over the rows where col3 is TRUE and col4 is FALSE.
However, if you want a summary for every combination of groups, you can use the dplyr package this way:
install.packages("dplyr") # This will have to be run only the first time you use the package on one machine
library(dplyr) # This code will need to be run every new R session
data %>%
  group_by(col3, col4) %>%
  summarise(
    MeanYield = mean(yield)
  )
# A tibble: 2 x 3
# Groups:   col3 [2]
  col3  col4  MeanYield
  <lgl> <lgl>     <dbl>
1 FALSE TRUE      20.4
2 TRUE  FALSE      1.93
In this case only two combinations occur, (col3=TRUE & col4=FALSE) and (col3=FALSE & col4=TRUE), but the code reports every combination of col3 and col4 that is present in the data.
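As a cross-check, the same grouped means can be sketched in base R with aggregate(), which needs no extra package. The seed value and the object names by_group and indexed are my own choices for illustration, so the numbers will not match the output above.

```r
# Assumption: the seed is arbitrary; it only makes this sketch repeatable
set.seed(42)
data <- data.frame(
  col1 = rnorm(500),
  col2 = rnorm(500),
  col3 = rep(c(TRUE, FALSE), each = 250),
  col4 = rep(c(FALSE, TRUE), each = 250)
)
data$yield <- data$col1 / data$col2

# Mean yield for every col3/col4 combination that occurs in the data
by_group <- aggregate(yield ~ col3 + col4, data = data, FUN = mean)
print(by_group)

# The indexing approach for a single group should agree with the grouped result
indexed <- mean(data$yield[data$col3 & !data$col4])
all.equal(indexed, by_group$yield[by_group$col3 & !by_group$col4])
```

If the two approaches agree, the last line returns TRUE.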
- As an afternote: I know that negative values may not make sense for crop or hectares; I simply used rnorm to be quick here (although my explaining this defeats the purpose of being quick).
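If you would rather have example values that cannot be negative, one possible sketch is to draw from runif instead of rnorm. The ranges and the seed below are arbitrary values I made up for illustration, not realistic crop figures.

```r
# Assumption: seed and ranges are invented for illustration only
set.seed(1)
data <- data.frame(
  col1 = runif(500, min = 1000, max = 5000),  # kg of crop, always positive
  col2 = runif(500, min = 1, max = 10),       # hectares, always positive
  col3 = rep(c(TRUE, FALSE), each = 250),
  col4 = rep(c(FALSE, TRUE), each = 250)
)
data$yield <- data$col1 / data$col2

# Every yield is now a positive kg-per-hectare value
range(data$yield)
```

The rest of the answer works unchanged on this version of the dataset.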