I have a data frame with columns for the gene, the chromosomal region it belongs to, the cell line its expression was measured in, and its expression level in that cell line. It looks roughly like this:
gene region cell_line expression
A X Joe 1
B X Joe 2
C Y Joe 2
D Z Joe 3
E Z Joe 0
A X Claire 2
B X Claire 1
C Y Claire 3
D Z Claire 3
E Z Claire 1
What I want to do is, for each cell line and each chromosomal region, calculate the mean, standard deviation, etc. of the expression of all genes in that cell line that are NOT in the given region. For example, for region X of Joe, I want the output row from summarize() to show the mean expression of all of Joe's genes outside region X (i.e. genes C, D, and E of Joe).
So the output looks something like:
region cell_line mean_other standard_deviation_other
X Joe 1.67 some number
Y Joe 1.5 some number
Z Joe 1.67 some number
X Claire 2.33 some number
Y Claire 1.75 some number
Z Claire 2 some number
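As a quick sanity check of Joe's rows in the table above, the "everything except this region" statistic can be computed by hand with base R's logical indexing. This is a minimal sketch that hard-codes Joe's five values from the sample data; the names `joe_expr`, `joe_region`, and `means_other` are made up for illustration.

```r
# Joe's expression values and region labels, copied from the sample data
joe_expr   <- c(A = 1, B = 2, C = 2, D = 3, E = 0)
joe_region <- c(A = "X", B = "X", C = "Y", D = "Z", E = "Z")

# For each region, average only the genes whose region differs from it
means_other <- sapply(c("X", "Y", "Z"), function(r) mean(joe_expr[joe_region != r]))
means_other
# X = 1.666667, Y = 1.5, Z = 1.666667

# Standard deviation of the genes outside region X (genes C, D, E)
sd(joe_expr[joe_region != "X"])
# 1.527525
```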
My idea would be to do something like the following, except I have no idea how to make summarize() use rows outside the group it is currently "operating on".
df %>%
  group_by(region, cell_line) %>%
  summarize(mean_other = mean(expression of genes in this cell line but NOT in this region),
            standard_deviation_other = sd(expression of genes in this cell line but NOT in this region))
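Instead of asking summarize() to reach outside its group, one possible workaround is to build one row per (region, cell_line) pair and look up the complementary rows in the full data frame. The sketch below assumes dplyr is installed and uses distinct(), rowwise(), mutate(), and ungroup(); `other_expr()` is a hypothetical helper name, and this is just one approach, not the only one.

```r
library(dplyr)

# Sample data from the question (stringsAsFactors = FALSE keeps plain strings on older R)
df <- data.frame(
  gene       = rep(c("A", "B", "C", "D", "E"), times = 2),
  region     = rep(c("X", "X", "Y", "Z", "Z"), times = 2),
  cell_line  = rep(c("Joe", "Claire"), each = 5),
  expression = c(1, 2, 2, 3, 0, 2, 1, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expression values of genes in cell line `cl`
# that are NOT in region `rg`, looked up in the full data frame
other_expr <- function(cl, rg) {
  df$expression[df$cell_line == cl & df$region != rg]
}

result <- df %>%
  distinct(region, cell_line) %>%   # one row per (region, cell_line) pair
  rowwise() %>%                     # evaluate the helper once per row
  mutate(
    mean_other = mean(other_expr(cell_line, region)),
    standard_deviation_other = sd(other_expr(cell_line, region))
  ) %>%
  ungroup()

result
```

Other statistics (median, quantiles, ...) can be added as extra columns inside mutate() in the same way. Because every row rescans the whole data frame, this may get slow on large data; it is meant to illustrate the idea rather than be the fastest option.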