I am calculating, hopefully correctly, a hypergeometric test for each row of a data frame in R.
Column 1 holds the gene names (microRNAs). The column "Total_mRNAs" is the total number of mRNAs in the genome, so it is the same in every row. The column "Total_targets_targets" is how many mRNAs each microRNA can target when all mRNAs are present. In this example, however, only "subset_mRNAs" mRNAs are present (that number is also the same in every row), and "subset_targets" is how many of those each microRNA can target.
To determine whether the targets of each microRNA are enriched in the subset compared to the background (the total mRNAs and how many of them the microRNA targets), I am performing a hypergeometric test per row like this:
phyper(targets-in-subset, targets-in-bkgd, failure-in-bkgd, sample-size-subset, lower.tail= FALSE)
dput(df1)
structure(list(Genes_names = c("microRNA-1", "microRNA-2", "microRNA-3",
"microRNA-4", "microRNA-5", "microRNA-6", "microRNA-7", "microRNA-8",
"microRNA-9", "microRNA-10"), Total_mRNAs = c(61064L, 61064L,
61064L, 61064L, 61064L, 61064L, 61064L, 61064L, 61064L, 61064L
), Total_targets_targets = c(1918L, 7807L, 3969L, 771L, 2850L,
1355L, 1560L, 2478L, 1560L, 2478L), subset_mRNAs = c(17571L,
17571L, 17571L, 17571L, 17571L, 17571L, 17571L, 17571L, 17571L,
17571L), subset_targets = c(544L, 2109L, 1137L, 213L, 793L, 394L,
430L, 686L, 430L, 686L)), class = "data.frame", row.names = c(NA,
-10L))
df1$pvalue <- phyper(df1$subset_targets, df1$Total_targets_targets, df1$Total_mRNAs-df1$Total_targets_targets, df1$subset_mRNAs, lower.tail= FALSE)
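One detail I am not sure about: according to the phyper documentation, lower.tail = FALSE returns P(X > q) rather than P(X >= q), so the call above would exclude the observed count itself. Below is a minimal sketch for the first row (microRNA-1) that compares both variants against a direct sum over dhyper. The names k, K, N and n are my own shorthand, not columns of df1.

```r
k <- 544    # subset_targets: targets observed in the subset
K <- 1918   # Total_targets_targets: targets in the background
N <- 61064  # Total_mRNAs: background size
n <- 17571  # subset_mRNAs: subset size

# As in the call above: P(X > k), the observed count is excluded
p_strict <- phyper(k, K, N - K, n, lower.tail = FALSE)

# Subtracting 1 gives P(X >= k), which includes the observed count
p_incl <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Cross-check: sum the point probabilities from k up to the largest possible count
p_sum <- sum(dhyper(k:min(K, n), K, N - K, n))

all.equal(p_incl, p_sum)
```

If "enriched" is meant as "at least as many targets as observed", then the k - 1 form seems to be the one that matches.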
Now the question is: how can I Bonferroni-correct these values? And is this calculation theoretically right?
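For reference, this is the kind of correction I have in mind: a minimal sketch, assuming base R's p.adjust is the right tool and that the number of tests equals the number of rows. It uses only the first three rows of df1 to stay short, so the correction factor here is 3 rather than 10. The column names p_bonferroni and p_manual are my own.

```r
# First three rows of df1, copied from the dput() output above
df <- data.frame(
  Genes_names           = c("microRNA-1", "microRNA-2", "microRNA-3"),
  Total_mRNAs           = c(61064L, 61064L, 61064L),
  Total_targets_targets = c(1918L, 7807L, 3969L),
  subset_mRNAs          = c(17571L, 17571L, 17571L),
  subset_targets        = c(544L, 2109L, 1137L)
)

# Raw p-values, computed exactly as in the question
df$pvalue <- phyper(df$subset_targets, df$Total_targets_targets,
                    df$Total_mRNAs - df$Total_targets_targets,
                    df$subset_mRNAs, lower.tail = FALSE)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
df$p_bonferroni <- p.adjust(df$pvalue, method = "bonferroni")

# Same thing written out by hand, as a cross-check
df$p_manual <- pmin(1, df$pvalue * nrow(df))
```

On the full data frame the same p.adjust call would be applied to df1$pvalue, with all ten rows counted as tests.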
[...] apply function for this. Please give more details on what you want. From your code listings I do not know what exactly you want to calculate, or how. – MacOS
[...] df1$Total-df1$targets comes from. – MacOS