I have a data.frame containing survey data on three binary variables. The data is already aggregated into a contingency table: the first three columns hold the answer counts for each variable (1 = yes, 0 = no) and the fourth column holds the total number of answers. The rows are four different groups.
My aim is to calculate z-scores to check whether each group's proportions differ significantly from the total.
This is my data:
library(dplyr) #loading libraries
df <- structure(list(var1 = c(416, 1300, 479, 417),
var2 = c(265, 925,473, 279),
var3 = c(340, 1013, 344, 284),
totalN = c(1366, 4311,1904, 1233)),
class = "data.frame",
row.names = c(NA, -4L),
.Names = c("var1","var2", "var3", "totalN"))
And these are my total values:
dfTotal <- df %>% summarise_all(sum, na.rm = TRUE) # column sums; avoids the deprecated funs()
dfTotal
dfTotal <- data.frame(dfTotal)
rownames(dfTotal) <- "Total"
To calculate the z-score I use the following function:
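zScore <- function(cntA, totA, cntB, totB) {
  # pooled proportion across both samples
  avgProportion <- (cntA + cntB) / (totA + totB)
  # proportion within each sample
  probA <- cntA / totA
  probB <- cntB / totB
  # standard error of the difference, based on the pooled proportion
  SE <- sqrt(avgProportion * (1 - avgProportion) * (1 / totA + 1 / totB))
  zScore <- (probA - probB) / SE
  return(zScore)
}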
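Written out, the function above computes the pooled two-proportion z statistic. The symbols are only shorthand introduced here and are not part of the code: x_A and n_A stand for cntA and totA, x_B and n_B for cntB and totB.

```latex
\hat{p} = \frac{x_A + x_B}{n_A + n_B},
\qquad
z = \frac{\dfrac{x_A}{n_A} - \dfrac{x_B}{n_B}}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}
```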
Is there a way, using dplyr, to calculate a 4x3 matrix that holds, for all four groups and the variables var1 to var3, the z-score against the total proportion?
I am currently stuck with this bit of code:
df %>% mutate_all(funs(zScore(., totalN, dfTotal$var1, dfTotal$totalN)))
The parameters currently used here, dfTotal$var1 and dfTotal$totalN, don't work, and I have no idea how to feed the right values into the function. The third parameter should not always be var1; it should be var1, var2 or var3 (or totalN) so that it matches the column passed as the first parameter.
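To make the target concrete, here is a minimal base R sketch of the 4x3 result I am after. It is only a reference for the expected shape, not the dplyr solution I am looking for. The names `totals`, `vars` and `zMat` are made up for this illustration, and the data and function are repeated so the snippet runs on its own.

```r
# Data copied from above so the snippet is self-contained
df <- data.frame(
  var1   = c(416, 1300, 479, 417),
  var2   = c(265, 925, 473, 279),
  var3   = c(340, 1013, 344, 284),
  totalN = c(1366, 4311, 1904, 1233)
)

# Same calculation as zScore() above
zScore <- function(cntA, totA, cntB, totB) {
  avgProportion <- (cntA + cntB) / (totA + totB)
  SE <- sqrt(avgProportion * (1 - avgProportion) * (1 / totA + 1 / totB))
  (cntA / totA - cntB / totB) / SE
}

# Column totals as a named numeric vector (hypothetical helper name)
totals <- colSums(df)

# For each variable, compare every group's proportion with the total proportion
vars <- c("var1", "var2", "var3")
zMat <- sapply(vars, function(v) {
  zScore(df[[v]], df$totalN, totals[[v]], totals[["totalN"]])
})

zMat  # 4 rows (groups) x 3 columns (var1 to var3)
```

The key point is that the total used as the third argument is looked up by the same column name as the first argument, which is the part I cannot express with mutate_all.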