0
votes

I have a data.frame containing survey data on three binary variables. The data is already aggregated into a contingency table: the first three columns hold the number of "yes" answers per variable (1 = yes, 0 = no), and the fourth column holds the total number of respondents. Each of the four rows is a different group.

My aim is to calculate z-scores to check whether each group's proportions differ significantly from the total.

this is my data:

library(dplyr) #loading libraries
df <- structure(list(var1 = c(416, 1300, 479, 417), 
                     var2 = c(265, 925,473, 279),
                     var3 = c(340, 1013, 344, 284),
                     totalN = c(1366, 4311,1904, 1233)),
                class = "data.frame",
                row.names = c(NA, -4L),
                .Names = c("var1","var2", "var3", "totalN"))

and these are my total values

dfTotal <-  df %>% summarise_all(funs(sum(., na.rm=TRUE)))
dfTotal
dfTotal <- data.frame(dfTotal)
rownames(dfTotal) <- "Total"

to calculate zScore I use the following formula:

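zScore <- function (cntA, totA, cntB, totB) {
  # pooled proportion across both samples
  avgProportion <- (cntA + cntB) / (totA + totB)
  # observed proportion in each sample
  probA <- cntA/totA
  probB <- cntB/totB
  # standard error of the difference, based on the pooled proportion
  SE <- sqrt(avgProportion * (1-avgProportion)*(1/totA + 1/totB))
  zScore <- (probA-probB) / SE
  return (zScore)
}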
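
Written out as a formula, the function above appears to be the usual pooled two-proportion z statistic. The symbols here are my own shorthand, not names from the code: c_A, n_A stand for cntA, totA and c_B, n_B for cntB, totB.

```latex
\hat{p} = \frac{c_A + c_B}{n_A + n_B},
\qquad
z = \frac{\dfrac{c_A}{n_A} - \dfrac{c_B}{n_B}}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}
```

For group 1 and var1, that would mean c_A = 416, n_A = 1366, c_B = 2612 (the column sum of var1) and n_B = 8814 (the column sum of totalN), which gives z of roughly 0.616.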

Is there a way, using dplyr, to calculate a 4x3 matrix that holds the z-test value against the total proportion for all four groups and variables var1 to var3?

I am currently stuck with this bit of code:

df %>% mutate_all(funs(zScore(., totalN,dfTotal$var1,dfTotal$totalN)))

The parameters dfTotal$var1 and dfTotal$totalN used here don't work, and I have no idea how to feed the right values into the function. The third parameter should not always be var1; it should be var2, var3 (and totalN) to match the column passed as the first parameter.

2
the total number of answers column depicts the number of answers per question? also, what have you tried? - mtoto
the total number of answers depicts the number of persons asked in that group (each line is a group's results). So in the first group 416 persons ticked question1 and in total 1366 persons are in that group. also I added where I'm currently stuck - Jan
And how would you arrive at a 3x3 matrix if you want to calculate the z-score for each question and per group against the proportions in the total? You have four rows so that's 3x4=12 z-scores. - mtoto
sorry, you're right. that happens when you work with your original data and suddenly create an artificial example. I corrected the question. - Jan

2 Answers

4
votes

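A column-wise z-score (each value minus its column mean, divided by the column standard deviation) is handled in R with scale. Note that this standardizes the raw counts; it is not the two-proportion z-test from the question: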

scale(df)
           var1        var2       var3     totalN
[1,] -0.5481814 -0.71592544 -0.4483732 -0.5837722
[2,]  1.4965122  1.42698064  1.4952995  1.4690147
[3,] -0.4024623 -0.04058534 -0.4368209 -0.2087639
[4,] -0.5458684 -0.67046986 -0.6101053 -0.6764787

If you want only the three var columns:

scale(df[,1:3])
           var1        var2       var3
[1,] -0.5481814 -0.71592544 -0.4483732
[2,]  1.4965122  1.42698064  1.4952995
[3,] -0.4024623 -0.04058534 -0.4368209
[4,] -0.5458684 -0.67046986 -0.6101053
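
To make clear what scale is doing here, a minimal sketch that reproduces the var1 column by hand. The variable names (x, manual, scaled) are just for illustration; the only assumption is that scale is called with its defaults (center = TRUE, scale = TRUE).

```r
# var1 counts from the question
x <- c(416, 1300, 479, 417)

# standardize by hand: subtract the mean, divide by the sample standard deviation
manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so flatten it for comparison
scaled <- as.numeric(scale(x))

isTRUE(all.equal(manual, scaled))
round(manual, 4)
```

So the values only say how far each group's raw count lies from the average count across groups; the group sizes in totalN play no role.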
1
votes

If you want to use your zScore function inside a dplyr pipeline, we'll need to tidy your data first and add new variables containing the values you now have in dfTotal:

library(dplyr)
library(tidyr)

        # add grouping variables we'll need further down
df %>% mutate(group = 1:4) %>% 
        # reshape data to long format
        gather(question,count,-group,-totalN) %>%
        # add totals by question to df
        group_by(question) %>%
        mutate(answers = sum(totalN),
               yes = sum(count)) %>%
        # calculate z-scores by group against total
        group_by(group,question) %>%
        summarise(z_score = zScore(count, totalN, yes, answers)) %>%
        # spread to wide format
        spread(question, z_score)
## A tibble: 4 x 4
#  group       var1       var2      var3
#* <int>      <dbl>      <dbl>     <dbl>
#1     1  0.6162943 -2.1978303  1.979278
#2     2  0.6125615 -0.7505797  1.311001
#3     3 -3.9106430  2.6607258 -4.232391
#4     4  2.9995381  0.4712734  0.438899
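
If the reshaping feels heavy, the same numbers can probably be had in base R as well, since zScore is vectorized over its first two arguments. This is only a sketch under that assumption; vars and zMatrix are names I made up, and the data and function are copied from the question so the block runs on its own.

```r
# data and zScore() as given in the question
df <- data.frame(var1 = c(416, 1300, 479, 417),
                 var2 = c(265, 925, 473, 279),
                 var3 = c(340, 1013, 344, 284),
                 totalN = c(1366, 4311, 1904, 1233))

zScore <- function(cntA, totA, cntB, totB) {
  avgProportion <- (cntA + cntB) / (totA + totB)
  probA <- cntA / totA
  probB <- cntB / totB
  SE <- sqrt(avgProportion * (1 - avgProportion) * (1 / totA + 1 / totB))
  (probA - probB) / SE
}

vars <- c("var1", "var2", "var3")

# one column of z-scores per variable: each group's count and size
# against the column total and the overall number of respondents
zMatrix <- sapply(vars, function(v) {
  zScore(df[[v]], df$totalN, sum(df[[v]]), sum(df$totalN))
})

round(zMatrix, 3)
```

The result is a 4x3 numeric matrix with one row per group and one column per variable. As in the pipeline above, the total still includes the group being compared.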