I have survey data (sample pictured below) for which I want to compute 95% confidence intervals. The Q#d columns (Q1d, Q2d, etc.) each correspond to a different question on the survey (Likert scale with results dichotomized: 1 = yes, 0 = no). The Intervention column indicates whether a response was collected before the intervention (FALSE) or after it (TRUE). I want the 95% confidence interval on the difference in proportions, before vs. after the intervention, for each question.

For example, let's say that for Q1d the proportion that answered "yes" is .2 before the intervention and .5 after it. The difference would be .3, or 30 percentage points, and I want to calculate the confidence interval on that difference (let's say between 25% and 35%). I want to do this for every question in the survey (all Q#d columns). I have not been able to find a way to iterate through and do this for all questions (columns). I've written a function that successfully does it for one column, but iterating through the column names isn't working for me, and I don't know how to store the results as a vector/dataframe. I've included the function below. Any guidance?

Thanks so much!!

get_conf_int <- function(df, colName) {
  myenc <- enquo(colName)
  p <- df %>%
    group_by(Intervention) %>%
    summarize(success=sum(UQ(myenc)==1, na.rm=TRUE), total=n())
  prop.test(x=pull(p,success), n=pull(p, total))$conf.int[2:1]*-100
} 

And I can call the function like:

get_conf_int(db, Q1d)

I'm using prop.test to find confidence interval for now, but open to other methods as well.

dataframe being used

1 Answer

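I can't tell you whether prop.test is better than binom.test for your case; you should read more about those two. Here is a way to loop over all the question columns with prop.test and store the results in a data frame.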
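As a quick orientation, here is a minimal sketch of what prop.test returns for two groups. The counts are made up to match the example in the question (20% "yes" before, 50% after, with an assumed 100 respondents per group); they are not from the real survey. Note that binom.test is a one-sample exact test, so on its own it gives an interval for a single proportion rather than for a difference.

```r
# Made-up counts (assumption): 100 respondents per group,
# 20 "yes" before the intervention and 50 "yes" after it
yes   <- c(after = 50, before = 20)
total <- c(after = 100, before = 100)

res <- prop.test(x = yes, n = total)  # two-sample test of equal proportions

res$estimate  # the two sample proportions (0.5 and 0.2)
res$conf.int  # 95% confidence interval for (after - before)
res$p.value   # p-value for the null hypothesis of equal proportions
```

The interval is for the first group minus the second, so the order in which the counts are passed decides the sign of the difference.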

library(dplyr)

# just for this example, you have your survey here
df <- data.frame(Intervention=sample(x = c(TRUE,FALSE), size = 20, replace = TRUE), 
                 Q1d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q2d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q3d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q4d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q5d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q6d=sample(x = 0:1, size = 20, replace = TRUE),
                 Q7d=sample(x = 0:1, size = 20, replace = TRUE))

# group sizes: number of FALSE (before) rows, then number of TRUE (after) rows
count_Intervention <- c(sum(!df$Intervention), sum(df$Intervention))

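# group by TRUE/FALSE and count the 1's in every question column
# (passing list(sum) makes across() name the result columns Q1d_1, Q2d_1, ...)
df_sum <- df %>%
  group_by(Intervention) %>%
  summarize(across(all_of(colnames(df)[-1]), list(sum)))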

# results table; the p-value is included because it may also be useful
new_df <- data.frame(Q_d = character(), lower = numeric(), upper = numeric(), pvalue = numeric())

# loop over the question columns, running prop.test once per question
for (Q_d in colnames(df_sum)[-1]) {
  # df_sum rows are ordered FALSE, TRUE, so the interval is for (before - after)
  test <- prop.test(df_sum[[Q_d]], count_Intervention)
  new_df <- rbind(new_df, data.frame(Q_d, lower = test$conf.int[1], upper = test$conf.int[2], pvalue = test$p.value))
}

new_df
    Q_d      lower      upper     pvalue
1 Q1d_1 -0.2067593  0.8661000 0.34844258
2 Q2d_1 -0.9193444 -0.1575787 0.05528499
3 Q3d_1 -0.4558861  0.5218202 1.00000000
4 Q4d_1 -0.4558861  0.5218202 1.00000000
5 Q5d_1 -0.7487377  0.3751114 0.74153726
6 Q6d_1 -0.2067593  0.8661000 0.34844258
7 Q7d_1 -0.4558861  0.5218202 1.00000000
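If you would rather keep the one-column-at-a-time approach from the question, a base-R variant might look like the sketch below. The helper name diff_ci, the simulated survey data, and the choice to drop NA answers from both the numerator and the denominator are my assumptions, not something taken from your data. It reports the difference as after minus before, in percent, as the question asks.

```r
# Hypothetical helper: 95% CI for (after - before) on one 0/1 column, in percent.
# Assumes `group` is a logical column (TRUE = after the intervention).
diff_ci <- function(data, question, group = "Intervention") {
  answered <- !is.na(data[[question]]) & !is.na(data[[group]])
  after <- data[[group]][answered]      # TRUE for post-intervention rows
  value <- data[[question]][answered]   # 0/1 answers with NAs dropped

  # Pass the "after" group first so the interval is for (after - before)
  yes   <- c(sum(value[after] == 1), sum(value[!after] == 1))
  total <- c(sum(after), sum(!after))
  test  <- prop.test(x = yes, n = total)

  data.frame(question  = question,
             diff_pct  = 100 * unname(test$estimate[1] - test$estimate[2]),
             lower_pct = 100 * test$conf.int[1],
             upper_pct = 100 * test$conf.int[2],
             p_value   = test$p.value)
}

# Simulated stand-in for the survey (assumption: 50 rows per group)
set.seed(1)
survey <- data.frame(Intervention = rep(c(FALSE, TRUE), each = 50),
                     Q1d = rbinom(100, 1, 0.4),
                     Q2d = rbinom(100, 1, 0.6))

# Apply the helper to every Q#d column and stack the one-row results
questions <- grep("^Q[0-9]+d$", names(survey), value = TRUE)
results <- do.call(rbind, lapply(questions, diff_ci, data = survey))
results
```

Passing the column names as strings avoids the quoting/unquoting step, which is what usually makes iterating over columns awkward, and do.call(rbind, ...) collects the per-question rows into a single data frame.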