Calculate mean for cases that responded to a minimum number of items in R

Question

In SPSS you can calculate means only for cases that responded to a minimum number of questions. There I would type:

COMPUTE compvar = MEAN.4(var1, var2, var3, var4, var5, var6, var7).

This creates a new variable (compvar) that is filled in only for cases with a valid value on 4 or more of var1 to var7; other cases are left missing. That's what the .4 does: it sets the minimum number of valid responses a case needs before the mean is computed.

Any tips on doing this in R so I can stop jumping to SPSS?

Please provide example data for this question. Is each column a binary variable with Y/N and/or some kind of response? How is non-response denoted? — theamateurdataanalyst

jeremycg · Accepted Answer · 2015-06-29T19:16:12

There isn't a built-in function as far as I know, but here's one you can try:

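mycolmeans <- function(df, n) {
  # Mean of each column, ignoring NAs
  holding <- colMeans(df, na.rm = TRUE)
  # Set the result to NA for columns with fewer than n non-missing values
  holding[colSums(!is.na(df)) < n] <- NA
  holding
}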

This assumes your values are in the columns of a data frame, that missing values are coded as NA, and that you want NA returned for any column with fewer than n non-missing values.

x <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                b = c(NA, NA, 3, 4, 5, 6))

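mycolmeans(x, 4)  # a = 3.5, b = 4.5 (b has exactly 4 non-missing values)
mycolmeans(x, 6)  # a = 3.5, b = NA  (b has fewer than 6 non-missing values)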
