Ignoring NA in R across multiple columns of a dataframe using na.omit or na.rm and mapply

Question

I have a dataframe that looks something like this:

SampleNo   Lab1   Lab2   Lab3   lab4   lab5   lab6   lab7  lab8   lab9  lab10
       1  59.84  60.59  60.39  60.29  60.19  60.32  60.24  60.3  60.43     NA
       2  59.78  60.19  60.16  60.23  60.32  60.46  60.53  60.2  60.40   59.6
       3  59.86  60.17  60.22  60.28  60.18  60.42  60.21  60.0  60.44     NA
       4  59.85  60.42  60.28  60.31  60.19  60.41  60.54  60.2  60.48   59.7
       5  59.97  60.79  60.30  60.26  60.40  60.47  60.52  60.0  60.46   59.7
       6  60.03  60.26  60.36  60.21  60.32  60.46  60.50  60.1  60.29   60.0

I would like to compute the sum of squared deviations from the mean for each column of the dataframe, ignoring NA values, and assign the results to a new vector. I can get the code to work for one column, but I would like to use mapply or something similar to compute the values across all columns at once.

I have the following code for a single column:

myvector <- sum(na.omit(df[,2] - mean(df[,2]))^2)  # this works for 1 column

I have tried the following for the whole dataframe:

myvector <- (mapply(na.omit(sum(df[,2:11] - mean(df[,2:11]))^2)))

I get an error saying "error in match.fun(FUN): c(""na.omit(sum(df[,2:11] - mean(df[,2:11]))^2 is not a function, character or symbol", 2:11]))^2 is not a function character or symbol

and

myvector <- (mapply(sum(na.omit(df[,2:11] - mean(df[,2:11]))^2)))

but get this error:

Error in sum(na.omit, df[, 2:11] - mean(df[, : invalid 'type' (closure) of argument In addition: Warning message: In mean.default(df[, 2:11]) : argument is not numeric or logical: returning NA

My thought is that the na.omit is in the wrong place but I am lost as to where it should go.

Darren Tsai · Accepted Answer · 2020-09-11T06:15:52

You can transpose the data, subtract column means, and then compute the sums of squares.

rowSums((t(df[-1]) - colMeans(df[-1], na.rm = TRUE))^2, na.rm = TRUE)

Alternatively, you can multiply the sample variance of each column by the number of non-missing values in that column minus 1 to get the sum of squared differences.

sapply(df[-1], var, na.rm = TRUE) * (colSums(!is.na(df[-1])) - 1)

# Lab1  Lab2  Lab3  lab4  lab5  lab6  lab7  lab8  lab9 lab10 
# 0.04  0.31  0.04  0.01  0.04  0.02  0.12  0.07  0.02  0.09
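The single-column expression from the question can also be applied column by column. The attempts with mapply fail because mapply expects a function as its first argument (FUN), whereas the question passes an already evaluated expression. Below is a minimal sketch, assuming the data frame is rebuilt from the table in the question; the helper name sum_sq_dev is hypothetical and not part of the original answer.

```r
# Rebuild the example data from the question (values copied from the table).
df <- data.frame(
  SampleNo = 1:6,
  Lab1  = c(59.84, 59.78, 59.86, 59.85, 59.97, 60.03),
  Lab2  = c(60.59, 60.19, 60.17, 60.42, 60.79, 60.26),
  Lab3  = c(60.39, 60.16, 60.22, 60.28, 60.30, 60.36),
  lab4  = c(60.29, 60.23, 60.28, 60.31, 60.26, 60.21),
  lab5  = c(60.19, 60.32, 60.18, 60.19, 60.40, 60.32),
  lab6  = c(60.32, 60.46, 60.42, 60.41, 60.47, 60.46),
  lab7  = c(60.24, 60.53, 60.21, 60.54, 60.52, 60.50),
  lab8  = c(60.3, 60.2, 60.0, 60.2, 60.0, 60.1),
  lab9  = c(60.43, 60.40, 60.44, 60.48, 60.46, 60.29),
  lab10 = c(NA, 59.6, NA, 59.7, 59.7, 60.0)
)

# Hypothetical helper: sum of squared deviations from the column mean.
# na.rm = TRUE is needed in both mean() and sum() so that NA values are
# dropped from the mean and from the final total.
sum_sq_dev <- function(x) sum((x - mean(x, na.rm = TRUE))^2, na.rm = TRUE)

# sapply() passes each column (except SampleNo) to the helper in turn
# and returns a named numeric vector, one value per lab.
myvector <- sapply(df[-1], sum_sq_dev)
```

Here sapply is used rather than mapply because only one set of inputs (the columns) is being iterated over; the resulting vector should agree with the two one-liners above.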

Formula:
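A sketch of the identity the variance approach relies on, written in LaTeX. The symbols are notation assumed here: $x_{ij}$ is the $i$-th non-missing value in column $j$, $\bar{x}_j$ is the mean of those values, $s_j^2$ is their sample variance, and $n_j$ is their count.

```latex
\[
  s_j^2 = \frac{1}{n_j - 1} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2
  \quad\Longrightarrow\quad
  \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2 = (n_j - 1)\, s_j^2
\]
```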
