Say I have a data.table such as the one below (the real data could equally hold numbers and NAs):
library(data.table)
temp <- data.table(M=c(NA,T,T,F,F,F,NA,NA,F),
                   P=c(T,T,T,F,F,F,NA,NA,NA),
                   S=c(T,F,NA,T,F,NA,NA,NA,NA))
    M     P     S
   NA  TRUE  TRUE
 TRUE  TRUE FALSE
 TRUE  TRUE    NA
FALSE FALSE  TRUE
FALSE FALSE FALSE
FALSE FALSE    NA
   NA    NA    NA
   NA    NA    NA
FALSE    NA    NA
I want to check whether one variable being NA implies that a second variable is NA in the same rows, i.e. whether some variables are linked to others.
For example, whenever P is NA, S is NA as well.
This code works for a single pair of columns:
temp[is.na(P),all(is.na(S))]
gives TRUE
and
temp[is.na(S),all(is.na(P))]
gives FALSE, because in the third and sixth rows S is NA but P is not.
Now my question.
I would like to generalize this: check every pair of columns in my data.table and print which pairs are "linked".
I'd prefer to print only the pairs that come out TRUE and ignore the FALSE ones, because I have 550 variables and most pairs in my real data.table won't be linked.
I've tried this code:
temp[, lapply(.SD, function(x) temp[is.na(x),
lapply(.SD, function(y) all(is.na(y)) )]]
I get this error:
Error: unexpected ']' in: "temp[, lapply(.SD, function(x) temp[is.na(x), lapply(.SD, function(y) all(is.na(y)) )]]"
I could use a for loop, but I'd prefer idiomatic data.table syntax. Any suggestions are welcome.
I would also like to know how to refer to two different .SD objects when nesting data.table calls.
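For comparison, the pairwise check can be sketched without nesting .SD at all, by turning the table into a matrix of NA indicators and counting co-occurrences with crossprod. This is only a sketch under assumptions, not the nested-.SD form asked about. The names na, both, linked, when_na and then_na are made up for illustration. A column with no NAs counts as linked to every other column, just as all(logical(0)) is TRUE.

```r
library(data.table)

temp <- data.table(M = c(NA, T, T, F, F, F, NA, NA, F),
                   P = c(T, T, T, F, F, F, NA, NA, NA),
                   S = c(T, F, NA, T, F, NA, NA, NA, NA))

# 0/1 matrix: one column per variable, 1 where the value is NA
na <- sapply(temp, is.na) * 1

# both[i, j] = number of rows where columns i and j are both NA;
# the diagonal holds each column's own NA count
both <- crossprod(na)

# linked[i, j] is TRUE when every row with column i NA also has column j NA
# (diag(both) is recycled down the rows, so entry [i, j] is compared with
# the NA count of column i)
linked <- both == diag(both)
diag(linked) <- FALSE   # ignore a column paired with itself

# keep only the TRUE pairs (illustrative column names)
idx <- which(linked, arr.ind = TRUE)
res <- data.table(when_na = names(temp)[idx[, 1]],
                  then_na = names(temp)[idx[, 2]])
print(res)
```

On the example data this leaves a single pair, P then S, which agrees with the two single-pair checks above. With 550 columns the 550 x 550 matrix is small, so only the TRUE pairs need to be printed.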