1
votes

I have two data frames. The reproducible examples are given below:

structure(list(`1` = c(0L, 1L, 1L), `2` = c(1L, 0L, -1L), `3` = c(0L, 
0L, 0L), `4` = c(0L, 0L, 0L), `5` = c(0L, 0L, 0L), `6` = c(0L, 
0L, 0L), `7` = c(0L, -1L, 0L), `8` = c(0L, 0L, 0L), `9` = c(0L, 
0L, 0L), `10` = c(0L, 0L, 0L), `11` = c(0L, 0L, 0L), `12` = c(0L, 
0L, 0L), `13` = c(0L, 0L, 0L), `14` = c(0L, 0L, 0L), `15` = c(0L, 
0L, 0L), `16` = c(0L, 0L, 0L), `17` = c(0L, 0L, 0L), `18` = c(0L, 
0L, 0L), `19` = c(0L, 0L, 0L), `20` = c(0L, 0L, 0L), `21` = c(0L, 
0L, 0L), `22` = c(0L, 0L, 0L), `23` = c(0L, 0L, 0L), `24` = c(-1L, 
0L, 0L)), .Names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24"), row.names = c(3L, 6L, 12L), class = "data.frame")

This has 24 columns and each column represent statement. The second data frame is as follows:

structure(list(Level = c(1L, 1L, 1L, 1L), Statement = c("attr1", 
"attr2", "attr24", "attr7"), StmtNo = c(1L, 2L, 24L, 7L)), .Names = c("Level", 
"Statement", "StmtNo"), row.names = c(NA, 4L), class = "data.frame")

This 2nd data frame has a column by name StmtNo. This is the corresponding number for the column in df1. For e.g.., StmtNo 1 in DF2, matches with column Number 1 of DF1.

What I am trying to do is:

For all the cells with values 0 in DF1, I have to match the column numbers in DF1 with the StmtNo column of DF2. If the column number matches, the cell value should be 0, if it doesn't match, then the value should be NA. I tried using apply with if condition:

df1 <- apply(df1, function(x) if (x == 0) {
    if (exists(colnames(df1)) %in% df2$StmtNo) {
        x == NA
    } else {
        x == 0
    } 
})

But this returns a logical list. My desired output is as follows:

   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
3  0  1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA -1
6  1  0 NA NA NA NA -1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0
12 1 -1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0

While the above might seem a lot uniform in terms of filled in NA's, I have 60 such files and each with different columns where NA's need to be filled in.

2
Do you mean that all the values in StmtNo column should be kept as it is in df1 and rest if zeroes should be changed to NA ?Ronak Shah
No, the values in df2$StmtNo matches column numbers in df1. So if the cell value in DF1 for a particular column is Zero, then I need to check if that particular column number is available in df2$StmtNo. If it is available then I have to leave it as zero, and if it isn't available then I need to change it to NA. The desired out put that I have given, has NA for Column 3 in DF1. This means 3 is not available in df2$StmtNo.LeArNr
yes, df2$StmtNo has values 1, 2, 24, 7 and those columns have retained their values in df1 and rest all have transformed to NARonak Shah

2 Answers

3
votes

Here's an attempt using some indexing of a data.frame object. The basic selection returns this:

!names(df1)[col(df1)] %in% df2$StmtNo & df1==0
#       1     2    3    4    5    6     7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23    24
#3  FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
#6  FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
#12 FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE

Which means that you can do:

df1[!names(df1)[col(df1)] %in% df2$StmtNo & df1==0] <- NA
df1

#   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#3  0  1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA -1
#6  1  0 NA NA NA NA -1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0
#12 1 -1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0
1
votes

A less-than-elegant brute force approach

cols <- names(df1)[!names(df1) %in% df2$StmtNo]
df <- data.frame( matrix(NA, ncol = length(cols), nrow = 3) )
names(df) <- cols
df <- cbind(df, df1[, df2$StmtNo])

df[, order(as.numeric(names(df)))]

#    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# 3  0  1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA -1
# 6  1  0 NA NA NA NA -1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0
# 12 1 -1 NA NA NA NA  0 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  0