1
votes

I have a 5500 row, 15 column dataframe where columns 5:15 are logical. For each row, TRUE only appears in one of those columns 5:15. I've replaced TRUE with a string of the name of the variable I was detecting.

So what I want to do now is combine columns 5:15 into a single column, keeping the row index. I think essentially if I could multiply columns 5:15 as though they are a matrix, ignoring the FALSE values and only keeping the string, that would be great.

I've tried mutate, cbind, rbind, and union, but they don't work for this instance- I want to keep the rows matched up.

Here is a minmal example, with just three logical columns.

object number word value var1 var2 var3
2 word1 5 FALSE FALSE var3
4 word2 22 FALSE var2 FALSE
9 word3 45 var1 FALSE FALSE
#this is the R copy-paste table
mydata <- data.table(structure(list(object_number = c(2L, 4L, 9L), word = c("word1","word2", "word3"), value = c(5L, 22L, 45L), var1 = c("FALSE","FALSE", "var1"), var2 = c("FALSE", "var2", "FALSE"), var3 = c("var3","FALSE", "FALSE")), row.names = c(NA, -3L), class = c("data.table","data.frame")))

And what I want to achieve is below: An added column "names" that only has the variable names in, but keeps the data from the first 3 columns the same.

object number word value var1 var2 var3 names
2 word1 5 FALSE FALSE var3 var3
4 word2 22 FALSE var2 FALSE var2
9 word3 45 var1 FALSE FALSE var1

Any help would be much appreciated, even if you can just point me to the right function or package I need. Thank you!

2

2 Answers

1
votes

This is easily achieved. You just add the column via an apply() command like this

mydata$names <- apply(mydata, 1, function(x) x[x != FALSE])

with output

    mydata
   object_number  word value  var1  var2  var3 names
1:             2 word1     5 FALSE FALSE  var3  var3
2:             4 word2    22 FALSE  var2 FALSE  var2
3:             9 word3    45  var1 FALSE FALSE  var1
1
votes

We could use a vectorized option with row/column indexing

mydata[, names := as.data.frame(.SD)[cbind(seq_len(.N),
  max.col(.SD != 'FALSE', 'first'))], .SDcols = patterns('^var\\d+$')]
mydata
#   object_number  word value  var1  var2  var3 names
#1:             2 word1     5 FALSE FALSE  var3  var3
#2:             4 word2    22 FALSE  var2 FALSE  var2
#3:             9 word3    45  var1 FALSE FALSE  var1