2
votes

I want to set values in an R data frame based on a match between the row name and the column name. The row names are var1, var2, var3, var4, etc., and the column names are x-var1-t1, x-var2-t1, x-var1-t4, x-var3-t1, x-var3-t7, etc. A cell should be set to 1 when the row name matches the "varN" part of the column name. For example, the var1 row should match the x-var1-t1 and x-var1-t4 columns.

So this data frame:

      x-var1-t1   x-var2-t1   x-var1-t4   x-var3-t1   x-var3-t7
var1          0           0           0           0           0
var2          0           0           0           0           0
var3          0           0           0           0           0
var4          0           0           0           0           0

would change to this:

      x-var1-t1   x-var2-t1   x-var1-t4   x-var3-t1   x-var3-t7
var1          1           0           1           0           0
var2          0           1           0           0           0
var3          0           0           0           1           1
var4          0           0           0           0           0

What's the best way to do this?

2 Answers

2
votes

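We can use sapply to loop over the row names of df and grepl to test which column names contain each row name, then set the matching cells to 1. sapply returns one column per row name, so the result is transposed with t before it is assigned back.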

df[] <- t(sapply(rownames(df), function(x) as.numeric(grepl(x, colnames(df)))))
df

#     x.var1.t1 x.var2.t1 x.var1.t4 x.var3.t1 x.var3.t7
#var1         1         0         1         0         0
#var2         0         1         0         0         0
#var3         0         0         0         1         1
#var4         0         0         0         0         0
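To see why the t() is needed, here is a small sketch of the intermediate shapes. The vectors rn and cn are just stand-ins for rownames(df) and colnames(df), written with the dotted names shown in the output above:

```r
# Stand-ins for rownames(df) and colnames(df)
rn <- c("var1", "var2", "var3", "var4")
cn <- c("x.var1.t1", "x.var2.t1", "x.var1.t4", "x.var3.t1", "x.var3.t7")

# One logical vector of length 5 per row name, bound together as columns
raw <- sapply(rn, function(x) grepl(x, cn))

dim(raw)     # 5 4: columns of the data frame run down the rows here
dim(t(raw))  # 4 5: same orientation as df after transposing
```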

Or, as suggested by @Dan Y, we can drop the anonymous function and pass grepl to sapply directly; the unary + converts the logical matrix to integer:

df[] <- +t(sapply(rownames(df), grepl, colnames(df)))
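
One caveat, in case the real data has more variables than shown: grepl matches substrings, so var1 would also match a column such as x.var10.t1. A minimal sketch that compares the extracted varN token for equality instead is below. The regular expression assumes the x-varN-tM naming from the question (with either hyphens or dots as separators), and the name key is only for illustration:

```r
# Rebuild the example data with the dotted column names shown above
rows <- paste0("var", 1:4)
cols <- c("x.var1.t1", "x.var2.t1", "x.var1.t4", "x.var3.t1", "x.var3.t7")
df <- as.data.frame(matrix(0, nrow = length(rows), ncol = length(cols),
                           dimnames = list(rows, cols)))

# Substring matching is too permissive for names like var10
grepl("var1", "x.var10.t1")  # TRUE

# Pull the varN token out of each column name (assumed naming scheme)
key <- sub("^x[.-](var[0-9]+)[.-].*$", "\\1", colnames(df))

# Exact comparison of every row name against every token, coerced to 0/1
df[] <- +outer(rownames(df), key, "==")
df
```

With the five example columns this gives the same result as the grepl versions.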

2
votes

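We can use adist to compare the row names to the column names. With partial = TRUE the distance is 0 whenever a row name occurs as a substring of a column name, so negating the distance matrix marks the matches.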

 dat[] = +(!do.call(adist, c(partial = TRUE, dimnames(dat))))
 dat
     x.var1.t1 x.var2.t1 x.var1.t4 x.var3.t1 x.var3.t7
var1         1         0         1         0         0
var2         0         1         0         0         0
var3         0         0         0         1         1
var4         0         0         0         0         0

This is equivalent to:

  (adist(rownames(dat),colnames(dat),partial=TRUE)==0)+0

Adding 0 coerces the logical matrix to numeric. Multiplying by 1 works as well, since both are identity operations on the values. adist with partial = TRUE uses the same kind of substring matching as agrep.
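
A small sketch of the coercion step on its own, using two of the row names and three of the column names from the example (the names rn, cn and hits are only for illustration):

```r
rn <- c("var1", "var2")
cn <- c("x.var1.t1", "x.var2.t1", "x.var1.t4")

# Distance is 0 where the row name is a substring of the column name
hits <- adist(rn, cn, partial = TRUE) == 0

# Three ways to turn the logical matrix into numbers
a <- hits + 0  # double
b <- hits * 1  # double
d <- +hits     # integer

identical(a, b)  # TRUE
all(a == d)      # TRUE: same values, different storage type
```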