I have a data frame with a factor column and one value column per factor level, including columns for levels that no longer occur in the data frame. Example:
x <- data.frame(f = toupper(sample(letters[1:3], 5, replace = TRUE)),
                x.A = 1:5,
                x.B = 1:5,
                x.C = 1:5,
                x.D = 1:5,
                x.E = 1:5,
                stringsAsFactors = TRUE)  # keep f a factor (the default is FALSE since R 4.0.0)
Resulting in (the f values vary between runs because they are sampled):
f x.A x.B x.C x.D x.E
1 B 1 1 1 1 1
2 B 2 2 2 2 2
3 A 3 3 3 3 3
4 C 4 4 4 4 4
5 A 5 5 5 5 5
Now I want to remove every column that does not correspond to a level currently present in column f, resulting in this data frame:
f x.A x.B x.C
1 B 1 1 1
2 B 2 2 2
3 A 3 3 3
4 C 4 4 4
5 A 5 5 5
The naming convention is consistent between levels and column names: names always take the form somevariable.FACTORLEVEL. I could type all the names into a list to choose from, but that gets long and unwieldy. I tried using grep as follows:
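Because of that convention, the level a column refers to can be recovered by stripping everything up to and including the last dot. A minimal sketch, assuming every value column follows the somevariable.FACTORLEVEL pattern; the helper name `col_level` is hypothetical:

```r
# Hypothetical helper: return the FACTORLEVEL suffix of names like "x.B".
# The greedy "^.*\\." matches up to the LAST dot, so "some.var.C" gives "C".
col_level <- function(nms) sub("^.*\\.", "", nms)

col_level(c("x.A", "x.B", "some.var.C"))
# [1] "A" "B" "C"
```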
subX <- x[x$f == 'B', grep('B', names(x))]
But I don't quite get what I want, and even if it worked I wouldn't know how to extend it to all levels. I also looked at previous questions here and here, but they don't go as far as I need. Any help would be appreciated. Thanks.
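The grep attempt filters rows as well as columns: `x$f == 'B'` keeps only the rows where f is B, and `grep('B', names(x))` matches only "x.B", so `[` returns that single column simplified to a plain vector and f is dropped. A sketch of one way to select columns for all present levels at once, assuming the factor column is named `f` and the value columns follow the somevariable.FACTORLEVEL convention. The variable names `present`, `suffix`, `keep` and `pattern` are illustrative, and the data is rebuilt with the f values printed above instead of `sample()` so the result is reproducible:

```r
# Rebuild the example deterministically, using the f values printed in the question
x <- data.frame(f = factor(c("B", "B", "A", "C", "A")),
                x.A = 1:5, x.B = 1:5, x.C = 1:5, x.D = 1:5, x.E = 1:5)

# Levels that actually occur in f; unique() ignores any unused factor levels
present <- unique(as.character(x$f))

# Level encoded in each column name: everything after the last "."
# ("f" contains no dot, so its suffix is just "f")
suffix <- sub("^.*\\.", "", names(x))

# Keep the f column plus every column whose suffix is a present level
keep <- names(x) == "f" | suffix %in% present
subX <- x[, keep, drop = FALSE]   # all rows kept; only columns are filtered

names(subX)
# [1] "f"   "x.A" "x.B" "x.C"

# Same selection with a single grep(): an alternation of the present levels
# anchored to the end of the name, here "\\.(B|A|C)$"
pattern <- paste0("\\.(", paste(present, collapse = "|"), ")$")
subX2 <- x[, c("f", grep(pattern, names(x), value = TRUE))]
```

Both variants keep every row, so no per-level subsetting is needed. The grep version assumes the level names contain no regex metacharacters; if they might, the `%in%` comparison on the suffix is the safer choice.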