I have a data frame with multiple entries in multiple cells. There are three kinds of columns: Those that only contain 1/0, those that contain 1/0 and some other entries, and those that does not contain 1/0.
What I want to do is for all the columns that contain some other value (usually two or more entries) to be split into x new columns with the column name + the value in the cell for each unique value in the column, and 1/0 as whether or not this is present. All columns with only 1/0 will be left as is.
Note: My original data frame is bigger and have many columns. Also, the content in the cells may vary depending on the data frame, and I would like it to work no matter what/how many entries are in the cells. Also note that I have columns that I don't want to be split, either because they only contain 1/0 (f. ex. emrY), or because they contain other data (f. ex. T_CIP).
The data frame:
structure(list(id = 1:10, emrA = c("I219V, T286A", "I219V", "I219V",
"I219V", "I219V", "R164H, I219V", "R164H, I219V", "R164H, I219V",
"R164H, I219V", "R164H, I219V"), gyrA_8 = c("S83L,678E", "D87N",
"S83L,252G", "S83L,678E", "S83L,678E", "S83L,828T", "S83L,828T",
"S83L,828T", "S83L,828T", "S83L,828T"), emrY = c("0", "1", "1",
"1", "1", "1", "1", "1", "1", "1"), T_CIP = c(0.25, 0.12, 0.12,
0.25, 0.25, 0.5, 2, 1, 1, 2)), .Names = c("id", "emrA", "gyrA_8",
"emrY", "T_CIP"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-10L))
What it looks like:
id emrA gyrA_8 emrY T_CIP
1 I219V, T286A S83L,678E 0 0.25
2 I219V D87N 1 0.12
3 I219V S83L,252G 1 0.12
4 I219V S83L,678E 1 0.25
5 I219V S83L,678E 1 0.25
6 R164H, I219V S83L,828T 1 0.5
7 R164H, I219V S83L,828T 1 2
8 R164H, I219V S83L,828T 1 1
9 R164H, I219V S83L,828T 1 1
10 R164H, I219V S83L,828T 1 2
What I want to end up with:
id emrA_I219V emrA_T286A emrA_R164H gyrA_8_S83L gyrA_8_678E gyrA_8_D87N gyrA_8_252G gyrA_8_828T emrY T_CIP
1 1 1 0 1 1 0 0 0 0 0.25
2 1 0 0 0 0 1 0 0 1 0.12
3 1 0 0 1 0 0 1 0 1 0.12
4 1 0 0 1 1 0 0 0 1 0.25
5 1 0 0 1 1 0 0 0 1 0.25
6 1 0 1 1 0 0 0 1 1 0.5
7 1 0 1 1 0 0 0 1 1 2
8 1 0 1 1 0 0 0 1 1 1
9 1 0 1 1 0 0 0 1 1 1
10 1 0 1 1 0 0 0 1 1 2
The emrY column was not split becasue it only contains 1/0. The T_CIP (and other similar columns to it) was not split because it contains other data.
Is there a way to do this with tidyverse-packages?
EDIT:
I do not feel that the question marked as duplicate answer my question - They do not have multiple columns with varied content, and the question itself is directly about dummy variables, and doesn't seem to explain what I am trying to do here.