Having an issue with how to dummy code the following dataset.
Example data, lets say dataframe = mydata:
ID | NAMES |
-- | -------------- |
1 | 4444, 333, 456 |
2 | 333 |
3 | 456, 765 |
I'd like to cast only the unique variables in NAMES as column variables and code if each row has that variable or not i.e 1 or 0
Desired Output:
ID | NAMES | 4444 | 333 | 456 | 765 |
-- | -------------- |------|-----|-----|-----|
1 | 4444, 333, 456 | 1 | 1 | 1 | 0 |
2 | 333 | 0 | 1 | 0 | 0 |
3 | 456, 765 | 0 | 0 | 1 | 1 |
what I've done so far is created a vector of unique
split <- str_split(string = mydata$NAMES,pattern = ",")
vec <- unique(str_trim(unlist(split)))
remove <- ""
vec <- as.data.frame(vec[! vec %in% remove])
colnames(vec) <- "var"
vecRef <- as.vector(vec$var)
namesCast <- dcast(data = vec,formula = .~var)
namesCast <- nameCast[,2:ncol(namesCast)]
This yields a vector of unique NAMES with spaces/irregularities removed. From there I have no idea how to do the matching/dummy coding so any help would be greatly appreciated!