I have a dataframe, in which one of the columns contains strings, separated by commas. I was wondering if there is an efficient way to make these comma-separated values into new column headers, and make these new column values binary if they were part of the original row. A sample of my data can be reproduced below:
data <- structure(list(id = c(6901257L, 6304928L, 7919400L), amenities =
c("Wireless Internet,Air conditioning,Kitchen,Heating,Family/kid
friendly,Essentials,Hair dryer,Iron,translation missing:
en.hosting_amenity_50", "Wireless Internet,Air
conditioning,Kitchen,Heating,Family/kid friendly,Washer,Dryer,Smoke
detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair
dryer,Iron,translation missing: en.hosting_amenity_50", "TV,Cable
TV,Wireless Internet,Air
conditioning,Kitchen,Breakfast,Buzzer/wireless
intercom,Heating,Family/kid friendly,Smoke detector,Carbon monoxide
detector,Fire extinguisher,Essentials,Shampoo,Hangers,Hair
dryer,Iron,Laptop friendly workspace,translation missing:
en.hosting_amenity_50" )), .Names = c("id", "amenities"), class =
"data.frame", row.names = c(NA, 3L))
I have an inefficient way of producing my results, which is to make the data into a long format, and then use dcast in reshape2. This inefficient method can be reproduced by:
data.long <- data %>%
mutate(amenities = strsplit(as.character(amenities), ",")) %>%
unnest(amenities)
data.long$amenities.value <- 1
data.wide <- reshape2::dcast(data.long, id ~ amenities, value.var =
"amenities.value") #desired result
Is there a more efficient way to get the desired result from the original data structure?