I have a data frame of Reddit posts and comments in which each row represents a post together with one of its comments. Since one post can have multiple comments, several rows share the same `id` (post id) but have different comment ids. I want to merge the rows with the same `id` into one row, with all the different comment ids in the `comment_id` column separated by commas. The post data (title, body, etc.) is duplicated across those rows (as shown in the attached image), so it does not need merging; I only need one occurrence of it per row.
![duplicate rows][1]
I was able to merge the comment information for the relevant columns, separated by commas, but I don't know how to keep a single occurrence of the duplicated post information, which does not need merging.
```r
library(dplyr)

# Collapse each comment column into one comma-separated string per post id
all_reddits <- all_posts_and_comments %>%
  group_by(id) %>%
  summarise(
    comment_id = paste(comment_id, collapse = ","),
    comment_author = paste(comment_author, collapse = ","),
    comment_body = paste(comment_body, collapse = ","),
    comment_score = paste(comment_score, collapse = ","),
    comment_created_date = paste(comment_created_date, collapse = ","),
    comment_link = paste(comment_link, collapse = ",")
  )
```
I have tried dplyr's `summarise_all()` and `summarise_at()`, but I keep getting errors.
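To make the desired result concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical stand-ins for `all_posts_and_comments`, and using `across()` with `first()` for the post columns is only one possible approach, assuming dplyr 1.0.0 or later and that the post columns really are identical within each `id`.

```r
library(dplyr)

# Hypothetical stand-in for all_posts_and_comments (values are made up)
toy <- tibble(
  id = c("p1", "p1", "p2"),
  title = c("Post one", "Post one", "Post two"),
  body = c("Body one", "Body one", "Body two"),
  comment_id = c("c1", "c2", "c3"),
  comment_body = c("first", "second", "third")
)

merged <- toy %>%
  group_by(id) %>%
  summarise(
    # Post columns are assumed identical within a group, so keep the first value
    across(!starts_with("comment_"), first),
    # Comment columns: collapse all values into one comma-separated string
    across(starts_with("comment_"), ~ paste(.x, collapse = ",")),
    .groups = "drop"
  )

print(merged)
```

The output I am after has one row per `id`, with `title` and `body` appearing once and each `comment_` column holding the comma-separated values.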