1
votes

Given this data:

x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4) 
y <- c('Name', 'Street', 'Gender', 'Name', 'Street', 'Name', 'Street', 'Street', 'Dateofbirth', 'Gender','Name') 
z <- c('Jasper', 'Broadway', 'Male', 'Alice', 'Narrowstreet', 'Peter', 'Neverland', 'Treasureisland', '1841', 'Male','Martin')

k <- data.frame(id = x, key = y, value = z)

I would like to create a clean 4-column table that has has keys as headers (i.e. Name, Street, Gender and Date of birth). The problem here is that the key 'Street' is double for Peter. I've tried to use spread (tidyr) but I haven't managed to make it work.

k <- k %>% group_by(id) %>%
           mutate(index = row_number()) %>%
           spread(key, value)

I also gave a shot to:

k <- k %>% group_by(id) %>%
  mutate(index = row_number()) %>%
  spread(id, value) 

The result is not what I was expecting and both tables are quite difficult to work with. Any ideas?

1
It's not clear what you want to be done with those doubled values... One option (using reshape2 instead of tidyr) looks like dcast(k, id ~ key, fun=toString)Frank
Good point. Ideally, I would like to keep them both and insert them in separate columns. As a second best option, I could just keep the first. I will modify the question if you think it isn't clear enough.Jasper
Yeah, I think it would be good to edit that in. It's fine to list both options, in my opinion.Frank
Just tried that out and it works! You can post it as an answer. Cheers!Jasper
Sure. By the way, besides editing your question to include the info from your last comment, I think you ought to improve the title ("spreading repeated values" or something?)Frank

1 Answers

2
votes

Don't know if this is exactly what you are looking for, but if you just want to keep the first, you can group_by(id,key) and summarise value using first. Then, regroup by id and spread:

library(dplyr)
library(tidyr)
k <- k %>% group_by(id, key) %>% summarise(value=first(value)) %>% group_by(id) %>% spread(key,value)    
##Source: local data frame [4 x 5]
##Groups: id [4]
##
##     id Dateofbirth Gender   Name       Street
##* <dbl>      <fctr> <fctr> <fctr>       <fctr>
##1     1          NA   Male Jasper     Broadway
##2     2          NA     NA  Alice Narrowstreet
##3     3        1841   Male  Peter    Neverland
##4     4          NA     NA Martin           NA

To put the doubled values in separate columns, use make.names to create unique keys:

k <- k %>% group_by(id) %>% mutate(key=make.names(key,unique=TRUE)) %>% group_by(id) %>% spread(key,value)
##Source: local data frame [4 x 6]
##Groups: id [4]
##
##     id Dateofbirth Gender   Name       Street       Street.1
##* <dbl>      <fctr> <fctr> <fctr>       <fctr>         <fctr>
##1     1          NA   Male Jasper     Broadway             NA
##2     2          NA     NA  Alice Narrowstreet             NA
##3     3        1841   Male  Peter    Neverland Treasureisland
##4     4          NA     NA Martin           NA             NA

Alternatively, you can group_by(id,key) and summarise value using toString or paste with collapse to flatten the doubled values:

k <- k %>% group_by(id, key) %>% summarise(value=toString(value)) %>% group_by(id) %>% spread(key,value)
##Source: local data frame [4 x 5]
##Groups: id [4]
##
##     id Dateofbirth Gender   Name                    Street
##* <dbl>       <chr>  <chr>  <chr>                     <chr>
##1     1        <NA>   Male Jasper                  Broadway
##2     2        <NA>   <NA>  Alice              Narrowstreet
##3     3        1841   Male  Peter Neverland, Treasureisland
##4     4        <NA>   <NA> Martin                      <NA>