Reshaping Data with Multiple Sets of Column Variables

Question

I am trying to reshape some of my data and haven't had any luck with the reshape2 package. Nothing I've tried is quite what I'm looking for given my data.

I have a data.frame that looks like this (where name1 corresponds to age1, and name2 corresponds to age2, etc):

city <- c("New York", "Philadelphia", "Chicago")
state <- c("NY", "PA", "IL")
name1 <- c("Tim", NA, NA)
name2 <- c("Bob", "Jim", "Bill")
name3 <- c(NA, NA, "Jeff")
age1 <- c(40, NA, NA)
age2 <- c(30, 29, 34)
age3 <- c(NA, NA, 27)

df <- data.frame(city, state, name1, name2, name3, age1, age2, age3)

          city state name1 name2 name3 age1 age2 age3
1     New York    NY   Tim   Bob  <NA>   40   30   NA
2 Philadelphia    PA  <NA>   Jim  <NA>   NA   29   NA
3      Chicago    IL  <NA>  Bill  Jeff   NA   34   27

In my actual data set, I have about 50 "name" and "age" variables each.

Ultimately, I'm aiming for 4 variables: city, state, name, age. I also don't want the NA's in my finished data set. So what I'm looking to achieve is this:

          city state name age
1     New York    NY  Tim  40
2     New York    NY  Bob  30
3 Philadelphia    PA  Jim  29
4      Chicago    IL Bill  34
5      Chicago    IL Jeff  27

Any help would be greatly appreciated! I've tried the reshape2 and tidyR packages, but no luck yet!

Thanks!

A5C1D2H2I1M1N2O1R2T1 A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-12-17T02:17:47

melt from "reshape2" presently can't handle this nicely since the "name" and "age" columns are of different types. However, a later version of "data.table" (which will also implement melt) will let you do such a transformation by supplying a list of the measure variables. (See Issue #828, which has the feature launching in V1.9.8 of "data.table".)

In the meantime, other than reshape in base R, you can also try merged.stack from my "splitstackshape" package:

library(splitstackshape)
na.omit(merged.stack(df, var.stubs = c("name", "age"), sep = "var.stubs"))
#            city state .id .t1 name age
# 1:      Chicago    IL   2   2 Bill  34
# 2:      Chicago    IL   3   3 Jeff  27
# 3:     New York    NY   1   1  Tim  40
# 4:     New York    NY   2   2  Bob  30
# 5: Philadelphia    PA   2   2  Jim  29

sep = "var.stubs" is a way of saying that there is no separator between the variable stub and the "time" of measurement.

If you also wanted to drop the unnecessary columns, you can compound that into the previous statement (or use subset as @MrFlick did).

na.omit(
  merged.stack(df, var.stubs = c("name", "age"), 
               sep = "var.stubs")[, c(
                 "city", "state", "name", "age"), with = FALSE])
#            city state name age
# 1:      Chicago    IL Bill  34
# 2:      Chicago    IL Jeff  27
# 3:     New York    NY  Tim  40
# 4:     New York    NY  Bob  30
# 5: Philadelphia    PA  Jim  29

Reshaping Data with Multiple Sets of Column Variables

2 Answers