0
votes

I need to cast a long data format (long) to a wide format (wide) with the following conditions (if possible):

1) All data files will be in long form (long) with the same structure (id, name, value), but each data file will have different variables, values, and number of variables:

id = case
name = variable
value = variable value(s)

2) Each data file will be a different mixture of variables (factors, integers, numeric). Some factors might have multiple levels per case (fruit and meat in long) which I would like to create a separate dummy variable (logical) for each level within these factors. The number of factor and numeric variables will vary by data file.

3) Given the variables will be different for every data file, I was hoping to automate it where I can apply the same code to every data file without changing any variable names.

I've tried reshape2 and tidyr but can't figure out a way to get it done.

This is the long format:

    long
   id   name     value
1   1  fruit     apple
2   1  fruit    banana
3   1  fruit    orange
4   1  fruit pineapple
5   1   meat     steak
6   1   meat   chicken
7   1  fname      dave
8   1     wt       185
9   1 status    active
10  2  fruit     apple
11  2  fruit pineapple
12  2   meat   chicken
13  2  fname      jeff
14  2     wt       205
15  2 status    active
16  3  fruit     apple
17  3  fruit    banana
18  3   meat     steak
19  3  fname      jane
20  3     wt       125
21  3 status    lapsed

This is the wide format I would prefer:

wide
  id fruit.apple fruit.banana fruit.orange fruit.pineapple meat.steak meat.chicken fname  wt status
1  1        TRUE         TRUE         TRUE            TRUE       TRUE         TRUE  dave 185 active
2  2        TRUE        FALSE        FALSE            TRUE      FALSE         TRUE  jeff 205 active
3  3        TRUE         TRUE        FALSE           FALSE       TRUE        FALSE  jane 125 lapsed

Long Format Data:

long <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), name = c("fruit", 
"fruit", "fruit", "fruit", "meat", "meat", "fname", "wt", "status", 
"fruit", "fruit", "meat", "fname", "wt", "status", "fruit", "fruit", 
"meat", "fname", "wt", "status"), value = c("apple", "banana", 
"orange", "pineapple", "steak", "chicken", "dave", "185", "active", 
"apple", "pineapple", "chicken", "jeff", "205", "active", "apple", 
"banana", "steak", "jane", "125", "lapsed")), .Names = c("id", 
"name", "value"), class = "data.frame", row.names = c(NA, -21L
))
1
Row 9 of the input should be id == 1, right?Frank
Correct - my bad. I need to fix that,,,DRcod

1 Answers

0
votes

A solution uses dplyr and tidyr.

library(dplyr)
library(tidyr)

wide <- long %>%
  mutate(value2 = ifelse(name %in% c("fruit", "meat"), "1", value),
         name2 = ifelse(name %in% c("fruit", "meat"), 
                       paste(name, value, sep = "."), name)) %>%
  select(-name, -value) %>%
  spread(name2, value2, fill = "0") %>%
  mutate_at(vars(matches("fruit|meat")), as.numeric) %>%
  mutate_at(vars(matches("fruit|meat")), as.logical)