1
votes

I'm trying to convert from wide to long format using tidyr to gather multiple columns with different types. Essentially the same thing as the question asked here. I'm new to R and not familiar with the syntax so may be making some obvious mistake.

My data looks like this:

ID    X_1_abc  X_1_xyz    X_2_abc    X_2_xyz   X_3_abc   X_3_xyz
1       1        1          2          2         2         1
2       1        2          1          0         1         NA 
3       1        2          1          1         NA        0

I've tried the following code:

df %<>% gather(var, val, X_1_abc:X_3_xyz) %>%   
  separate(var, c('var', 'X_number'), sep = 'X_*_', convert = TRUE) %>% 
  spread(var, val, convert = TRUE) 

but this gives me just two columns: one listing the variables and one listing the values.

ID  X_num `<NA>`
1   1_abc  1     
1   1_xyz  1     
1   2_abc  2     
1   2_xyz  2     
1   3_abc  2     
1   3_xyz  1 

I'm looking for the following:

ID   X_num   abc   xyz
1     1       1     1    
1     2       2     2
1     3       2     1
1
Take your output, then mutate(ID = parse_number(X_num), col = str_sub(X_num, end = -3)) %>% spread(col, <NA>`)Jon Spring
I don't think this will work because for the full data set not all of the variables have a 3 letter suffix. Right now if I try, I get the following error: Error: var` must evaluate to a single number or a column name, not a function`.AstraOK
Good point. There's probably a better way with regex, but Mauritz' answer with tidyr::separate is cleaner.Jon Spring

1 Answers

2
votes

I assume your expected output is incomplete as I don't see any entries for ID = 2 and ID = 3.

You could do the following

df %>%
    gather(k, v, -ID) %>%
    separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
    select(-tmp) %>%
    spread(ss, v)
#  ID X_num abc xyz
#1  1     1   1   1
#2  1     2   2   2
#3  1     3   2   1
#4  2     1   1   2
#5  2     2   1   0
#6  2     3   1  NA
#7  3     1   1   2
#8  3     2   1   1
#9  3     3  NA   0