Add repeated-measures column names as prefix instead of suffix using reshape

Question

I am using the reshape function in base R to convert a long-format dataframe from a repeated-measures design to wide format. See the toy dataset below. Q1, Q2, and Q3 are individual responses on a three-item survey. There are four participants, each of whom takes the survey four times.

Q1 <- c(2,6,5,4,3,8,9,2,1,5,4,7,3,7,2,1)
Q2 <- c(4,7,6,3,1,2,5,6,7,5,4,3,5,6,6,3)
Q3 <- c(7,9,3,1,5,3,7,5,3,3,5,7,8,9,9,3)
Participant <- rep(c("Bob","Sue","Jim","Tom"), times = 1, each = 4)
Time <- rep(c("FirstSurvey","SecondSurvey","ThirdSurvey","FourthSurvey"), times = 4)

# data.frame() keeps Q1-Q3 numeric; as.data.frame(cbind(...)) would coerce every column to character
m <- data.frame(Participant, Time, Q1, Q2, Q3)

This yields the following dataframe:

m

   Participant         Time Q1 Q2 Q3
1          Bob  FirstSurvey  2  4  7
2          Bob SecondSurvey  6  7  9
3          Bob  ThirdSurvey  5  6  3
4          Bob FourthSurvey  4  3  1
5          Sue  FirstSurvey  3  1  5
6          Sue SecondSurvey  8  2  3
7          Sue  ThirdSurvey  9  5  7
8          Sue FourthSurvey  2  6  5
9          Jim  FirstSurvey  1  7  3
10         Jim SecondSurvey  5  5  3
11         Jim  ThirdSurvey  4  4  5
12         Jim FourthSurvey  7  3  7
13         Tom  FirstSurvey  3  5  8
14         Tom SecondSurvey  7  6  9
15         Tom  ThirdSurvey  2  6  9
16         Tom FourthSurvey  1  3  3

If you then reshape it thus:

mReshaped <- reshape(m, idvar = "Participant", timevar = "Time", direction = "wide", sep = "", new.row.names = c(1,2,3,4))

it yields the following wide-format dataframe:

mReshaped

  Participant Q1FirstSurvey Q2FirstSurvey Q3FirstSurvey Q1SecondSurvey Q2SecondSurvey
1         Bob             2             4             7              6              7
2         Sue             3             1             5              8              2
3         Jim             1             7             3              5              5
4         Tom             3             5             8              7              6
  Q3SecondSurvey Q1ThirdSurvey Q2ThirdSurvey Q3ThirdSurvey Q1FourthSurvey Q2FourthSurvey
1              9             5             6             3              4              3
2              3             9             5             7              2              6
3              3             4             4             5              7              3
4              9             2             6             9              1              3
  Q3FourthSurvey
1              1
2              5
3              7
4              3

with the following column names:

colnames(mReshaped)

 [1] "Participant"    "Q1FirstSurvey"  "Q2FirstSurvey"  "Q3FirstSurvey"  "Q1SecondSurvey"
 [6] "Q2SecondSurvey" "Q3SecondSurvey" "Q1ThirdSurvey"  "Q2ThirdSurvey"  "Q3ThirdSurvey" 
[11] "Q1FourthSurvey" "Q2FourthSurvey" "Q3FourthSurvey"

As you can see, when the dataframe is reshaped, the reshape function appends the value of the Time variable as a suffix to the column name of each repeated measure.

Does anyone know if there is an argument in the reshape function that lets you put the Time value as a prefix, in front of each measure's variable name (for example, FirstSurveyQ1 instead of Q1FirstSurvey)?

eipi10 · Accepted Answer · 2015-09-11T22:53:05

I'm not sure whether you can change the order within reshape, but you can change it afterwards using gsub with a Regular Expression:

names(mReshaped) = gsub("(Q[0-9])(.*)", "\\2\\1", names(mReshaped))

 [1] "Participant"    "FirstSurveyQ1"  "FirstSurveyQ2"  "FirstSurveyQ3"  "SecondSurveyQ1"
 [6] "SecondSurveyQ2" "SecondSurveyQ3" "ThirdSurveyQ1"  "ThirdSurveyQ2"  "ThirdSurveyQ3" 
[11] "FourthSurveyQ1" "FourthSurveyQ2" "FourthSurveyQ3"
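
One caveat, offered as a hedged sketch rather than as part of the original answer: the pattern above assumes every measure name is "Q" followed by exactly one digit. With ten or more items (the Q10 and Q12 names below are hypothetical), Q[0-9] would capture only "Q1" and leave the second digit attached to the time label. Anchoring the pattern and allowing one or more digits avoids that, assuming the time labels themselves never start with a digit:

```r
# Hypothetical wide-format names, including two-digit item numbers
nms <- c("Participant", "Q1FirstSurvey", "Q10FirstSurvey", "Q12SecondSurvey")

# ^ and $ anchor the match to the whole name; [0-9]+ takes every digit after "Q"
swapped <- sub("^(Q[0-9]+)(.*)$", "\\2\\1", nms)
print(swapped)

# For comparison, the single-digit pattern splits "Q10" in the wrong place
single_digit <- gsub("(Q[0-9])(.*)", "\\2\\1", "Q10FirstSurvey")
print(single_digit)
```

sub is enough here because each name contains at most one match, and names that do not fit the pattern, such as "Participant", pass through unchanged.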

UPDATE: Explanation of how the code works: The code uses a regular expression (or "regex" for short), a compact pattern-matching language for text that can look very cryptic the first time you see it.

In this case Q[0-9] means match a "Q" followed by any digit. (Q[0-9]) turns that match into a "capture group" meaning we can refer back to it later. This is capture group #1.

.* means match all remaining characters (anything that comes after whatever is matched by Q[0-9]). The dot (.) matches any single character, and * means "zero or more of the preceding item", so .* matches a string of any length. (.*) turns that match into capture group #2.
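
To see what each capture group actually holds, base R's regexec and regmatches can be used to pull the groups out of a few names. This is only an illustrative sketch; the vector x is made up for the example:

```r
x <- c("Q1FirstSurvey", "Q3FourthSurvey", "Participant")

# regexec() locates the full match and each capture group;
# regmatches() extracts the corresponding substrings as a list
groups <- regmatches(x, regexec("(Q[0-9])(.*)", x))

print(groups[[1]])  # full match, then capture group #1, then capture group #2
print(groups[[3]])  # "Participant" does not match, so nothing is captured
```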

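\\2\\1 is the replacement: it writes capture group #2 followed by capture group #1, which reverses the order of the two strings we captured. The backslashes are doubled because, inside an R string literal, \\ stands for the single backslash that the regex engine sees.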
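
A small sketch of the replacement string on its own may help. The underscore variant at the end is an illustration, not something the question asked for:

```r
replacement <- "\\2\\1"
cat(replacement, "\n")            # shows the string as the regex engine receives it
n_chars <- nchar(replacement)     # counts actual characters, not typed ones

# Swap the two captured pieces of a single name
swapped <- sub("(Q[0-9])(.*)", replacement, "Q2ThirdSurvey")
print(swapped)

# Literal text can sit between the backreferences, e.g. an underscore separator
with_sep <- sub("(Q[0-9])(.*)", "\\2_\\1", "Q2ThirdSurvey")
print(with_sep)
```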

Regular expressions can be very useful for text manipulation tasks like this. A few places to learn more about them are here, here, and here
