I have some data structured like this:
structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("group1", "group2"), class = "factor"), measurement = c("color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time", "color", "time"), item_pos = c("1", "1", "2", "2", "3", "3", "4", "4", "1", "1", "2", "2", "3", "3", "4", "4", "1", "1", "2", "2", "3", "3", "4", "4", "1", "1", "2", "2", "3", "3", "4", "4"), value = c("blue", "1508", "orange", "752", "black", "585", "red", "842", "red", "879", "white", "1455", "green", "1757", "orange", "2241", "white", "2251", "yellow", "1740", "red", "1962", "yellow", "1854", "green", "1859", "blue", "2156", "yellow", "2494", "green", "1757"), item = c("A", "A", "B", "B", "B", "B", "A", "A", "A", "A", "B", "B", "B", "B", "A", "A", "C", "C", "C", "C", "D", "D", "D", "D", "C", "C", "C", "C", "D", "D", "D", "D")), .Names = c("subject", "group", "measurement", "item_pos", "value", "item"), row.names = c(NA, -32L), class = "data.frame")
Which has multiple observations by subject by item, so the data for subject 1 looks like this:
> filter(df.tidy, subject==1)
subject group measurement item_pos value item
1 1 group1 color 1 blue A
2 1 group1 time 1 1508 A
3 1 group1 color 2 orange B
4 1 group1 time 2 752 B
5 1 group1 color 3 black B
6 1 group1 time 3 585 B
7 1 group1 color 4 red A
8 1 group1 time 4 842 A
So within a group
each item
appears twice, and for each occurrence there is a measurement
of color and time. The order in which items appear is in item_pos
.
While I like this long format, a colleague needs it slightly 'wider', with the repeated color and time measures in their own colums by item. The desired format would be as follows:
subject group item color1 color2 time1 time2
1 group1 A blue red 1508 842
1 group1 B orange black 752 585
...
4 group2 D yellow green 2494 1757
My feeling is that this ought to be possible using a combination of gather()
, spread()
and other dplyr verbs, but I'm not sure what the dplyr equivalent here would be for (in for-loop speak) looping through the items by group and collecting the color and time observations in subsequent columns. Help much appreciated!
Related questions I consulted: