I have a set of data that comes out of Matlab, and I want to use it in R. I have a set of subjects, and a set of conditions within each subject. In each condition, each subject produced some data. I wrote this into a "tall" table, like so:
  subject condition data
1     id1     cond1 0.12
2     id1     cond1 0.43
3     id1     cond2 1.26
4     id2     cond1 1.96
5     id2     cond2 0.24
6     id2     cond2 0.62
...
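If you want to reproduce this, the tall table above can be rebuilt in R as a plain data frame. This is a minimal sketch: the values are hard-coded from the example rows, whereas the real data would be read from the Matlab export (for instance with read.csv(), assuming it was written out as CSV). The table() call counts the values per subject/condition pair, which makes the unequal group sizes explicit.

```r
# Example rows copied from the tall table above (hard-coded for illustration).
tall <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62),
  stringsAsFactors = FALSE
)

# Number of raw values per subject (rows) and condition (columns).
counts <- table(tall$subject, tall$condition)
print(counts)
```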
As you can see, the number of values in a condition differs from subject to subject, and within a subject it also differs from condition to condition. I'm interested in the distributions of these values between subjects, so I was hoping to keep the raw values as a list in a "wide" data frame, with one row per subject/condition pair, like this:
  subject condition data
1     id1     cond1 c(0.12, 0.43)
2     id1     cond2 c(1.26)
3     id2     cond1 c(1.96)
4     id2     cond2 c(0.24, 0.62)
...
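The layout above is a data frame with a list column: each cell of data holds a numeric vector of whatever length that group happens to have. A minimal base-R sketch (no packages), assuming the tall table is a data frame named tall with the column names shown; wide is just an illustrative name. It keeps one row per unique subject/condition pair, in order of first appearance, and collects the matching values with Map():

```r
tall <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62),
  stringsAsFactors = FALSE
)

# One row per unique subject/condition pair.
wide <- unique(tall[c("subject", "condition")])
rownames(wide) <- NULL

# For each pair, gather every matching raw value into one numeric vector;
# assigning a list of length nrow(wide) creates a list column.
wide$data <- unname(Map(
  function(s, cnd) tall$data[tall$subject == s & tall$condition == cnd],
  wide$subject, wide$condition
))

print(wide)
```

Because data is an ordinary list, per-group statistics can then be taken with sapply(), e.g. sapply(wide$data, median).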
What is the best way of doing this? I have used tidyr::spread() in the past, but it does not work here without a variable that uniquely identifies each row, and even if I added one, I don't see how it would produce this layout.
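In current tidyr, spread() has been superseded by pivot_wider(), which does not need a unique row identifier: its values_fn argument says what to do with cells that receive several values, and values_fn = list keeps all of them as list cells. Note that this gives a different shape from the one sketched above (one row per subject, one list column per condition), so it is only an alternative. It assumes a reasonably recent tidyr (1.1.0 or later, as far as I know), where values_fn accepts a plain function.

```r
library(tidyr)

tall <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62),
  stringsAsFactors = FALSE
)

# Spread conditions into columns; duplicate subject/condition cells are
# collected into a vector instead of raising a duplicate-identifier error.
by_subject <- pivot_wider(
  tall,
  names_from  = condition,
  values_from = data,
  values_fn   = list
)

print(by_subject)
```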
I also tried dplyr::group_by(data, subject, condition), but I'm not sure how to proceed from there. Is it possible to summarise the grouped table using c() as the summary function? That hasn't worked for me.
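The group_by() idea does work if the summary wraps each group's values in list() rather than c(). summarise() expects one value per group: list(data) is always a single element (a list holding that group's vector), whereas c(data) has a different length in each group, so depending on the dplyr version summarise() either errors or expands it back into one row per value. A minimal sketch, assuming dplyr 1.0.0 or later for the .groups argument; nested is an illustrative name:

```r
library(dplyr)

tall <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62),
  stringsAsFactors = FALSE
)

# list(data) yields exactly one element per group, so each
# subject/condition pair becomes one row with a list-column cell.
nested <- tall %>%
  group_by(subject, condition) %>%
  summarise(data = list(data), .groups = "drop")

print(nested)
```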
As always, thanks for any help!
Use dplyr::group_by(), then dplyr::summarize() to do whatever you want to do to look at "the distributions of these variables between subjects". Please tell us your end goal, rather than just an intermediate step that you think is necessary (but which might really just be overcomplicating a simple problem). – Gregor Thomas
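Following the comment's suggestion, if the end goal is to describe each subject's distribution per condition, the statistics can be computed directly in summarise() without building a list column first. A sketch using dplyr; which statistics matter (here count, mean and median) is an assumption about the analysis, and dist_summary is an illustrative name:

```r
library(dplyr)

tall <- data.frame(
  subject   = c("id1", "id1", "id1", "id2", "id2", "id2"),
  condition = c("cond1", "cond1", "cond2", "cond1", "cond2", "cond2"),
  data      = c(0.12, 0.43, 1.26, 1.96, 0.24, 0.62),
  stringsAsFactors = FALSE
)

# One row per subject/condition pair with summary statistics of the raw values.
dist_summary <- tall %>%
  group_by(subject, condition) %>%
  summarise(
    n       = n(),
    mean    = mean(data),
    median  = median(data),
    .groups = "drop"
  )

print(dist_summary)
```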