I have some data which represents repeated (session A and B) measurements (var.x) of patients (id). My data is tidy (i.e. long-form table with one variable in each column and one observation in each row) ... like so:
| id | var.1 | var.2 | session |
--------------------------------
| 1 | 1.1 | 11 | A |
| 1 | 1.9 | 12 | B |
| 2 | 1.2 | 15 | A |
| 2 | 1.4 | 14 | B |
I want to do a correlation analysis of paired (by id) data (e.g. var.1) from session A and B. In other words, what is the correlation coefficient between repeated measurements..
This is not difficult to do, if I tranpose/pivot/melt/spread the data into a wide format like so:
| id | var.1.A | var.2.A | var.1.B | var.2.B |
----------------------------------------------
| 1 | 1.1 | 11 | 1.9 | 12 |
| 2 | 1.2 | 15 | 1.4 | 14 |
...then I can simply cor(var.1.A, var.1.B). And I realize that dplyr/tidyr have gather/spread/separate/unite commands for that purpose, however this seems untidy and less than elegant to me as I basically have to copy my data.
Is there a way to do this using dplyr, without reshaping the data?
I'm thinking something along the lines of:
data %>%
select(id, var.1, session) %>%
do( cor( filter(session=='A', filter(session=='B'))
...but obviously this doesn't work and I also need to somehow specify, that the data is paired (by id).
data %>% select(id, var.1, session) %>% do({data.frame(Cor=cor(.$var.1[.$session=='A'], .$var.1[.$session=='B']))})
If you need to do this by 'id', usegroup_by
afterselect
– akrundata.table
setDT(data)[, Map(function(x,y) cor(x[y=='A'], x[y=='B']), .SD[, c('var.1', 'var.2'), with=FALSE], list(session)) ]
– akrun