0
votes

I have a data.frame:

library(dplyr)

set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:

vector %>%
    group_by(patient) %>%
    summarise(average=mean(prob))

This code works perfectly. However, I need to get the same values without using the word "prob" on the "summarise" line. I tried the following code, but it returns a data.frame in which the "average" column contains 5 identical values, which is not what I want:

vector %>%
    group_by(patient) %>%
    summarise(average=mean(vector[,3]))

PS: to explain why I need this: I have another data frame with multiple columns with complex names that need to be summarised, so I can't list them one by one in the summarise call. What I want is to pass a vector there and calculate the mean of each of those columns, grouped by patient.

1
May I propose first bringing the data into the most convenient format for further processing? Your last comment hints that melting the data first and then applying the working code you have already presented may be a promising approach. - Peter Lustig
I think it's currently in the works, linked to the lazy package. - baptiste
Thanks Peter, this actually solved my problem! I melted and "dcasted" the data frame. - Victor

1 Answer

4
votes

It appears you want summarise_each:

vector %>%
    group_by(patient) %>%
    summarise_each(funs(mean), vars= matches('prob'))
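
As for why the attempt in the question returns five identical values: inside summarise, vector[,3] refers to the whole, ungrouped data frame, so every group gets the overall mean. If the column names are stored in a character vector, newer dplyr versions (1.0.0 and later, where summarise_each is deprecated) can select them with across() and all_of(). A minimal sketch, assuming such a version is installed; cols is a hypothetical name that does not appear in the question:

```r
library(dplyr)

# Same example data as in the question
set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

# Hypothetical character vector with the names of the columns to summarise
cols <- c("prob")

# across() applies mean to every column named in cols, within each patient group
result <- vector %>%
    group_by(patient) %>%
    summarise(across(all_of(cols), mean))
```

With more columns to summarise, only cols needs to change; the summarise line never mentions a column name directly.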

Using data.table, you could do:

library(data.table)
setDT(vector)[,lapply(.SD,mean),by=patient,.SDcols='prob']