R dplyr group_by subject appears to use entire dataframe instead of subject

Question

Background I am working with a large dataset from a repeated measures clinical trial in R, where I want to do some data manipulations for each subject. This could be extraction of the max value in column x for each subject or the mean of column y for each subject.

Problem

I am fond of using the dplyr package and pipes, which led me to the group_by function. But when I try to apply it, the data that I want to extract does not seem to group by subject as it is supposed to, but rather extracts data based on the entire dataset.

Code

This is what I have done so far:

data <- read.csv(file="group_by_question.csv", header=TRUE, sep=",")

library(dplyr)
library(plyr)

data <- tbl_df(data)

test <- data %>%
  filter(!is.na(wght)) %>%
  dplyr::group_by(subject_id) %>%
  mutate(maxwght=max(wght),meanwght=mean(wght)) %>%
  ungroup()

Sample of the test dataframe:

Find a .csv sample of my dataset here: https://drive.google.com/file/d/1wGkSQyJXqSswThiNsqC26qaP7d3catyX/view?usp=sharing

remove plyr from your work space and load only dplyr, as there is a lot of confilcts between them. — A. Suliman

DTYK DTYK · Accepted Answer · 2018-06-12T00:44:45

Is this what you want? In my example below, the output shows the max value for the maxwght column by subject id. You could replace max() with mean, for example, if you require the mean value for maxwght for each subject id.

library(dplyr)

data <- read.csv(file="group_by_question.csv", header=TRUE, sep=",")

test <- data %>%
    filter(!is.na(wght)) %>%
    mutate(maxwght=max(wght),meanwght=mean(wght)) %>%
    group_by(subject_id) %>%
    summarise(value = max(maxwght)) %>%
    ungroup()

R dplyr group_by subject appears to use entire dataframe instead of subject

1 Answers