5 votes

I'm looking for a way to add a column to my data table that consists of residuals from an lm(a~b) model, computed separately for each level of c.

It was suggested that I look into a sort_by(c) function, but that doesn't seem to work with lm(a~b).

My working example data looks like this:

[Screenshot: outcome data frame]

The columns subject, trial and rt are in a data.frame. My goal is to compute Zre_SPSS (which I originally produced in SPSS) with an R function instead.

I've tried:

data %<>% group_by (subject) %>% 
  mutate(Zre=residuals(lm(log(rt)~trial)))

but it doesn't work: Zre gets computed, but for the entire data frame rather than within each subject separately.

Could anyone please help me? I'm a complete R (and coding in general) newbie, so please forgive me if this question is stupid or a duplicate; chances are I didn't understand other solutions, or they were not the solutions I was looking for. Best regards.

As per Ben Bolker's request, here is R code to generate the data from the Excel screenshot:

# generate data
subject <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
subject <- factor(subject)
trial <- c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt <- c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)

# The following variable is what I would get after using the SPSS code
ZreSPSS <- c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)

# make data frame
sym <- data.frame(subject, trial, rt, ZreSPSS)
Is there any chance you could post your example in a text/cut-and-pasteable form rather than as a screenshot? – Ben Bolker
Of course. I attached code for generating the data from the screenshot. – blazej
You might want to look at tidyr::nest and a quick blog blurb. – r2evans
This might help. – Haboryme
A model with trial on the RHS, and not as a factor, doesn't seem to make much sense. Is this really the data you fit the SPSS model on? – Hong Ooi

2 Answers

5 votes

It looks like a bug in dplyr 0.5's mutate, where lm within a group will still try to use the full dataset. You can use do instead:

sym %>% group_by(subject) %>% do(
{
    r <- resid(lm(log(rt) ~ trial, data = .))
    data.frame(., r)
})

This still doesn't match your SPSS column, but it's the correct result for the data you've given. You can verify this by fitting the model manually for each subject and checking the residuals.
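One way to do that manual check is sketched below in base R only, so it does not depend on any dplyr version. It rebuilds the question's data, fits one model per subject, and puts the residuals back in the original row order. The column names r_manual and r_pooled are illustrative, not from the question.

```r
# Rebuild the example data from the question
sym <- data.frame(
  subject = factor(rep(1:3, each = 6)),
  trial   = rep(1:6, times = 3),
  rt      = c(300, 305, 290, 315, 320, 320,
              350, 355, 330, 365, 370, 370,
              560, 565, 570, 575, 560, 570)
)

# Fit log(rt) ~ trial separately for each subject
per_subject <- lapply(split(sym, sym$subject), function(d) {
  as.numeric(resid(lm(log(rt) ~ trial, data = d)))
})

# unsplit() returns the residuals in the original row order
sym$r_manual <- unsplit(per_subject, sym$subject)

# For contrast: one pooled fit over all rows (the unwanted behaviour)
sym$r_pooled <- as.numeric(resid(lm(log(rt) ~ trial, data = sym)))

sym
```

If a grouped pipeline returns the r_manual column, it is fitting per subject; if it returns r_pooled, it is ignoring the grouping.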

(Other flavours of residuals include rstandard for standardized and rstudent for studentized residuals. They still don't match your SPSS numbers, but might be what you're looking for.)
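As a rough sketch of how those other flavours could be computed per subject, the base R snippet below applies resid, rstandard and rstudent to the same per-subject fits. The helper per_subject_resid and the column names are hypothetical, introduced only for this illustration, and nothing here is claimed to reproduce the SPSS column.

```r
# Rebuild the example data from the question
sym <- data.frame(
  subject = factor(rep(1:3, each = 6)),
  trial   = rep(1:6, times = 3),
  rt      = c(300, 305, 290, 315, 320, 320,
              350, 355, 330, 365, 370, 370,
              560, 565, 570, 575, 560, 570)
)

# Hypothetical helper: apply a residual extractor to one fit per subject
# and return the values in the original row order
per_subject_resid <- function(data, extractor) {
  pieces <- lapply(split(data, data$subject), function(d) {
    as.numeric(extractor(lm(log(rt) ~ trial, data = d)))
  })
  unsplit(pieces, data$subject)
}

sym$r_raw  <- per_subject_resid(sym, resid)      # raw residuals
sym$r_std  <- per_subject_resid(sym, rstandard)  # standardized
sym$r_stud <- per_subject_resid(sym, rstudent)   # studentized

sym
```

The three columns can then be compared side by side against whatever the other software reports.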

1 vote

Later versions of dplyr seem able to handle this (tested with dplyr 0.7.4):

a <- sym %>% group_by(subject) %>% do(
{
    r <- resid(lm(log(rt) ~ trial, data = .))
    data.frame(., r)
})

b <- sym %>% group_by(subject) %>% mutate(
    r = resid(lm(log(rt) ~ trial))
)

all(a$r == b$r)  # -> TRUE

Another independent test:

# https://stackoverflow.com/a/40061201/2292993
# https://stackoverflow.com/q/24766450/2292993
# https://github.com/tidyverse/dplyr/issues/2177

# tested with dplyr 0.7.4
library(dplyr)

# 1) do
df = group_by(iris, Species) %>% do({
  res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = .))
  data.frame(., res)
})

# 2) group_by + mutate
# cannot have "data=." in lm
df2 = group_by(iris, Species) %>% mutate(
  res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width))
)

# 3) filter + mutate, one species at a time
df3 = filter(iris, Species == 'setosa') %>% mutate(
  res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = .))
)
df3 = bind_rows(df3,
  filter(iris, Species == 'versicolor') %>% mutate(
    res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = .))
  ))
df3 = bind_rows(df3,
  filter(iris, Species == 'virginica') %>% mutate(
    res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = .))
  ))

# 4) one fit across all rows (should not be the same)
df4 = mutate(iris,
  res = resid(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris))
)

# conclusion: df, df2 and df3 are all the same; df4 (pooled fit) is not
all(df$res == df2$res)
all(df$res == df3$res)
all(df$res == df4$res)
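
A side note on the comparisons above: `==` tests exact floating-point equality, which can report FALSE for results that agree up to rounding. A more forgiving check is sketched below with base R's all.equal; the helper name same_resid is hypothetical and only for illustration.

```r
# Exact comparison can fail on values that differ only by rounding
x <- 0.1 + 0.2
y <- 0.3
x == y                   # FALSE
isTRUE(all.equal(x, y))  # TRUE

# Hypothetical helper: compare two residual vectors within a small tolerance,
# ignoring names and other attributes
same_resid <- function(u, v) {
  isTRUE(all.equal(as.numeric(u), as.numeric(v)))
}

same_resid(c(1, 2, 3), c(1, 2, 3) + 1e-12)  # TRUE
same_resid(c(1, 2, 3), c(1, 2, 4))          # FALSE
```

With the objects from the test above, this would be called as same_resid(df$res, df2$res), and so on.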