
I'm looking to create a user-defined contrast on my data. In brief, the data is organized in a dataframe, with each row having 1 of 4 possible conditions, a proportion of correct answers on a test, and 2 variables called "Schedule" and "Cluster." The head of my data looks like this:

      Subjects Condition        PC    Schedule Cluster
    1        1         1 0.5555556 Interleaved Similar
    2        2         1 0.3425926 Interleaved Similar
    3        3         1 0.7129630 Interleaved Similar
    4        4         1 0.5000000 Interleaved Similar
    5        5         1 0.6296296 Interleaved Similar
    6        6         1 0.6851852 Interleaved Similar

There are two main contrasts I want to run. The first compares condition 1 to the mean of conditions 2, 3, and 4. The second compares condition 4 to the mean of conditions 2 and 3. I coded my two contrasts like this:

    contrast1 = c(1, -1/3, -1/3, -1/3)
    contrast2 = c(0, -1/2, -1/2, 1)
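A quick sanity check on these weights (a minimal sketch; only the two vectors come from the question): each set of contrast weights should sum to zero, and these two also happen to be orthogonal to each other. `all.equal()` is used because `-1/3` is not exact in floating point.

```r
# The two planned contrasts from the question
contrast1 <- c(1, -1/3, -1/3, -1/3)  # condition 1 vs. mean of conditions 2-4
contrast2 <- c(0, -1/2, -1/2, 1)     # condition 4 vs. mean of conditions 2-3

# Contrast weights must sum to zero (up to floating-point error)
print(all.equal(sum(contrast1), 0))
print(all.equal(sum(contrast2), 0))

# These two are also orthogonal: their dot product is zero
print(all.equal(sum(contrast1 * contrast2), 0))
```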

I then put them into a matrix:

    cond.contrasts = matrix(c(contrast1, contrast2), ncol = 2)

Per advice I saw elsewhere, I took the generalized inverse of this matrix with ginv() from the MASS package and transposed it:

    library(MASS)  # for ginv()
    cond.contrasts = t(ginv(cond.contrasts))
    show(cond.contrasts)
          [,1]       [,2]
    [1,]  0.75  0.0000000
    [2,] -0.25 -0.3333333
    [3,] -0.25 -0.3333333
    [4,] -0.25  0.6666667
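One way to see why the transposed generalized inverse is used as the coding matrix (a sketch; the names `C` and `B` are illustrative, not from the question): because the contrast matrix `C` has full column rank, `ginv(C) %*% C` is the identity, so `t(C) %*% B` is the identity as well. Each coding column of `B` therefore lines up with exactly one planned contrast. For this `C`, `B` also matches the closed form `C (C'C)^-1`.

```r
library(MASS)  # provides ginv(); MASS is a recommended package bundled with R

contrast1 <- c(1, -1/3, -1/3, -1/3)
contrast2 <- c(0, -1/2, -1/2, 1)
C <- cbind(contrast1, contrast2)  # 4 x 2 matrix of planned contrast weights

# Coding matrix, as in the question: transpose of the generalized inverse
B <- t(ginv(C))

# C has full column rank, so ginv(C) %*% C is the 2 x 2 identity and hence
# t(C) %*% B is too: coding column j lines up with contrast j only
print(round(crossprod(C, B), 10))

# For full-column-rank C, B equals the closed form C %*% solve(t(C) %*% C)
print(all.equal(B, C %*% solve(crossprod(C)), check.attributes = FALSE))
```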

Note there are only two contrasts here. However, my output looks like this:

    lm.experiment = lm(PC ~ Condition, PC)
    summary(lm.experiment)

    Call:
    lm(formula = PC ~ Condition, data = PC)

    Residuals:
         Min       1Q   Median       3Q      Max 
    -0.22099 -0.12069 -0.00926  0.11443  0.35117 

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  0.5438470  0.0136786  39.759   <2e-16 ***
    Condition1   0.0263110  0.0312175   0.843    0.401    
    Condition2   0.0279084  0.0335882   0.831    0.408    
    Condition3  -0.0007032  0.0276090  -0.025    0.980    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1472 on 112 degrees of freedom
    Multiple R-squared:  0.01234,   Adjusted R-squared:  -0.01412 
    F-statistic: 0.4663 on 3 and 112 DF,  p-value: 0.7064

If I'm understanding this right, my contrasts should be represented by the "Condition1" and "Condition2" coefficients. However, I have no idea what "Condition3" refers to. If I ask R to show me the contrasts directly, it gives me this:

    > show(contrasts(PC$Condition))
       [,1]       [,2]          [,3]
    1  0.75  0.0000000  8.326673e-17
    2 -0.25 -0.3333333 -7.071068e-01
    3 -0.25 -0.3333333  7.071068e-01
    4 -0.25  0.6666667 -2.498002e-16
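To see where the third column comes from and what "Condition3" estimates, here is a sketch using a hypothetical 12-observation factor `f` in place of `PC$Condition`. After `contrasts(f) <- B`, R pads the matrix to `nlevels(f) - 1 = 3` columns, and the padded column is orthogonal to the intercept and to both supplied columns. Inverting `cbind(1, contrasts(f))` then gives, row by row, the weights each coefficient places on the four condition means in the full (four-means) model: rows 2 and 3 reproduce the two planned contrasts, and row 4 compares conditions 2 and 3 (up to scale and sign).

```r
library(MASS)  # for ginv()

contrast1 <- c(1, -1/3, -1/3, -1/3)
contrast2 <- c(0, -1/2, -1/2, 1)
B <- t(ginv(cbind(contrast1, contrast2)))

f <- factor(rep(1:4, each = 3))  # hypothetical stand-in for PC$Condition
contrasts(f) <- B                # how.many defaults to nlevels(f) - 1 = 3
cm <- contrasts(f)
print(dim(cm))                   # 4 rows, 3 columns: one column was added

# The added column is orthogonal to the intercept and to both supplied columns
print(round(crossprod(cbind(1, B), cm[, 3]), 10))

# Row k of W holds the weights coefficient k applies to the four condition
# means (row 1 = intercept, rows 2-4 = Condition1..Condition3)
W <- solve(cbind(1, cm))
print(round(W, 4))
```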

Where does the third column come from? Have I done something wrong?

How do you set the contrasts for Condition? - Sven Hohenstein
You've got four means that are estimable in the model, so you should provide 4 linearly independent contrasts, even if you only care about two of them. - Andrew M
@AndrewM Could you explain why that's the case? Thanks! - Brian
@SvenHohenstein Oops, thought I included that! contrasts(PC$Condition) = cond.contrasts - Brian

1 Answer


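If you set the contrasts outside the lm function (as with contrasts(PC$Condition) = cond.contrasts), R fills the contrast matrix up to the maximum number of contrasts, which is the number of factor levels minus one. In your example, 4 factor levels allow for 3 linearly independent contrasts, so one column is added to the two you supplied. R constructs that extra column to be orthogonal to the intercept and to your two columns, which is why it comes out as roughly 0, -0.707, 0.707, 0.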
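If you prefer to keep setting the contrasts on the factor itself, the replacement function `contrasts<-` also takes a `how.many` argument, which defaults to the number of levels minus one. A minimal sketch, with `f` as a hypothetical stand-in for `PC$Condition`, showing that `how.many = 2` keeps exactly the two supplied columns:

```r
library(MASS)  # for ginv()

B <- t(ginv(cbind(c(1, -1/3, -1/3, -1/3), c(0, -1/2, -1/2, 1))))
f <- factor(rep(1:4, each = 3))   # hypothetical stand-in for PC$Condition

# how.many = 2 keeps exactly the two supplied columns instead of padding to 3
contrasts(f, how.many = 2) <- B
print(dim(contrasts(f)))
```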

However, you can override this default with the contrasts argument of lm. In that case, the specified contrast matrix is used as-is and no additional contrasts are added.

The command:

    lm(PC ~ Condition, PC, contrasts = list(Condition = cond.contrasts))

tells lm to use the contrast matrix cond.contrasts for the factor Condition.
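A self-contained sketch of this approach on hypothetical simulated data (the data frame below is invented for illustration and only mimics the question's column names): passing the two-column matrix through the `contrasts` argument yields an intercept plus exactly two coefficients, `Condition1` and `Condition2`.

```r
library(MASS)  # for ginv()

# Hypothetical simulated data standing in for the question's data frame
set.seed(1)
PC <- data.frame(
  Condition = factor(rep(1:4, each = 29)),
  PC = runif(4 * 29)
)

cond.contrasts <- t(ginv(cbind(c(1, -1/3, -1/3, -1/3),
                               c(0, -1/2, -1/2, 1))))

# lm() hands the matrix to model.matrix(), which uses exactly its columns
fit <- lm(PC ~ Condition, data = PC,
          contrasts = list(Condition = cond.contrasts))

print(names(coef(fit)))  # intercept plus one coefficient per supplied column
print(df.residual(fit))  # n minus 3 estimated parameters
```

Note that with only two coding columns the model has three parameters instead of four, so it is no longer the full four-means model and the residual degrees of freedom go up by one compared with the padded version.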