0
votes

I am converting SAS PROC GENMOD models over to R using glm. When I compare the output for additive models the estimates match for the treatments. However, the estimates do not match when I run interactive models. The interaction estimate is the same, but the main effect estimates are different between R and SAS.

The SAS code for the additive model is:

proc genmod descending;
    class Grazed CP;
model Parasitized = Grazed CP / dist=bin link=logit type1;      
run;

The R code for the additive model is:

mod3 <- glm(Parasitized ~ Grazed + CP, family = binomial(link = logit), data = dick)

The SAS output for the additive model is:

           Analysis Of Maximum Likelihood Parameter Estimates
Parameter       DF  Estimate    SE       Wald Chi-Square    Pr > ChiSq
Intercept       1    -0.2976    0.2006      2.20             0.1379
Grazed  N   1         0.0696    0.2374      0.09             0.7695
Grazed  Y   0         0.0000    0.0000      .                .
CP  2       1         0.2829    0.2355      1.44             0.2298
CP  25      0         0.0000    0.0000      .                .
Scale       0         1.0000    0.0000          

The R summary output for the additive model is:

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.05486    0.19377   0.283    0.777
GrazedY     -0.06956    0.23743  -0.293    0.770
CP25        -0.28289    0.23555  -1.201    0.230

Now I know the sign is different between the estimates but I know why that is already and am not concerned with it.

The SAS code for the interactive model is:

proc genmod descending;
    class Grazed CP;
model Parasitized = Grazed CP Grazed*CP / dist=bin link=logit type1;        
run;

The R code for the interactive model is:

mod4 <- glm(Parasitized ~ Grazed * CP, family = binomial(link = logit), data = dick)

The summary output for the SAS interactive model is:

    Analysis Of Maximum Likelihood Parameter Estimates 
Parameter       DF   Estimate  SE      Wald Chi-Square  Pr > ChiSq 
Intercept       1    -0.3321   0.2281      2.12         0.1454 
Grazed N        1     0.1337   0.3105      0.19         0.6668 
Grazed Y        0     0.0000   0.0000      .            . 
CP 2            1     0.3766   0.3755      1.01         0.3159 
CP 25           0     0.0000   0.0000      .            . 
Grazed*CP N 2   1    -0.1546   0.4821      0.10         0.7485 
Grazed*CP N 25  0     0.0000   0.0000      .            . 
Grazed*CP Y 2   0     0.0000   0.0000      .            . 
Grazed*CP Y 25  0     0.0000   0.0000      .            . 
Scale           0     1.0000 0.0000    

The summary output for the R interactive model is:

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.02353    0.21695   0.108    0.914
GrazedY       0.02092    0.36878   0.057    0.955
CP25         -0.22198    0.30242  -0.734    0.463
GrazedY:CP25 -0.15460    0.48211  -0.321    0.748

I cannot figure out how to remedy this situation. The only thing I can think of is that SAS an R score the models differently but am not sure how to change the code so that the output would match.

1

1 Answers

2
votes

If you want to get the same estimates as SAS you can tell R to fit contrasts in the same way.

options(contrasts = c(unordered = "contr.SAS", ordered = "contr.poly"))

Note that the models you fit are the same - they're just parameterized in different ways.

I would provide more info on how the models are functionally the same but SO is more for the programming aspect and you would be better served reading up on how to parameterize linear models if you want the details on the specifics of how and why SAS and R give apparently different results.

> lm.R <- lm(mpg ~ cyl*am, data = mtcars)
> getOption("contrasts")
        unordered           ordered 
"contr.treatment"      "contr.poly" 
> options(contrasts = c(unordered = "contr.SAS", ordered = "contr.poly"))
> lm.SAS <- lm(mpg ~ cyl*am, data = mtcars)
> lm.R

Call:
lm(formula = mpg ~ cyl * am, data = mtcars)

Coefficients:
(Intercept)         cyl6         cyl8          am1     cyl6:am1     cyl8:am1  
     22.900       -3.775       -7.850        5.175       -3.733       -4.825  

> lm.SAS

Call:
lm(formula = mpg ~ cyl * am, data = mtcars)

Coefficients:
(Intercept)         cyl4         cyl6          am0     cyl4:am0     cyl6:am0  
     15.400       12.675        5.167       -0.350       -4.825       -1.092  

So we can see that the two ways of fitting provide "different" parameters but if we ask it for predictions they end up being identical.

> predict(lm.R)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
           20.56667            20.56667            28.07500            19.12500 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
           15.05000            19.12500            15.05000            22.90000 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
           22.90000            19.12500            19.12500            15.05000 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
           15.05000            15.05000            15.05000            15.05000 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
           15.05000            28.07500            28.07500            28.07500 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
           22.90000            15.05000            15.05000            15.05000 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
           15.05000            28.07500            28.07500            28.07500 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
           15.40000            20.56667            15.40000            28.07500 
> predict(lm.SAS)
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
           20.56667            20.56667            28.07500            19.12500 
  Hornet Sportabout             Valiant          Duster 360           Merc 240D 
           15.05000            19.12500            15.05000            22.90000 
           Merc 230            Merc 280           Merc 280C          Merc 450SE 
           22.90000            19.12500            19.12500            15.05000 
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
           15.05000            15.05000            15.05000            15.05000 
  Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
           15.05000            28.07500            28.07500            28.07500 
      Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
           22.90000            15.05000            15.05000            15.05000 
   Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
           15.05000            28.07500            28.07500            28.07500 
     Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
           15.40000            20.56667            15.40000            28.07500 

And yes this was an actual linear model but the theory and reasoning hold for glms as well.