2
votes

I am an R newbie and I have been playing around with datasets to learn R. Most of my experience has been with SAS. So, in attempting to conduct a log binomial regression on a dichotomous outcome and exposure variable, I immediately noticed the result produced by R did not correspond with what I got doing a contingency analysis, that is, producing a crude relative risk estimate, AND from SAS results.

The dataset has 400 observations. The outcome is acceptance to college (1=Yes, 0=No) and the independent variable is high school class rank (1=high, 0=low).

I created a 2x2 table :

      Admission     Row Total
Rank   1      0
   1  87    125     212
   0  40    148     188

Here one can see that high rank increases the probability of being admitted to college by a factor of 1.9 [(87/212)/(40/188)]. The crude estimate would produce a beta coefficient of approximately 0.65 (ln 1.9). Yet when I ran a log binomial regression in R, the beta coefficient it yielded was 0.289.

Here's my code:

glm(formula = admit ~ rank, family = binomial(link = log), data = my data)

I know that in R I have to convert numerical variables into "factors" and order them. The reference group for both variables is 0.

In SAS the code I used was:

proc genmod data=temp; model admit=rank/link=log dist=binomial;
estimate 'Prob of admission by rank' rank 1/exp;
run;  

The beta for rank is 0.657 (RR=1.93). Am I missing something? I know this seems like a basic question, but I cannot find my mistake.

2

2 Answers

1
votes

Making your referent group 1 instead of 0 seems to fix it

# change the reference level:
x$rank <-  relevel(factor(x$rank),"1")
x$admit <- relevel(factor(x$admit),"1")

fit <- glm(admit ~ rank, data=x, family=binomial(link="log"))
coef(fit)
#(Intercept)       rank0 
# -1.5475625   0.6568844 
exp(coef(fit))
#(Intercept)       rank0 
#   0.212766    1.928774 

Whether this is a 'good thing' to do or not is somewhat questionable - read more here:

http://r.789695.n4.nabble.com/Relative-Risk-in-logistic-regression-td4657040.html

1
votes

(Your numbers are wrong: the odds ratio for admission based on rank is (87/125)/(40/148) = 2.5752, and the log-odds, which is the logistic regression coefficient, is 0.946.)

By default, R chooses the first level of a factor as the reference level. SAS however chooses the last level. There is a contr.SAS function specifically to make it easier to replicate SAS results. You can also use relevel as @thelatemail says.

> df <- data.frame(rank=factor(0:1), admit=c(40, 87), nonadmit=c(148, 125))
> contrasts(df$rank) <- contr.SAS(2)
> glm(cbind(admit, nonadmit) ~ rank, family=binomial, data=df)

Call:  glm(formula = cbind(admit, nonadmit) ~ rank, family = binomial, 
    data = df)

Coefficients:
(Intercept)        rank1  
    -0.3624      -0.9459  

Degrees of Freedom: 1 Total (i.e. Null);  0 Residual
Null Deviance:      18.31 
Residual Deviance: 2.043e-14    AIC: 15.07