I am relatively new to modelling in R and came across the glm() function. I am interested in logistic regression using family = binomial. When my dependent variable can take one of two possible outcomes, say 'positive' and 'negative', which outcome are the estimates computed for by default: does the model predict the log odds of a 'positive' or of a 'negative' outcome? Also, which outcome is used by default when the dependent variable is

  1. Yes or No
  2. 1 or 2
  3. Pass or Fail

etc. ?

Is there a rule by which R selects this default? Is there a way to override it manually? Please clarify.

1 Answer

It's in the details of ?binomial:

For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:

  1. As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level). (Added note, not part of the documentation: the first level is usually the alphabetically first value, since that is how R orders factor levels by default.)

  2. As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).

  3. As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
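As a rough illustration of these three forms, here is a minimal sketch on made-up data (the variable names dose, successes, failures and the counts are hypothetical, chosen only for this example). Under these assumptions all three calls describe the same data, so the coefficient estimates should agree up to numerical tolerance:

```r
# Hypothetical toy data: 10 trials at each of three dose levels.
dose      <- c(1, 2, 3)
successes <- c(2, 5, 8)
failures  <- 10 - successes

# 1. Factor response, one row per trial. "fail" is the first level,
#    so "pass" is what glm() treats as the success.
long <- data.frame(
  dose    = rep(rep(dose, 2), times = c(successes, failures)),
  outcome = factor(
    rep(c("pass", "fail"), times = c(sum(successes), sum(failures))),
    levels = c("fail", "pass")
  )
)
fit_factor <- glm(outcome ~ dose, family = binomial, data = long)

# 2. Proportion of successes, with the number of trials as weights.
prop <- successes / 10
fit_prop <- glm(prop ~ dose, family = binomial, weights = rep(10, 3))

# 3. Two-column matrix: successes first, failures second.
fit_matrix <- glm(cbind(successes, failures) ~ dose, family = binomial)

# Compare the estimates side by side.
cbind(factor = coef(fit_factor),
      prop   = coef(fit_prop),
      matrix = coef(fit_matrix))
```

The residual deviances of the row-per-trial fit and the aggregated fits will differ, since they are computed against different saturated models, but the coefficients describe the same relationship.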

So the probability predicted is the probability of "success", i.e. of the second level of the factor, or the probability of a 1 in the numeric case.
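One way to check this on your own data is to look at levels() before fitting. The sketch below uses simulated data (the seed, sample size and true slope are arbitrary assumptions) in which larger x makes 'positive' more likely, so a positive estimated slope indicates that the model is predicting the probability of 'positive':

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(200)

# Simulated outcome: larger x makes "positive" more likely.
y <- factor(ifelse(runif(200) < plogis(2 * x), "positive", "negative"))

# Default level order is alphabetical: "negative" first, "positive" second.
levels(y)

fit <- glm(y ~ x, family = binomial)

# The slope for x comes out positive, i.e. the linear predictor is the
# log odds of the second level ("positive").
coef(fit)["x"]

# Fitted values are probabilities of the second level.
tapply(fitted(fit), y, mean)
```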

From your examples:

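  • Yes or No: by default "No" is the first level (alphabetical order), so "No" is treated as the failure and the model predicts the probability of "Yes". To reverse this, use my_data$my_factor <- relevel(my_data$my_factor,"Yes") to make "Yes" the first level, so that "No" becomes the success.
  • 1 or 2: this will either fail or produce bogus results. Either make the variable into a factor ("1" will be treated as the first level, so 2 is the success) or subtract 1 to get a 0/1 variable (or use 2-x if you want 2 to be treated as a failure).
  • Pass or Fail: same as "Yes or No": "Fail" comes first alphabetically, so it is the failure and the model predicts the probability of "Pass".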
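To make the override concrete, here is a small sketch on simulated data (the names answer, flipped and code are hypothetical). It assumes nothing beyond base R: relevel() swaps which level comes first, which flips the sign of every coefficient, and a 1/2 variable is recoded before fitting:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(100)
answer <- factor(ifelse(runif(100) < plogis(x), "Yes", "No"))

# Default: levels are "No", "Yes", so the model predicts P("Yes").
fit_default <- glm(answer ~ x, family = binomial)

# Make "Yes" the first level; the model now predicts P("No").
flipped <- relevel(answer, ref = "Yes")
levels(flipped)
fit_flipped <- glm(flipped ~ x, family = binomial)

# Same fit, opposite sign on every coefficient.
cbind(default = coef(fit_default), flipped = coef(fit_flipped))

# A 1/2 coded version of the same outcome (2 means "Yes").
code <- ifelse(answer == "Yes", 2, 1)

# Option A: convert to a factor; "1" is the first level, so 2 is the success.
fit_as_factor <- glm(factor(code) ~ x, family = binomial)

# Option B: subtract 1 to get a 0/1 variable; 2 is again the success.
y01 <- code - 1
fit_minus1 <- glm(y01 ~ x, family = binomial)

# Option C: use 2 - code so that 2 is treated as the failure.
y_rev <- 2 - code
fit_reversed <- glm(y_rev ~ x, family = binomial)
```

Options A and B should reproduce fit_default, and option C should reproduce fit_flipped, since they encode the same success/failure assignment.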