SAS PROC GENMOD - Why does consistent syntax produce different reference categories for two different binary variables?

Question

I am running a series of bivariate log-binomial regressions in PROC GENMOD, each with the same outcome and one binary (1/0) predictor. The syntax is identical apart from the predictor variable, yet one model compares predictor category 1 vs. category 0 while the other does the opposite. What could be going on?

My predictor variables are:

Housing_Insecure_Dich_BL: 0 = No, 1 = Yes

PrEP_Effic_Risk_Red_binary_BL: 0 = Below 90%, 1 = 90%+

Model 1:

proc genmod data=full3 descending;
class Housing_Insecure_Dich_BL (ref=first);
model Almost_Always_Take_3m = Housing_Insecure_Dich_BL / dist=bin link=log waldci ;
estimate 'Housing_Insecure_Dich_BL' Housing_Insecure_Dich_BL 1 -1/exp;
run;

Results: Class Level Information table lists the values as "Yes No" - meaning that it is comparing Yes vs. No, i.e., 1 vs 0. The prevalence ratio makes sense given the raw percentages.

Model 2:

proc genmod data=full3 descending;
class PrEP_Effic_Risk_Red_binary_BL (ref=first);
model Almost_Always_Take_3m = PrEP_Effic_Risk_Red_binary_BL / dist=bin link=log waldci ;
estimate 'PrEP_Effic_Risk_Red_binary_BL' PrEP_Effic_Risk_Red_binary_BL 1 -1/exp;
run;

Results: the Class Level Information table lists the values as "Below 90% 90%+", meaning that it is comparing zero to one. Why does it do this when I've specified ref=first, and the exact same syntax with a different 0/1-coded variable produces the expected reference category? The prevalence ratio matches what is expected for zero vs. one, but that is not what I want.

I can just change the syntax for Model 2 to say ref=last, or ref="Below 90%", but I would rather understand what is going on and be able to use uniform syntax since all my predictors are coded the same.

Can anyone help?

Joe · Accepted Answer · 2021-05-14T17:41:57

Here's an example of what you're probably doing.

proc format;
  value smokef
  0 = 'Nonsmoker'
  1 = 'Smoker'
  ;
  value bpf
  0 = 'Normal BP'
  1 = 'Higher BP'
  ;
  value statusf
  0 = 'Dead'
  1 = 'Alive'
  ;
quit;

data heart;
  set sashelp.heart;
  smokeflag = (smoking ne 0);
  bpflag    = (bp_status ne 'Normal');
  statusflag= (status = 'Alive');
  format 
    smokeflag  smokef.
    bpflag     bpf.
    statusflag statusf.
  ;
run;

proc genmod data=heart;
class smokeflag;
model statusflag = smokeflag;
estimate 'Smokeflag' smokeflag 1 -1/exp;
run;


proc genmod data=heart;
class bpflag;
model statusflag = bpflag;
estimate 'Blood Pressure flag' bpflag 1 -1/exp;
run;

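Notice the same issue: the Class Level Information table lists 'Nonsmoker Smoker' (0 1) but 'Higher BP Normal BP' (1 0). That's because GENMOD's default order for class levels is order=formatted, which sorts by the formatted labels rather than by the underlying values. N comes before S, but H comes before N. The same thing happens in your data: 'No' sorts before 'Yes', so ref=first picks 0 as the reference, but '90%+' sorts before 'Below 90%' (digits come before letters in ASCII), so ref=first picks 1.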
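One way to see the two orderings side by side is to tabulate the flags with PROC FREQ under each ORDER= setting. This is only a minimal sketch, and it assumes the heart data set and formats created above:

```sas
/* Levels sorted by formatted label, the same ordering
   GENMOD's CLASS statement uses by default */
proc freq data=heart order=formatted;
  tables smokeflag bpflag;
run;

/* Levels sorted by the stored 0/1 values */
proc freq data=heart order=internal;
  tables smokeflag bpflag;
run;
```

If the row order for a variable differs between the two steps, ref=first will pick a different reference level depending on which ordering is in effect.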

The desired results could either be obtained by changing the format to include the number (so 1 Smoker 0 Nonsmoker etc.) or by using the order=internal option:

proc genmod data=heart;
class smokeflag (ref=first order=internal);
model statusflag = smokeflag;
estimate 'Smokeflag' smokeflag 1 -1/exp;
run;


proc genmod data=heart;
class bpflag (ref=first order=internal);
model statusflag = bpflag;
estimate 'Blood Pressure flag' bpflag 1 -1/exp;
run;

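order=internal tells SAS to order the levels by their unformatted (stored) values, so 0 sorts before 1 whatever the format labels say, and ref=first then consistently picks 0 as the reference.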
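The other option, putting the number into the format label, might look like the sketch below. The label text is only an illustration; any labels should work as long as they sort in the same order as the underlying values. It assumes the heart data set from above, which already has smokef. and bpf. attached.

```sas
/* Redefine the formats so the labels sort the same way as the 0/1 values */
proc format;
  value smokef
    0 = '0 Nonsmoker'
    1 = '1 Smoker'
  ;
  value bpf
    0 = '0 Normal BP'
    1 = '1 Higher BP'
  ;
run;

/* Default order=formatted now puts the 0 level first,
   so ref=first makes it the reference */
proc genmod data=heart;
  class bpflag (ref=first);
  model statusflag = bpflag;
  estimate 'Blood Pressure flag' bpflag 1 -1/exp;
run;
```

The trade-off is that the numeric prefix then shows up in every table and plot that uses the format, which is why order=internal is often the less intrusive fix.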

Some procedures also support formats saved with notsorted, but in my testing that's not available on GLM (usually is available when preloadfmt is available).
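Applied to the second model from the question, the order=internal fix would look roughly like this. It is a sketch that assumes PrEP_Effic_Risk_Red_binary_BL is stored as numeric 0/1 with a format attached, as the question describes:

```sas
proc genmod data=full3 descending;
  /* order=internal sorts the levels by the stored 0/1 values,
     so ref=first makes 0 (Below 90%) the reference */
  class PrEP_Effic_Risk_Red_binary_BL (ref=first order=internal);
  model Almost_Always_Take_3m = PrEP_Effic_Risk_Red_binary_BL / dist=bin link=log waldci;
  estimate 'PrEP_Effic_Risk_Red_binary_BL' PrEP_Effic_Risk_Red_binary_BL 1 -1 / exp;
run;
```

The same class options can be used unchanged for every 0/1 predictor, which keeps the syntax uniform across models.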
