I have the following Data set:

Date      Lag2_ADS   ADS     EMP
May06     .          66.2    2
Jun06     .          55      3.3
Jul06     66.2       45.6    1.2
Aug06     55         -7.9    1.2
Sep06     45.6       -16.8   1.3

The data continue through Jul15.

I then run the following regression:

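ODS listing;
ODS output FitStatistics = Mydata;    /* save the fit statistics table */
proc reg data = my_data;              /* data set names cannot contain spaces */
    /* date literal needs matching quotes and a trailing d;
       assumes Date is a numeric SAS date */
    where Date > '01Jul2006'd;
    model Emp = Lag2_ADS;
run; quit;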

My first question: when I run the program, do I need to specify where Date > '01Jul2006'd, or does SAS automatically take care of observations with missing values?

My second question: which values of EMP and Lag2_ADS does SAS start the regression with if I don't specify Date > '01Jul2006'd?

P.S. I ran the regression with and without the Date subset and the two R-squared values are different, so I want to make sure I am running the right regression.

1 Answer

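SAS excludes observations with missing values for you, so the WHERE statement is not needed for that. From the PROC REG documentation on missing values:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect026.htm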

PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.
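
One way to confirm this on your own data is to look at the "Number of Observations" table that PROC REG prints. The sketch below is only an illustration: the data set my_data is rebuilt from the five rows shown in the question, it assumes Date is a numeric SAS date read with the MONYY5. informat (so each month is stored as its first day), and nobs_all is a made-up output name.

```sas
/* Hypothetical copy of the first five rows from the question */
data my_data;
    input Date :monyy5. Lag2_ADS ADS EMP;
    format Date monyy5.;
    datalines;
May06 . 66.2 2
Jun06 . 55 3.3
Jul06 66.2 45.6 1.2
Aug06 55 -7.9 1.2
Sep06 45.6 -16.8 1.3
;
run;

/* No WHERE statement: rows with a missing model variable are dropped */
ods output NObs = nobs_all;   /* NObs is the "Number of Observations" table */
proc reg data = my_data;
    model EMP = Lag2_ADS;
run; quit;

/* Compare observations read, used, and with missing values */
proc print data = nobs_all;
run;
```

On these five rows the table should report 5 observations read and 3 used, because May06 and Jun06 have a missing Lag2_ADS.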

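Your R-squared values are different because you're using >, not >=. Assuming Date is a SAS date stored as the first day of each month:

where Date > '01JUL2006'd keeps Aug06, Sep06, ... but drops Jul06, even though Jul06 has a non-missing Lag2_ADS

where Date >= '01JUL2006'd keeps Jul06, Aug06, Sep06, ...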

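With >=, the filter removes only May06 and Jun06, which PROC REG already excludes because Lag2_ADS is missing, so you should see identical R-squared values with or without the >= filter. Without any WHERE statement, the first observation used is Jul06 (EMP = 1.2, Lag2_ADS = 66.2).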
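
To see which rows each version of the filter keeps, you can list them without fitting anything. This is a rough sketch under the same assumptions as the question's sample: my_data is a hypothetical data set built from the five rows shown, and Date is a numeric SAS date stored as the first day of each month.

```sas
/* Hypothetical copy of the first five rows from the question */
data my_data;
    input Date :monyy5. Lag2_ADS ADS EMP;
    format Date monyy5.;
    datalines;
May06 . 66.2 2
Jun06 . 55 3.3
Jul06 66.2 45.6 1.2
Aug06 55 -7.9 1.2
Sep06 45.6 -16.8 1.3
;
run;

/* Rows PROC REG can use with no WHERE: both model variables non-missing */
title 'Usable rows, no date filter';
proc print data = my_data;
    where not missing(EMP) and not missing(Lag2_ADS);
run;

/* Same rows, further restricted by the strict > comparison */
title 'Usable rows, Date > 01JUL2006';
proc print data = my_data;
    where Date > '01JUL2006'd
          and not missing(EMP) and not missing(Lag2_ADS);
run;

title;   /* clear the title */
```

The first listing starts at Jul06 and the second at Aug06, which is the one-observation difference behind the two R-squared values.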