
I have a dataset with visitors and weather variables, and I'm trying to forecast visitors from the weather variables. Since the dataset only covers visitors in season, there are missing values and gaps in every year. PROC REG in SAS runs fine, but the issue comes when I use PROC VARMAX: I cannot run the regression because of the missing values. How can I tackle this?

proc varmax data=tivoli4 printall plots=forecast(all);
id obs interval=day;
model lvisitors = rain sunshine averagetemp
dfebruary dmarch dmay djune djuly daugust doctober dnovember ddecember
dwednesday dthursday dfriday dsaturday dsunday
d_24Dec2016 d_05Dec2013 d_24Dec2017 d_24Dec2014 d_24Dec2015 d_24Dec2019 
d_24Dec2018 d_24Sep2012 d_06Jul2015
d_08feb2019 d_16oct2014 d_15oct2019 d_20oct2016 d_15oct2015 d_22sep2017 d_08jul2015
d_20Sep2019 d_08jul2016 d_16oct2013 d_01aug2012 d_18oct2012 d_23dec2012 d_30nov2013 d_20sep2014 d_17oct2012 d_17jun2014
dFrock2012 dFrock2013 dFrock2014 dFrock2015 dFrock2016 dFrock2017 dFrock2018 dFrock2019
dYear2015 dYear2016 dYear2017
/p=7 q=2 Method=ml dftest;
garch p=1 q=1 form=ccc OUTHT=CONDITIONAL;
restrict
ar(3,1,1)=0, ar(4,1,1)=0, ar(5,1,1)=0,
XL(0,1,13)=0, XL(0,1,14)=0, XL(0,1,13)=0, XL(0,1,27)=0, XL(0,1,38)=0, XL(0,1,42)=0;
output lead=10 out=forecast;

run;

Answer

As with any forecast, you first need to prepare your time series. Run your data through PROC TIMESERIES to fill in or impute the missing values. The most appropriate imputation depends on each variable. The code below will:

  • Sum lvisitors by day and set days with no value to 0
  • Set missing values of averagetemp to the series average
  • Set missing values of rain, sunshine, and every variable whose name starts with d to 0 (assuming the d variables are indicators)

Code:

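proc timeseries data=have out=want;

    /* obs is the daily date; setmissing=0 is the default for the   */
    /* variables below unless a statement overrides it              */
    id obs interval   = day
           setmissing = 0
           notsorted          /* sort by obs before accumulating */
    ;

    /* total visitors per day; days with no rows become 0 */
    var lvisitors / accumulate=total;
    /* temperature gaps get the series average instead of 0 */
    crossvar averagetemp / accumulate=none setmissing=average;
    /* d: matches every variable whose name starts with d */
    crossvar rain sunshine d: / accumulate=none;
run;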
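A self-contained sketch for checking the result before handing it to PROC VARMAX. The toy rows are hypothetical: obs is a SAS date, 03JUN2019 and 04JUN2019 are absent, and rain and averagetemp are missing on 05JUN2019; dsaturday stands in for your d indicator variables. The PROC TIMESERIES options mirror the step above (NOTSORTED is dropped only because the toy rows are already in date order), and PROC MEANS should then list 5 rows with NMISS=0 for every column.

```sas
/* Hypothetical toy rows: two calendar days missing, two missing values */
data have;
    input obs :date9. lvisitors rain sunshine averagetemp dsaturday;
    format obs date9.;
    datalines;
01JUN2019 7.2 0.0 8 16 1
02JUN2019 7.5 1.2 5 14 0
05JUN2019 7.1 .   6 .  0
;
run;

/* Same imputation rules as the answer: 0 by default, average for temperature */
proc timeseries data=have out=want;
    id obs interval=day setmissing=0;
    var lvisitors / accumulate=total;
    crossvar averagetemp / accumulate=none setmissing=average;
    crossvar rain sunshine d: / accumulate=none;
run;

/* Expect one row per day from 01JUN2019 to 05JUN2019 and no missing values */
proc means data=want n nmiss;
run;
```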

Important Time Interval Consideration

Depending on your data, imputing the off-season this way could bias your error rate and estimates, since you already know no one will be around then. If you have many missing values for off-season data, remove those rows instead.
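If the off-season days are present as rows with lvisitors missing (an assumption; if they are simply absent from the data, there is nothing to drop and you can go straight to renumbering), a WHERE clause removes them. A minimal sketch with hypothetical toy rows; have_inseason is an illustrative name.

```sas
/* Hypothetical toy rows: lvisitors is missing outside the season */
data have;
    input obs :date9. lvisitors;
    format obs date9.;
    datalines;
29SEP2019 8.1
30SEP2019 7.9
01OCT2019 .
02OCT2019 .
01APR2020 7.6
;
run;

/* Keep only the days on which visitors were recorded */
data have_inseason;
    set have;
    where not missing(lvisitors);
run;
```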

Removing rows leaves gaps in the daily calendar, and since PROC VARMAX does not support custom time intervals, you can instead create a simple consecutive time identifier. Alternatively, you can turn the time_id-to-date mapping into a format with PROC FORMAT and convert time_id back at the end.

data want;
    set have;

    time_id+1;
run;
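For the format route, PROC FORMAT's CNTLIN= option can build the mapping from the numbered rows. A minimal sketch: the toy rows are hypothetical in-season days, obs is assumed to be a SAS date (the ID variable in the question), and timeid_fmt is an illustrative name; the format name timeid matches the one used with PUT at the end of this answer.

```sas
/* Hypothetical toy rows: in-season days only, numbered as in the step above */
data want;
    input obs :date9.;
    format obs date9.;
    time_id + 1;              /* sum statement: 1, 2, 3, ... */
    datalines;
29SEP2019
30SEP2019
01APR2020
;
run;

/* CNTLIN= input: one row per time_id, labelled with its original date */
data timeid_fmt;
    set want(keep=time_id obs);
    retain fmtname 'timeid' type 'N';
    start = time_id;
    label = put(obs, date9.);
    keep fmtname type start label;
run;

proc format cntlin=timeid_fmt;
run;

/* time_id 3 now prints as the date it came from */
data _null_;
    mapped = put(3, timeid.);
    put mapped=;
run;
```

Each START value gets a character LABEL, so the format returns text; wrap it in INPUT(..., date9.) if you need a numeric SAS date back.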

proc varmax data=want;
    id time_id interval=day;
    ...
    output lead=10 out=myforecast;
run;

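data myforecast;
    /* obs is the original date variable (the ID in the question's code). */
    /* The lead=10 forecast rows have no match in want, so obs is missing. */
    merge myforecast
          want(keep=time_id obs)
    ;

    by time_id;
run;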

Or, if you made a format (called timeid here), apply it with PUT, which returns the date as a character value:

data myforecast;
    set myforecast;

    date = put(time_id, timeid.);
    drop time_id;
run;
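With either approach, the lead=10 forecast rows have time_id values beyond the last row of want, so the merge leaves their date missing and the timeid format has no label for them. A minimal sketch of filling them, under the explicit assumption that the forecast horizon is consecutive calendar days right after the last observed date (not true if the season closes inside the horizon). The toy data sets are hypothetical stand-ins for want and the VARMAX output; for1 is an illustrative forecast column and forecast_date an illustrative new variable.

```sas
/* Hypothetical stand-in for want: time_id plus the original date */
data want;
    input time_id obs :date9.;
    format obs date9.;
    datalines;
1 29SEP2019
2 30SEP2019
;
run;

/* Hypothetical stand-in for the forecast output: 2 rows past the data */
data myforecast;
    input time_id for1;
    datalines;
1 8.0
2 8.1
3 8.2
4 8.3
;
run;

data myforecast;
    merge myforecast want(keep=time_id obs);
    by time_id;
    retain last_obs;
    format forecast_date date9.;
    /* Rows matched in want keep their real date */
    if not missing(obs) then forecast_date = obs;
    /* Unmatched lead rows: assume the next calendar day */
    else forecast_date = last_obs + 1;
    last_obs = forecast_date;
    drop last_obs;
run;
```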