0
votes

I'm curious about how SAS handles informats and input statements with informats. What's the "order of operations" of these statements? I included an example snippet from a program that SAS EG Import Wizard generated.

Disclaimer: I rarely use the EG Import Wizard, but my employer has asked that we use EG when possible, e.g. when creating new programs, so I was curious how this functionality works.

Data:
TimeStamp 01/01/2019 12:00:00 AM

Example EG Generated Code:

data Input;
length TimeStamp 4;
format TimeStamp mmddyy10.;
informat TimeStamp mmddyy10.;
...some infile statement...
input TimeStamp : Best32;
TimeStamp = DatePart(TimeStamp);
run;

The above example is the code EG generated, but I'm curious as to why all these statements were generated. I'm also unsure of why SAS used the : Best32 informat with the input statement when my Import Wizard states DateTime18.

Historically, using BASE SAS, I've just used:

  1. Informat with an input statement
  2. An informat statement and then a following input statement. The input statement would then have only contained the variable name.

Example of #1:

Data Test;
...infile...;
input @1 TimeStamp DateTime18.;
...format...;
run;

Example of #2:

Data Test2;
...infile...;
informat TimeStamp DateTime18.;
input TimeStamp;
...format...;
run;

Is Example #1 just shorthand for Example #2? If so, why is EG generating the extra steps? In the EG-generated code, why doesn't the informat statement override the input statement's informat?

1
I think you might have a better chance of receiving an answer if you ask the question at communities.sas.com since technical people working at SAS contribute often to that forum. - mastropi
Are you asking why the Enterprise Guide wizard did what it did? Or just to understand the example code that it generated? - Tom
Can you double check that your example EG generated code is really what EG generated? Does it work? In particular, what value do you get for TimeStamp? I don't see how it could be doing the right thing. - Quentin
@Tom good clarification. Technically I'm asking both - why did it do what it did, because I'm not sure I understand why EG did it. - DukeLuke
@Quentin yes I am sure, I copy/pasted and changed the var names, but kept the statements in the same order. - DukeLuke

1 Answer

2
votes

The INFORMAT and FORMAT statements are not executable, so you can place them anywhere in the data step (apart from the side effect of forcing a type to be defined for a variable that the compiler hasn't typed yet). Note this also means that if you assign multiple FORMATs (or INFORMATs) to the same variable, the last one is the one that will be used.
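A minimal sketch of the "last one wins" behaviour. The dataset name `demo_last` and variable `x` are made up for illustration. Both FORMAT statements appear after the assignment, which is harmless because they are compile-time declarations rather than executable statements.

```sas
data demo_last;
  x = '01JAN2019'd;      /* x is typed as numeric here */
  format x mmddyy10.;    /* first format attached to x */
  format x date9.;       /* later statement replaces the earlier one */
run;

/* The variable listing should show DATE9. as the format for x */
proc contents data=demo_last;
run;
```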

When the INPUT statement executes, any explicit informat specification you have included in the INPUT statement itself overrides any informat associated with the variable. Note again that if the variable has not already been typed by the compiler, then how the INPUT statement uses the variable will cause a type to be selected for it.
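As a sketch of that override, assuming an in-stream record in a form the DATETIME informat accepts (the dataset name `demo_override` and the data line are invented for this example, and differ from the layout in the question):

```sas
data demo_override;
  length TimeStamp 8;              /* define the variable before anything else */
  informat TimeStamp mmddyy10.;    /* attached to the variable as metadata */
  format TimeStamp datetime19.;
  infile datalines truncover;
  /* The informat named here is the one used to read the field; */
  /* the MMDDYY10. informat attached above is not consulted.    */
  input TimeStamp :datetime18.;
datalines;
01JAN2019:12:00:00
;
run;

proc print data=demo_override;
run;
```

If the attached MMDDYY10. informat were the one doing the reading, this record would be rejected as invalid data, so a clean read indicates the INPUT statement's own informat was applied.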

So for the most predictable results you should define your variables yourself instead of letting SAS guess based on how they first appear. You can define them using the LENGTH statement or the ATTRIB statement, or by pulling in an existing dataset with SET, MERGE or similar statements. Then the order of the INPUT, FORMAT and INFORMAT statements will not matter.
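One way to do that, sketched with a single ATTRIB statement (the dataset name `demo_attrib` and the data line are assumptions for illustration). Because the INPUT statement names no informat, list input falls back to the informat attached to the variable, as in Example #2 of the question.

```sas
data demo_attrib;
  /* Type, length, format and informat are all declared up front */
  attrib TimeStamp length=8 format=datetime19. informat=datetime18.;
  infile datalines truncover;
  input TimeStamp;    /* no informat here, so the attached one is used */
datalines;
01JAN2019:12:00:00
;
run;

proc print data=demo_attrib;
run;
```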

You would have to ask SAS why the Enterprise Guide Wizard works the way it works. My understanding is that for some files (like Excel spreadsheets) it will convert the data into a text file and upload the text file it generated. So I assume that EG generated the DATE and TIME values as the raw number of days or number of seconds and that is why it reads the value using the normal numeric informat instead of a date or time informat. I assume it attaches an INFORMAT to the date and time variables so that the metadata in the dataset definition are populated with something that matches the format that is attached.

As to why they used the BEST32. informat, I have no idea. There is not really a BEST informat in SAS, so it is just an alias for 32. (they could equally have used F32.). The concept of "best" for an informat doesn't really make sense. The BEST format works out, for a particular number, the best combination of digits to approximate the value in a limited number of characters. To read a string of characters into a number, SAS just needs to read the digits and convert them to the number they represent. There is no selection among "best" alternatives involved.
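A quick way to check the alias claim yourself is to read the same text with all three informat names and compare the results in the log. The sample string is arbitrary.

```sas
data _null_;
  a = input('21550', best32.);   /* BEST32. used as an informat */
  b = input('21550', 32.);       /* plain w. numeric informat */
  c = input('21550', f32.);      /* F32. is another name for the same informat */
  put a= b= c=;                  /* all three values should match */
run;
```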