1
votes

I am looking at two procedures for importing two txt files into SAS. The first file is fixed width; the second is delimited. The SAS code is below:

  1. DATA filename;  
    INFILE "filelocation";  
    INPUT  
    VAR1 $1-11  
    VAR2 $13-16  
    @18 VAR3 MMDDYY10.   
    VAR4 $29-53;  
    
    INFORMAT VAR1 $11.;  
    INFORMAT VAR2 $4.;  
    INFORMAT VAR3 MMDDYY10.;  
    INFORMAT VAR4 $25.;  
    FORMAT VAR1 $11.;  
    FORMAT VAR2 $4.;  
    FORMAT VAR3 MMDDYY10.;  
    FORMAT VAR4 $25.;  
    ;  
    RUN;  
    
  2. DATA filename;  
    INFILE "filelocation" DELIMITER="|" MISSOVER  
    DSD LRECL=32767;  
    INFORMAT VAR1 $11.;  
    INFORMAT VAR2 $4.;  
    INFORMAT VAR3 MMDDYY10.;  
    INFORMAT VAR4 $25.;  
    FORMAT VAR1 $11.;  
    FORMAT VAR2 $4.;  
    FORMAT VAR3 MMDDYY10.;  
    FORMAT VAR4 $25.;  
    INPUT  
    VAR1 $  
    VAR2 $  
    VAR3   
    VAR4 $  
    ;  
    RUN;  
    

My questions are:
  1. Why does "INPUT" appear at the beginning of the code in the first procedure but at the end in the second? Does the position of "INPUT" matter?

  2. In the first procedure there is an "@18" in front of VAR3, which is a date variable; it makes VAR3 start at column 18. Can all of the variables use this expression? For example:
    @1 VAR1 $
    @13 VAR2 $
    @18 VAR3 MMDDYY10.
    @29 VAR4 $;

  3. In the second procedure,
    INPUT VAR1 $
    VAR2 $
    VAR3
    VAR4 $
    why is there no number after the "$" sign to set the length of each variable?

Thank you!

2
That's too many questions at once. Please simplify your question to one per post. (Reeza)
Thanks for the suggestion. I've narrowed it down to three questions, and they are all related. Hope this is easier to understand. (mumu.W)

2 Answers

1
votes

The main difference you are talking about is the difference between data that is stored in FIXED column locations and data that is DELIMITED. Since your first example uses data with fixed column locations you can use column ranges (1-11) to read the data. With delimited data you cannot specify fixed columns (or even fixed lengths to read) since you do not know how many characters there are between the delimiters. Instead you must use list mode input and SAS will read the value up to the next delimiter.

Let's tackle the detailed questions.

  1. Why? Because that is how the program was written. There is a lot of flexibility in how you write SAS code.

The important thing to understand about the order of statements when building a dataset is the impact that the order might have on the result. SAS will try to determine the definition of variables you are using as soon as it can. So if you place a FORMAT statement before your INPUT statement it can impact both the type of variable that SAS creates and the order that they are created in the data step.
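Here is a minimal sketch of that effect. The dataset name `demo`, the variables `id` and `name`, and the inline data are made up for illustration; they are not from your program.

```sas
/* Hypothetical example: FORMAT placed before INPUT defines NAME first. */
data demo;
  format name $10.;   /* first reference: NAME becomes character, length 10 */
  input id name;      /* no $ needed, NAME is already known to be character */
  datalines;
1 Alice
2 Bob
;
run;

/* VARNUM lists variables in creation order: NAME should come before ID. */
proc contents data=demo varnum;
run;
```

If you move the FORMAT statement after the INPUT statement, SAS has to decide what `name` is when it compiles INPUT, so it would be created as numeric (no `$`) and `id` would come first.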

  2. No, they are NOT the same.

When you ask it to read VAR1 $ 1-11 you are asking it to read whatever is in columns 1 to 11, including any embedded blanks. It also knows that you want VAR1 to be defined as character (since you used the $) and it should have room for 11 bytes. When you ask it to read @1 VAR1 $ it will read the next word that it sees starting at column 1. It will stop at the first blank. So it might read column 1 to 5 or it might read column 70 to 77, if column 1 to 69 are blank. It will also make VAR1 have a length of only 8 (unless you previously defined it) since that is the default for character variables when SAS cannot tell that you want a different length.
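A small sketch to see the difference for yourself. The variable names `col_val` and `list_val` and the data line are invented for the illustration.

```sas
/* Hypothetical example: same starting column, two input styles. */
data compare;
  infile datalines truncover;
  input col_val  $ 1-11     /* column input: columns 1-11, embedded blanks kept */
        @1 list_val $;      /* list input from column 1: stops at the first blank */
  datalines;
New York    NY
;
run;

proc print data=compare;
run;
```

Here `col_val` should come out as "New York" (defined with length 11), while `list_val` should contain only "New" and get the default length of 8.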

The reason that the original program used @18 VAR3 MMDDYY10. is because you need to specify the informat to have SAS properly convert the text in the data into the number that SAS uses to represent that date and you cannot do that with a column range.
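If you do want to use the `@` pointer for every variable, one way is formatted input, where each variable gets an informat with an explicit width. This is only a sketch: the column positions come from your first program, but the dataset name `fixed` and the sample data line are assumptions.

```sas
/* Hypothetical example: formatted input with a pointer for each variable. */
data fixed;
  infile datalines truncover;
  input @1  var1 $11.        /* columns 1-11, embedded blanks kept */
        @13 var2 $4.         /* columns 13-16 */
        @18 var3 mmddyy10.   /* columns 18-27, converted to a SAS date */
        @29 var4 $25.;       /* columns 29-53 */
  format var3 mmddyy10.;     /* display the date value as a date */
  datalines;
ABCDEFGHIJK ABCD 01/15/2020 Some free text here
;
run;
```

With the informat width given, SAS reads exactly that many columns instead of stopping at the first blank, and the width also sets the length of the character variables.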

  3. You do not need the length. You do not even need the $ since you have already defined the variable type.

The length of each variable was already set the first time it was referenced. So the INFORMAT statement(s) have had the side effect of setting the length of the variable, in addition to specifying the INFORMAT that should be used to convert the text being read. If you really want to define your variables you should use a LENGTH or ATTRIB statement.
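A sketch of what that could look like for the delimited file, using the variable names and lengths from your program. The dataset name `delim` and the inline data line are made up, and TRUNCOVER is my choice here rather than the MISSOVER in your code.

```sas
/* Hypothetical example: define variables explicitly, then use plain list input. */
data delim;
  infile datalines dlm='|' dsd truncover;
  length var1 $11 var2 $4 var3 8 var4 $25;  /* type and length defined up front */
  informat var3 mmddyy10.;                  /* how to read the date text */
  format   var3 mmddyy10.;                  /* how to display the date value */
  input var1 var2 var3 var4;                /* no $ or lengths needed here */
  datalines;
ABC|XY|01/15/2020|Some text
;
run;
```

Because every variable is defined before the INPUT statement, the INPUT statement only has to list the names in the order they appear in the file.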

0
votes
  1. Because of how SAS processes a data step, the order of statements doesn't always matter. I don't know whether it matters in this case, but placing them after the INPUT statement is definitely unconventional. Typically, the INFORMAT/FORMAT statements come before the INPUT statement. You could run it and check fairly easily, though.
  2. This is a pointer control; it moves the read pointer to the specified column. The documentation is clear:

    @n moves the pointer to column n.

  3. $ specifies a character variable. The length or format of the variable isn't required there and can be specified beforehand using INFORMAT/FORMAT/LENGTH statements.