1
votes

I am essentially trying to read messy data into SAS using informats and having problems. I have column of data of the following form in a raw txt file, say:

RegDate
0
0
16/10/2002
20/11/2003
0

For RegDate, 0 = missing, otherwise the date is present. I would like to read this data into SAS, giving 'NA' for the zeros and the date for the date, and output into a dataset.

If all dates were present, I could use the code

data test;
      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
      infile "&pathlocation" delimiter='09'x
        MISSOVER DSD firstobs=2 ;
        informat RegDate ddmmyy10. ;
        format RegDate ddmmyy10. ;
        input
         RegDate;
      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro     variable */
run;

However I cannot read the above text file doing this as it does not take into account the zeros, as the informat is set to read in dates.

If using a proc import statement

proc import datafile="&pathlocation" out=test dbms=tab replace;
run;

it tries to use a best32. informat, as there is a zero in the first row. The dates cannot then be read in.

So I need to create a custom format of some sort. I can do this for a numeric informat alone or a character informat alone, or a picture informat (which is needed for the dates?). I cannot figure out how to combine multiple formats for one variable. I'm sure the solution is very simple however I cannot find it online so I apologise if this is obvious. Is there either a way to a) put some IF-THEN statement into the format so that it does different things depending on the input b) read the data in purely as text so that the formats need to be used.

2

2 Answers

3
votes

NA's are text and not valid in SAS - they're used in R. To indicate that the value is missing for a numeric variable SAS uses a period (.). Reading the data in with your code assigns the 0 to missing which would be an appropriate read of the data.

If you want NA you'll need to read or convert the data to text, but then your dates will be text and you'll be limited in what you can do with them, for example no date calculations.

If you really want you could display it that way using a nested format.

proc format;
value na_date_fmt
low-high = [ddmmyy10.]
. = "NA";
run;


data have;
infile cards dsd;
informat regDate ddmmyy10.;
format regDate ddmmyy10.;
format newDate na_date_fmt.;
input regdate;
newDate=regdate;
cards;
0
0
16/10/2002
20/11/2003
0
;
run;

proc print data=have;
run;
0
votes

You can add an IF statement to the DATA step, like this:

data test;
  infile "&pathlocation" delimiter='09'x
    MISSOVER DSD firstobs=2 ;
    informat RegDate ddmmyy10. ;
    format RegDate ddmmyy10. ;
    input 
       RegDate;
    if RegDate = 0 then RegDate = .;
run;

The output is

   RegDate
         .
         .
16/10/2012
20/11/2003
         .