0
votes

I have an odd dataset that I need to import into SAS, splitting the records into two tables depending on formatting, and dropping some records altogether. The data is structured as follows:

c Comment line 1
c Comment line 2
t lines init
a 'mme006'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=0.01  42427  ttf=1  us1=3  us2=0
    dwt=#0   42350  ttf=1  us1=1.8  us2=0  lay=3
    dwt=>0  42352  ttf=1  us1=0.5  us2=18.13
    42349  lay=3
a 'mme007'   M   8   99   15   '111 ME - RANDOLPH ST'
  path=no
    dwt=+0  42367  ttf=1  us1=0.6  us2=0
    dwt=0.01  42368  ttf=1  us1=0.6  us2=35.63 lay=3
    dwt=#0  42369  ttf=1  us1=0.3  us2=0
    42381  lay=3

Only the lines beginning with a, dwt or an integer need to be kept.

For the lines beginning with a, the desired output is a table like this, called "lines", which contains the first two non-a values in the row:

 name   | type
--------+------
 mme006 | M
 mme007 | M

For the dwt/integer rows, the table "itins" would look like so:

 anode | dwt  | ttf | us1 | us2   | lay
 ------+------+-----+-----+-------+-----
 42427 | 0.01 |   1 | 3.0 |  0.00 |
 42350 | #0   |   1 | 1.8 |  0.00 |   3
 42352 | >0   |   1 | 0.5 | 18.13 | 
 42349 |      |     |     |       |   3       <-- line starting with integer
 42367 | +0   |   1 | 0.6 |  0.00 |
 42368 | 0.01 |   1 | 0.6 | 35.63 |   3
 42369 | #0   |   1 | 0.3 |  0.00 |
 42381 |      |     |     |       |   3       <-- line starting with integer

The code I have so far is almost there, but not quite:

data lines itins;
  infile in1 missover;
  input @1 first $1. @;
      if first in ('c','t') then delete;
      else if first='a' then do;
        input name $ type $;
        output lines; end;
      else do;
        input @1 path=$ dwt=$ anode ttf= us1= us2= us3= lay=;
        if path='no' then delete;
        output itins; end;

The problems:

  • The "lines" table is correct, except I can't get rid of the quotes around the "name" values (e.g. 'mme006')
  • In the "itins" table, "ttf", "us1", and "us2" are being populated correctly. However, "anode" and "lay" are always null, and "dwt" has values like #0 4236 and 0.01 42, always 8 characters long, borrowing part of what should be in "anode".

What am I doing wrong?

1
I think you're getting yourself into trouble mixing input methods here. Ridding the quotes is easy (dequote()) but the other part I don't know that you can solve this way, since anode can't be read with named input.Joe
Thanks for the dequote() tip! Any suggestions for alternative approaches that will allow anode to be imported without modifying the input data?nmpeterson

1 Answers

1
votes

DEQUOTE() will remove matched quotation marks.

Your problem with dwt is that you'll need to tell it what informat to use; so if dwt is four long, :$4. instead of just $.

However, anode is a problem. The solution I came up with is:

data lines itins;
  infile in1 missover;
  input @1 first $1. @;
      if first in ('c','t') then delete;
      else if first='a' then do;
        input name $ type $;
        output lines; end;
      else do;
        input @1 path= $ @;
        if path='no' then delete;
        else do;
            if substr(_infile_,5,1)='d' then do;
                input dwt= :$12. ttf= us1= us2= us3= lay=;
                anode=input(scan(dwt,2,' '),best.);
                dwt=scan(dwt,1,' ');
                output itins; 
            end;
            else do;
                input @5 anode 5. lay=;
                output itins;
            end;
        end;
    end;

run;

Basically, check for plan first; then if it's not a plan row, check for the 'd' in dwt. If that's present, read in a line like that, incorporating anode into dwt and then splitting it off later. If it's not present, just read in anode and lay.

If dwt can have widths other than 2-4 such that it might need to be shorter, then this probably won't work, and you'll have to explicitly figure out the position of anode to read it in properly.