1
votes

Thank you in advance for any help you can offer, as I am relatively new to SAS.

I'm attempting to read into SAS a file of the following format:

001DATA...
002DATA...
RSSDATA...
001DATA...
002DATA...
RSSDATA...
001DATA...
002DATA...
001DATA...
002DATA...
RSSDATA...

Everyone in this file has an "001" and an "002" sub-record. Some, but not all, also have an "RSS" sub-record (i.e., the first three lines in the above example represent data from one subject). The problem is that subjects without an RSS sub-record are dropped entirely when I run the program outlined below: their 001 and 002 sub-records are removed (rows 7 and 8 in the above example). As a result, about 1/3 of my sample gets deleted. I'm using the following code (for brevity I've not included all variables, and no, they're not all named "variable"):

INPUT @1 type $CHAR3. @;
RETAIN many variables;

IF type = '001' THEN DO;
INPUT @4 variable  $CHAR8.
      @12 variable $CHAR1.
;
RETURN;
END;

ELSE IF type = '002' THEN DO;
INPUT @4 variable $CHAR35.
;
RETURN;
END;

ELSE IF type = 'RSS' THEN DO;
INPUT @4 variable $CHAR6.
      @10 variable $CHAR1.
;
OUTPUT filename;
END;
RUN;

Is there a way to prevent these deletions from occurring? Essentially what I want (in the output file) is for each row to represent one subject, and include their 001, 002, and if present, their RSS data.

Thanks again for any guidance you may be able to provide!

3 Answers

0
votes

The records with no RSS are deleted because your only OUTPUT statement is inside the ELSE IF type = 'RSS' block.

As soon as a data step contains an explicit OUTPUT statement, SAS stops writing observations automatically at the end of each iteration; it writes one only when an OUTPUT statement actually executes.

So when a subject has no RSS record, nothing is output for that subject, and the values that were read get overwritten by the values of the next subject.
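The effect of an explicit OUTPUT statement described above can be seen in a small sketch. The dataset names implicit_out and explicit_out and the variable x are made up for illustration and are not from the question's program.

```sas
/* No OUTPUT statement in the step: SAS writes one observation
   at the end of every iteration (implicit output). */
data implicit_out;
  input x;
  datalines;
1
2
3
;
run;

/* An explicit OUTPUT anywhere in the step turns implicit output off,
   so only iterations that execute OUTPUT write an observation. */
data explicit_out;
  input x;
  if x = 2 then output;
  datalines;
1
2
3
;
run;
```

Here implicit_out ends up with three observations and explicit_out with one, which is the same mechanism that drops subjects lacking an RSS record.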

Maybe one easy way to avoid the problem is to take your OUTPUT statement outside of your last conditional, and just write it at the end of the program.

Note: since you are using ELSE IFs, I would suggest removing the RETURN statements.

0
votes

It looks like you should output when the next record is 001. You will need to add

END=eof

to your INFILE or SET statement for the following code to work. Your code doesn't really use the type variable, but I've added nexttype in case your full use case does.

INPUT @1 nexttype $CHAR3. @;
RETAIN many variables;
DROP nexttype ;

IF nexttype = '001' THEN DO;
IF _N_ NE 1 THEN OUTPUT filename ; %* do not output before first record fully read in ;
type = '001' ;
INPUT @4 variable  $CHAR8.
      @12 variable $CHAR1.
;
RETURN;
END;

ELSE IF nexttype = '002' THEN DO;
type = '002' ;
INPUT @4 variable $CHAR35.
;
RETURN;
END;

ELSE IF nexttype = 'RSS' THEN DO;
type = 'RSS' ;
INPUT @4 variable $CHAR6.
      @10 variable $CHAR1.
;
END;

IF eof THEN OUTPUT filename ; %* catches last record ;

RUN;
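
For reference, here is a self-contained sketch of the same "output when the next 001 arrives" idea. Everything named here is an assumption for illustration: the fileref raw, the sample lines, the dataset want, and the variables rectype, id, flag, descr, rss_code, and rss_flag stand in for the real layout. It also adds one step not shown above: because the variables are retained, the RSS fields are reset at the start of each subject so that a subject without an RSS record does not inherit the previous subject's RSS values.

```sas
/* Write a tiny sample file so the step below can run on its own.
   The contents are invented; only the 001/002/RSS prefixes matter. */
filename raw temp;
data _null_;
  file raw;
  put '001SUBJ0001A';
  put '002First subject, with RSS';
  put 'RSS123456Y';
  put '001SUBJ0002B';
  put '002Second subject, no RSS';
run;

data want;
  infile raw end=eof truncover;
  length id $8 flag $1 descr $35 rss_code $6 rss_flag $1;
  retain id flag descr rss_code rss_flag;
  drop rectype;

  /* Read the record type and hold the line for the next INPUT. */
  input @1 rectype $char3. @;

  if rectype = '001' then do;
    /* A new subject starts, so the previous one is complete. */
    if _n_ > 1 then output;
    /* Clear retained fields so nothing carries over between subjects. */
    call missing(descr, rss_code, rss_flag);
    input @4 id $char8. @12 flag $char1.;
  end;
  else if rectype = '002' then input @4 descr $char35.;
  else if rectype = 'RSS' then input @4 rss_code $char6. @10 rss_flag $char1.;

  /* The last subject has no following 001 record to trigger output. */
  if eof then output;
run;
```

With the sample file above, want should contain two observations, the second with blank rss_code and rss_flag.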
0
votes

If you always have the '001' and '002' records then your input can be simpler. First read the first two records, then pre-read the type from the third, and read the rest of the third record only if it is an "RSS" record. There is no need for RETAIN, since all of a subject's records are read in the same iteration of the data step. You might have an issue at the end of the file if the last subject does not have an "RSS" record. You can use the END= option on the INFILE statement to pre-read the next record's type only when there is a next record.

data want;
  infile 'myfile' end=eof;
  INPUT @4  variable1 $CHAR8.
        @12 variable2 $CHAR1.
      / @4  variable3 $CHAR35.
  ;
  if not eof then input check $3. @@ ;
  IF check = 'RSS' THEN 
    INPUT @4  variable4 $CHAR6.
          @10 variable5 $CHAR1.
  ;
run;