3
votes

I'm trying to use an array to accomplish the following. However, I've never used one before and I'm having a bit of trouble.

Basically I want to take the following data and create one row(observation) for each acct, seq1, seq2 combination:

acct  seq1 seq2  la      ln
9999    1   10   20.01   100
9999    1   10   19.05   1
9999    2   11   26.77   100
9999    2   11   24.86   1
8888    3   20   38.43   218
8888    3   20   37.53   1

This is what I want to end up with. Note, I'm only showing la1 through la3 and ln1 through ln3 for the sake of space. It would actually to go to la7 and ln7:

acct  seq1  seq2     la1     la2    la3  ln1    ln2 ln3
9999    1   10      20.01   19.05   .    100     1   .
9999    2   11      26.77   24.86   .    100     1   .
8888    3   20      38.43   37.53   .    218     1   .

Here is the code that I've attempted so far. Any assistance would be greatly appreciated:

data want;
set have; 
by acct seq1 seq2;
if first.seq2 then do;
array la_array {7} la1-la7;
array ln_array {7} ln1-ln7;
end;
do i = 1 to 7;
la_array(i)=la;
ln_array(i)=ln;
end;
run;
2
Comments on your code: Array statement is not executable, so it should not be in if statement block. What you want to do in that block is initialize things to zero. You need to retain your variables that are in the arrays, and you need to only populate la_array[seq1], not all seven each loop through the dataset.Joe

2 Answers

5
votes

I'd go down a different route and take advantage of a little used feature in PROC SUMMARY (or MEANS) to achieve this. The output table is sorted by ACCT as this is the first variable in the CLASS statement, obviously you can easily resort by SEQ1 order afterwards.

The IDGROUP statement in conjunction with OUT[7] creates 7 variables for each of the variables listed (LA, LN) and populates them in sequence as the data is read. The AUTONAME option adds sequential numbers to the variable names (e.g. LN1, LN2,...,LN7). The only problem I've come across with this method is an out of memory error on the server where the source dataset is very large, this being due to PROC SUMMARY being run in memory. Hopefully it will work for you in this instance.

proc summary data=have nway;
class acct seq1 seq2;
output out=want (drop=_:) 
        idgroup(out[7] (la ln)=) / autoname;
run;
1
votes

Ok... you have to keep in mind that SAS arrays are tipically not real vectors or arrays of memory variables like in most other languages. They are just placeholders for dataset variables.

That said, the ARRAY statement is just there to define the mapping between the dataset variables and the shorthand array reference. It is useless to put it where you placed it as it does not initialize anything.

Another thing you should consider is that you need to accumulate the values and write a single row. You cannot do that on the FIRST row as you have not read all the data yet. You have to use the LAST.

A third issue is that the new variables you are defining must preserve the value when you read another row. SAS typically resets all variables that are not in the input dataset to MISSING. To avoid this and accumulate sums, min, max or other values on a group of rows, you must use the RETAIN statement.

Last... you may want to drop any variables you have just used to do your calculations, but are not desired in the ouput dataset... "i" in this case.

I think this should get you what you want:

DATA want (DROP=i);
  SET have; 
  BY acct seq1 seq2;
  /* define variables */
  ARRAY la_array {7} la1-la7;
  ARRAY ln_array {7} ln1-ln7;
  RETAIN i la1-la7 ln1-ln7;
  IF FIRST.seq2 then do;
    /* initialize group variables */
    DO i = 1 to 7;
      la_array(i)=.;
      ln_array(i)=.;
    END;
    /* initialize index variable */
    i = 1
  END;
  /* read input row values into array variables */
  la_array(i)=la;
  ln_array(i)=ln;
  i = i + 1;
  /* write output row if group is finished */
  IF LAST.seq2;
RUN;

Unfortunately I do not have a SAS installation available at the moment... you'll have to try it and let me know how it goes.