1
votes

I have a data set that I want to update with the results of a separate analysis. The analysis is done in a loop. At the end of the loop, the master data set is to be updated with the values from the analysis. However, I am having difficulty inserting the new values into the master table.

For this example, I have removed the loop. Also, rather than overwrite the master data set, I create a copy.

The master data set has two variables which are used to identify which rows need values inserted into, namely var1 and var2.

data master;
  input var1 $ var2 $;

  datalines;
  A A
  A A
  A B
  A B
  ;
run;

Since the results are part of a loop, the insertions have to be performed separately. That is, I cannot combine the results of the analysis into a single table and perform one merge. I have given them as two separate data sets here.

data first_insert;
  input var1 $ var2 $ var3 $;

  datalines;
  A A C
  ;
run;

data second_insert;
  input var1 $ var2 $ var3 $;

  datalines;
  A B D
  ;
run;

My first approach is to use a MERGE statement. However, when I do this, not all the data is written to the master table.

*****************;
** Using Merge **;
*****************;

data master_merge_copy;
  set master;
run;

data master_merge_copy;
  merge master_merge_copy
        first_insert
        ;
  by var1 var2;
run;

This merges as I expect it to, placing value C in var3 where var1 = A AND var2 = A.

Obs    var1    var2    var3
1      A       A       C
2      A       A       C
3      A       B
4      A       B

However, when I perform the second merge, only the first observation matching the merge criteria is written to. I need it to write var3 = D to all observations where var1 = A AND var2 = B.

data master_merge_copy;
  merge master_merge_copy
        second_insert
        ;
  by var1 var2;
run;

                             Obs    var1    var2    var3
                              1      A       A       C
                              2      A       A       C
                              3      A       B       D
                              4      A       B

Second, I try using an UPDATE statement.

******************;
** Using Update **;
******************;
data update_copy;
  set master;
run;

data update_copy;
  update update_copy
         first_insert
         ;
  by var1 var2;
run;

However, the presence of multiple observation within the BY group generates an error.

WARNING: The MASTER data set contains more than one observation for a BY group.
var1=A var2=A var3=  FIRST.var1=0 LAST.var1=0 FIRST.var2=0 LAST.var2=1 _ERROR_=1 _N_=2
WARNING: The MASTER data set contains more than one observation for a BY group.
var1=A var2=B var3=  FIRST.var1=0 LAST.var1=1 FIRST.var2=0 LAST.var2=1 _ERROR_=1 _N_=4

The resulting data set is not as I would expect:

                             Obs    var1    var2    var3

                              1      A       A       C
                              2      A       A
                              3      A       B
                              4      A       B

It seems like there may be a solution using PROC SQL with an INSERT and WHERE statement. However, it is unclear to me how to do this when the value to be inserted lives in a separate table. All the examples I can find declare the value to be inserted explicitly. For example,

proc sql;
   update sql.newcountries
      set population=population*1.05
         where name like 'B%';
quit;

Please advise!

3
I would avoid using insert as a term here, as you're not actually doing what most people would call inserting - you're merging, or updating. insert in IT typically refers to adding rows, not adding columns.Joe
Not sure I understand. Why would you want to INSERT anything? Aren't you making a NEW table that holds the results of your analysis?Tom
@Tom s/insert/update in terms of SAS terminology. OP has a master table and is updating that master table a bit at a time with a third variable (presumably due to some analysis that is looped over some other variables)Joe

3 Answers

2
votes

Don't INSERT into anything. Generate new records and APPEND them to the NEW result file you are generating. First make sure all_results doesn't exist. Then in your loop append the current results to it. So with your example data that would be these to steps.

 proc append base=all_results data=first_insert force;
 run;
 proc append base=all_results data=second_insert force;
 run;

Now you can create the overall results you want by merging this table with your master table.

data want ;
   merge master all_results;
   by var1 var2;
run;
0
votes

It seems like LewisC_sas was able to provide an answer to a similar question on SAS forum. The syntax of SAS SQL strikes me as bizarre, but the following appears to work.

data master;
  input var1 $ var2 $;

  datalines;
  A A
  A A
  A B
  A B
  ;
run;

data first_insert;
  input var1 $ var2 $ var3 $;

  datalines;
  A A C
  ;
run;

data second_insert;
  input var1 $ var2 $ var3 $;

  datalines;
  A B D
  ;
run;

data master_copy;
  set master;
  length var3 $ 8.;
run;

proc sql;
  update master_copy A
  set var3 = ( select var3
  from first_insert
  where A.var2 = var2)
  where var2 in (select var2
  from first_insert);
  ;
quit;

proc sql;
  update master_copy A
  set var3 = ( select var3
  from second_insert
  where A.var2 = var2)
  where var2 in (select var2
  from second_insert);
  ;
quit;

Note that if this is being implemented inside of a loop, make sure that the master_copy is copied only once!

0
votes

I don't like using SQL for this as it's fairly annoying to write. I also don't like using the data step merge techniques (which do exist for this) as they're also finicky and hard to remember the syntax for, at least for me.

Data step hash is the best in my opinion for this kind of thing.

data want;
  if 0 then set first_insert;
  if _n_=1 then do;
    declare hash f(dataset:'first_insert');
    f.defineKey('var1','var2');
    f.defineData('var3');
    f.defineDone();
  end;
  call missing(of _all_); *prevents us from getting bit by the automatic RETAIN if `var3` is not on the master dataset;
  set master;
  rc = f.find();

run;

Of course, I think the real best answer is to store up all of your modifications to the end and append them then, if you can, but sometimes that's not feasible for whatever reason.