I have a data set that I want to update with the results of a separate analysis. The analysis is done in a loop. At the end of the loop, the master data set is to be updated with the values from the analysis. However, I am having difficulty inserting the new values into the master table.
For this example, I have removed the loop. Also, rather than overwrite the master data set, I create a copy.
The master data set has two variables which are used to identify which rows need values inserted into, namely var1
and var2
.
data master;
input var1 $ var2 $;
datalines;
A A
A A
A B
A B
;
run;
Since the results are part of a loop, the insertions have to be performed separately. That is, I cannot combine the results of the analysis into a single table and perform one merge. I have given them as two separate data sets here.
data first_insert;
input var1 $ var2 $ var3 $;
datalines;
A A C
;
run;
data second_insert;
input var1 $ var2 $ var3 $;
datalines;
A B D
;
run;
My first approach is to use a MERGE
statement. However, when I do this, not all the data is written to the master table.
*****************;
** Using Merge **;
*****************;
data master_merge_copy;
set master;
run;
data master_merge_copy;
merge master_merge_copy
first_insert
;
by var1 var2;
run;
This merges as I expect it to, placing value C
in var3
where var1 = A AND var2 = A
.
Obs var1 var2 var3
1 A A C
2 A A C
3 A B
4 A B
However, when I perform the second merge, only the first observation matching the merge criteria is written to. I need it to write var3 = D
to all observations where var1 = A AND var2 = B
.
data master_merge_copy;
merge master_merge_copy
second_insert
;
by var1 var2;
run;
Obs var1 var2 var3
1 A A C
2 A A C
3 A B D
4 A B
Second, I try using an UPDATE
statement.
******************;
** Using Update **;
******************;
data update_copy;
set master;
run;
data update_copy;
update update_copy
first_insert
;
by var1 var2;
run;
However, the presence of multiple observation within the BY group generates an error.
WARNING: The MASTER data set contains more than one observation for a BY group.
var1=A var2=A var3= FIRST.var1=0 LAST.var1=0 FIRST.var2=0 LAST.var2=1 _ERROR_=1 _N_=2
WARNING: The MASTER data set contains more than one observation for a BY group.
var1=A var2=B var3= FIRST.var1=0 LAST.var1=1 FIRST.var2=0 LAST.var2=1 _ERROR_=1 _N_=4
The resulting data set is not as I would expect:
Obs var1 var2 var3
1 A A C
2 A A
3 A B
4 A B
It seems like there may be a solution using PROC SQL
with an INSERT
and WHERE
statement. However, it is unclear to me how to do this when the value to be inserted lives in a separate table. All the examples I can find declare the value to be inserted explicitly. For example,
proc sql;
update sql.newcountries
set population=population*1.05
where name like 'B%';
quit;
Please advise!
insert
as a term here, as you're not actually doing what most people would callinserting
- you're merging, or updating.insert
in IT typically refers to adding rows, not adding columns. – Joe