I need to merge two data sets. Each data set contains a sequential observation number. The first data set contains only the first observation. The second data set contains all subsequent observations. Not all subjects have the same number of observations.
The problem is as follows. There are two different types of subject. The type is contained only in the first data set. When I merge the two data sets together, the type is missing on all observations but the first for each subject. Please see my example below.
I would like to know how to do this with both SQL and a DATA step. My real data sets are not large, so efficiency of processing is not major a concern.
I have tried using RETAIN
, but as the second data set doesn't contain the TYPE
variable, there is no value to retain. Regarding SQL, it seems like UNION
should work, and there are countless examples of UNION
on the internet, but they all involve a single variable. I need to know how to union the Observation
variable by ID
while retaining the Amount
and assigning the Type
.
Example
data set1;
input ID $
Observation
Type $
Amount
;
datalines;
002 1 A 15
026 1 A 30
031 1 B 7
028 1 B 10
036 1 A 22
;
run;
data set2;
input ID $
Observation
Amount
;
datalines;
002 2 11
002 3 35
002 4 13
002 5 12
026 2 21
026 3 12
026 4 40
031 2 11
028 2 27
036 2 10
036 3 15
036 4 16
036 5 12
036 6 20
;
run;
proc sort data = set1;
by ID
Observation
;
run;
proc sort data = set2;
by ID
Observation
;
run;
data merged;
merge set1
set2
;
by ID
Observation
;
run;
This gives
ID Observation Type Amount
002 1 A 15
002 2 11
002 3 35
002 4 13
002 5 12
026 1 A 30
026 2 21
026 3 12
026 4 40
028 1 B 10
028 2 27
031 1 B 7
031 2 11
036 1 A 22
036 2 10
036 3 15
036 4 16
036 5 12
036 6 20
However, what I need is
ID Observation Type Amount
002 1 A 15
002 2 A 11
002 3 A 35
002 4 A 13
002 5 A 12
026 1 A 30
026 2 A 21
026 3 A 12
026 4 A 40
028 1 B 10
028 2 B 27
031 1 B 7
031 2 B 11
036 1 A 22
036 2 A 10
036 3 A 15
036 4 A 16
036 5 A 12
036 6 A 20