0
votes

I need to merge two data sets. Each data set contains a sequential observation number. The first data set contains only the first observation. The second data set contains all subsequent observations. Not all subjects have the same number of observations.

The problem is as follows. There are two different types of subject. The type is contained only in the first data set. When I merge the two data sets together, the type is missing on all observations but the first for each subject. Please see my example below.

I would like to know how to do this with both SQL and a DATA step. My real data sets are not large, so efficiency of processing is not major a concern.

I have tried using RETAIN, but as the second data set doesn't contain the TYPE variable, there is no value to retain. Regarding SQL, it seems like UNION should work, and there are countless examples of UNION on the internet, but they all involve a single variable. I need to know how to union the Observation variable by ID while retaining the Amount and assigning the Type.


Example

data set1;
  input ID $ 
        Observation 
        Type $ 
        Amount 
        ;

  datalines;
  002 1 A 15
  026 1 A 30
  031 1 B 7
  028 1 B 10
  036 1 A 22
  ;
run;

data set2;
  input ID $ 
        Observation 
        Amount 
        ;

  datalines;
  002 2 11
  002 3 35
  002 4 13
  002 5 12
  026 2 21
  026 3 12
  026 4 40
  031 2 11
  028 2 27
  036 2 10
  036 3 15
  036 4 16
  036 5 12
  036 6 20
  ;
run;

proc sort data = set1;
  by  ID
      Observation
      ;
run;

proc sort data = set2;
  by  ID
      Observation
      ; 
run;

data merged;
  merge set1
        set2
        ;
  by  ID
      Observation
      ;
run;

This gives

                      ID     Observation    Type    Amount

                      002         1          A        15
                      002         2                   11
                      002         3                   35
                      002         4                   13
                      002         5                   12
                      026         1          A        30
                      026         2                   21
                      026         3                   12
                      026         4                   40
                      028         1          B        10
                      028         2                   27
                      031         1          B         7
                      031         2                   11
                      036         1          A        22
                      036         2                   10
                      036         3                   15
                      036         4                   16
                      036         5                   12
                      036         6                   20

However, what I need is

                      ID     Observation    Type    Amount

                      002         1          A        15
                      002         2          A        11
                      002         3          A        35
                      002         4          A        13
                      002         5          A        12
                      026         1          A        30
                      026         2          A        21
                      026         3          A        12
                      026         4          A        40
                      028         1          B        10
                      028         2          B        27
                      031         1          B         7
                      031         2          B        11
                      036         1          A        22
                      036         2          A        10
                      036         3          A        15
                      036         4          A        16
                      036         5          A        12
                      036         6          A        20
3

3 Answers

0
votes

I'm sure there are other ways to do it, but this is how I'd do it.

First, stack the data keeping only the common fields.

data new;
set set1 (drop = TYPE) set2;
run;

Then merge the type field back over.

proc sql;
create table new2 as select
a.*,
b.TYPE
from new a
left join set1 b
on a.id=b.id;
quit;
0
votes

Proc SQL:

proc sql;
   create table want as
   select coalesce(a.id,b.id) as id,observation,type,amount from (select * from set1(drop=type) union 
   select * from set2) a left join set1 (keep=id type) b
on a.id=b.id;
quit; 
0
votes

The DATA step method is straight forward, just use SET with BY to interleave the records. You need to create a NEW variable to retain the values. If you want you can drop the old one and rename the new one to have its name.

data want ;
  set set1 set2 ;
  by id ;
  if first.id then new_type=type;
  retain new_type;
run;

For SQL use the method that @JJFord3 posted to first union the common fields and then merge on the TYPE flag. You can combine into a single statement.

proc sql;
  create table want as 
    select a.*,b.type
    from 
      (select id,observation,amount from set1
       union
       select id,observation,amount from set2
      ) a
    left join set1 b
    on a.id = b.id
    order by 1,2
  ;
quit;