1
votes

Say you have three separate data sets consisting of the same number of observations. Each observation has an ID letter, A-Z, followed by some numerical observation. For example:

Data set 1:

B   3   8   1   9   4
C   4   1   9   3   1
A   4   4   5   4   9

Data set 2:

C   3   1   9   4   0
A   4   1   2   0   0
B   0   3   3   1   8

I want to merge the data sets BY that first variable. The problem is, the first variable is NOT already sorted in alphabetical form, and I do not want to sort it in alphabetical form. I want to merge the data but keep the original order. For example, I would get:

Merged data:

B   3   8   1   9   4
B   0   3   3   1   8
C   4   1   9   3   1
C   3   1   9   4   0
A   4   4   5   4   9
A   4   1   2   0   0

Is there any way to do this?

2

2 Answers

2
votes

You can create a variable that holds the order and then apply that the new dataset after its "merged". I believe this is an append rather than merge though. I've used a format, though you could use a sql or data set merge as well.

data have1;
input id $ var1-var5;
cards;
B   3   8   1   9   4
C   4   1   9   3   1
A   4   4   5   4   9
;
run;


data have2;
input id $ var1-var5;
cards;
C   3   1   9   4   0
A   4   1   2   0   0
B   0   3   3   1   8
;
run;

data order;
set have1;
fmtname='sort_order';
type='J';
label=_n_;
start=id;
keep id fmtname type label start;
run;

proc format cntlin=order;
run;


data want;
set have1 have2;
order_var=input(id, $sort_order.);
run;

proc sort data=want;
by order_var;
run;
0
votes

This is just one SQL version which follows along a similar path to Joe's answer. Row order is input via a sub-query rather than a format. However the initial order of the two input tables is lost in the join to the row order sub-query. The original order (have2 follows have1) is re-instated by using the table names as a secondary order variable.

proc sql;
    create table want1 as
    select want.id
        ,want.var1
        ,want.var2
        ,want.var3
        ,want.var4
        ,want.var5

    from    (   
            select  *
                , 'have1' as source
            from    have1
            union   all
            select  *
                , 'have2' as source
            from    have2
        )   as  want

    left    join
        (   
            select  id
                , monotonic() as row_no
            from    have1
        )   as  order
    on  want.id eq  order.id

    order by order.row_no
        ,want.source
    ;
quit;

proc compare 
    base=want1
    compare=want
    ;
run;

And this is a data step version without a format. Here the have1 table with row order is re-merged with the concatenated data (have1 and have2) and then re-sorted by row order.

data want2;
    set have1 have2;
run;


data have1;
    set have1;
    order_var = _n_;
run;

proc sort   data=want2;
    by id;
run;


proc sort   data=have1;
    by id;
run;

data want2;
    merge want2 have1;
    by id;
run;


proc sort data=want2;
by order_var;
run;


proc compare 
    base=want2
    compare=want
    ;
run;