1
votes

I have a table with some variables, say var1 and var2 and an identifier, and for some reasons, some identifiers have 2 observations.

I would like to know if there is a simple way to put back the second observation of the same identifier into the first one, that is

  • instead of having two observations, each with var1 var2 variables for the same identifier value
ID    var1    var2
------------------
A1    12      13
A1    43      53
  • having just one, but with something like var1 var2 var1_2 var2_2.
ID    var1    var2    var1_2    var2_2
--------------------------------------
A1    12      13      43        53

I can probably do that with renaming all my variables, then merging the table with the renamed one and dropping duplicates, but I assume there must be a simpler version.

3
I would suggest that this isn't a great idea, unless there's a very good reason to; it is almost always easier to work with data with fewer variables and more observations.Joe

3 Answers

1
votes

Actually, your suggestion of merging the values back is probably the best.

This works if you have, at most, 1 duplicate for any given ID.

data first dups;
set have;
by id;
if first.id then output first;
else output dups;
run;

proc sql noprint;
create table want as
select a.id,
       a.var1,
       a.var2,
       b.var1 as var1_2,
       b.var2 as var2_2
from first as a
  left join
     dups as b
  on a.id=b.id;
quit;
1
votes

Another method makes use of PROC TRANSPOSE and a data-step merge:

/* You can experiment by adding more data to this datalines step */
data have;
infile datalines;
input ID :  $2.   var1    var2;
datalines;
A1    12      13
A1    43      53
;
run;

/* This step puts the var1 values onto one line */
proc transpose data=tab out=new1 (drop=_NAME_) prefix=var1_;
  by id;
  var var1;
run;

/* This does the same for the var2 values */
proc transpose data=tab out=new2 (drop=_NAME_) prefix=var2_;
  by id;
  var var2;
run;

/* The two transposed datasets are then merged together to give one line */
data want;
  merge new1 new2;
    by id;
run;

As an example:

data tab;
infile datalines;
input ID :  $2.   var1    var2;
datalines;
A1    12      13
A1    43      53
A2    199     342
A2    1132    111
A2    91913   199191
B1    1212    43214
;
run;

Gives:

ID  var1_1  var1_2  var1_3  var2_1  var2_2  var2_3
---------------------------------------------------
A1  12      43      .       13      53      .
A2  199     1132    91913   342     111     199191
B1  1212    .       .       43214   .       .
0
votes

There's a very simple way of doing this, using the IDGROUP function within PROC SUMMARY.

data have;
input ID $ var1 $ var2 $;
datalines;
A1    12      13
A1    43      53
;
run;

proc summary data=have nway;
class id;
output out=want (drop=_:)
        idgroup(out[2] (var1 var2)=);
run;