1
votes

I have a dataset containing an ID variable, and a variable that has four levels. I would like to count the number unique ID values by each distinct combination of values of the second variable that occurs in the dataset.

Have:

ID  Var2 
-------- 
1   A 
1   B
1   C

2   A
2   B
2   C
2   D

3   A
3   B

4   A 
4   B
4   C

5   A
5   B
5   C

6   A   
6   B
6   C 
6   D 

Want:

Var2         Unique ID
distinct     freq

A            0
B            0    
C            0
D            0
AB           1
AC           0
AD           0
BC           0
BD           0
CD           0
ABC          3     
ABD          0
ACD          0
BCD          0
ABCD         2

OR

ID  Var2
    context
-------- 
1   ABC
2   ABCD
3   AB
4   ABC
5   ABC
6   ABCD

Each observation is a distinct combination of the two variables. Given the second variable has four levels , there are 2^4-1 combinations possible. I would like to create a table that shows me the frequencies of unique ID per each possible combination of values for Var2.

I have thought about making a dummy variable with 15 levels according to Var2and ID and running a proc freq on those 15 levels. I also thought about creating a variable with the concatenated values of Var2 by ID.

I'd like to either create a table like the one above, or a new variable that indicates the Var2 context for each distinct ID.

3
Do you have duplicate rows in the data?Tom

3 Answers

1
votes

Assuming that your rows are sorted by the two variables (and unique) then you can get what you want with two PROC calls. If not then add a PROC SORT step with NODUPKEY option.

proc transpose data=have out=step1;
  by id ;
  var var2;
run;

proc summary data=step1 nway missing;
  class col: ;
  output out=want ;
run;

Result

Obs    COL1    COL2    COL3    COL4    _TYPE_    _FREQ_

 1      A       B                        15         1
 2      A       B       C                15         3
 3      A       B       C       D        15         2
0
votes

if your data is already sorted then you can do dow loop as shown below.

data want(drop=var2);
length id 8 var2_context $20;
do until(last.id);
 set have;
by id;
Var2_context = cats(var2_context, var2);
end;
run;

or You can also use first. and last. concept as shown below.

data want (drop=var2);
length id 8 var2_context $20;
 retain var2_context;
set have;
by id;
if first.id then var2_context = var2;
else Var2_context = cats(var2_context, var2);
if last.id;
run;

if your data is not sorted then do a proc sort

proc sort data = have;
by id var2;
run;
0
votes

Here are two other techniques.

The first performs an in DATA step pivot using an array and a DOW loop. The array is combined using cat function.

The second uses wall paper SQL to pivot the data into columns that are concatenated.

data have;
attrib
  id   length=8
  Var2 length=$1
;
input ID  Var2 $; datalines;
1   A 
1   B
1   C
2   A
2   B
2   C
2   D
3   A
3   B
4   A 
4   B
4   C
5   A
5   B
5   C
6   A
6   B
6   C
6   D
run;

data want(keep=id combo);
  array across(4) $1;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;

    across(_n_) = Var2;
  end;
  length combo $4;
  combo = cat(of across(*));
run;

proc freq data=want;
  table combo;
run;

proc sql;
  create table want as
  select distinct
    have.id,
    A.Var2 || B.Var2 || c.Var2 || D.var2
    as combo
  from 
    have
    left join (select id, Var2 from have where Var2 = 'A') A on have.id = A.id
    left join (select id, Var2 from have where Var2 = 'B') B on have.id = B.id
    left join (select id, Var2 from have where Var2 = 'C') C on have.id = C.id
    left join (select id, Var2 from have where Var2 = 'D') D on have.id = D.id
  ;