I have a dataset containing an ID variable and a second variable, Var2, that has four levels. I would like to count the number of unique ID values for each distinct combination of Var2 values that occurs in the dataset.
Have:
ID Var2
--------
1 A
1 B
1 C
2 A
2 B
2 C
2 D
3 A
3 B
4 A
4 B
4 C
5 A
5 B
5 C
6 A
6 B
6 C
6 D
Want:
Var2       Unique ID
distinct   freq
--------------------
A          0
B          0
C          0
D          0
AB         1
AC         0
AD         0
BC         0
BD         0
CD         0
ABC        3
ABD        0
ACD        0
BCD        0
ABCD       2
OR
ID   Var2
     context
------------
1    ABC
2    ABCD
3    AB
4    ABC
5    ABC
6    ABCD
Each observation is a distinct combination of the two variables. Given that the second variable has four levels, there are 2^4 - 1 = 15 possible combinations. I would like to create a table that shows the frequency of unique ID values for each possible combination of Var2 values.
I have thought about making a dummy variable with 15 levels according to Var2 and ID and running a proc freq on those 15 levels. I also thought about creating a variable with the concatenated values of Var2 by ID.
I'd like to either create a table like the one above, or a new variable that indicates the Var2 context for each distinct ID.
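A minimal sketch of the concatenation idea, assuming the data are in a dataset named have as in the example. The dataset name context and the variable name Var2_context are hypothetical, chosen only for illustration. Sorting by ID and Var2 first should make the concatenated string come out in the same letter order for every ID, so that equal combinations compare equal.

```sas
/* Example data from the "Have" table above */
data have;
  input ID Var2 $;
  datalines;
1 A
1 B
1 C
2 A
2 B
2 C
2 D
3 A
3 B
4 A
4 B
4 C
5 A
5 B
5 C
6 A
6 B
6 C
6 D
;
run;

/* Sort so that Var2 values are concatenated in a consistent order within each ID */
proc sort data=have;
  by ID Var2;
run;

/* Build one row per ID holding the concatenated Var2 values */
data context;                      /* hypothetical output dataset name */
  set have;
  by ID;
  length Var2_context $4;          /* four levels, one character each */
  retain Var2_context;
  if first.ID then Var2_context = ' ';
  Var2_context = cats(Var2_context, Var2);
  if last.ID then output;
  keep ID Var2_context;
run;

/* Count the number of IDs per observed combination */
proc freq data=context;
  tables Var2_context / nocum nopercent;
run;
```

The context dataset corresponds to the second "Want" table. Note that proc freq only lists combinations that actually occur in the data, so the rows with a frequency of 0 in the first "Want" table would not appear without an additional step.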