I want to try to find a way to count the number of observations within multiple (more than two) BY group variables. After which, I wish to delete observations pertaining to ones whose count is less than two. Here is an example of what I am trying to do:
VAR1 VAR2 VAR3
a a 1
a a 2
a b 1
a b 2
b a 1
b a 2
b b 1
b b 2
c a 1
c b 1
d a 1
Over here, I would like to make sure that there are exactly two distinct values of VAR3, with respect to VAR1 and VAR2.
In this example, you can see that I want to delete the last three observations as there is one value per VAR1/VAR2 pair.
Was there a simple way to do this?
I have tried:
data want;
set have;
by VAR1 VAR2 VAR3;
if first.VAR3 = last.VAR3 then delete;
run;
But that did not work as it deleted observations with the same VAR3 within the same VAR1. I need to help building something more robust.
In the end, I want this:
VAR1 VAR2 VAR3
a a 1
a a 2
a b 1
a b 2
b a 1
b a 2
b b 1
b b 2
Would appreciate any help. Thank you.
EDIT:
Giving extra clarity for what I need. I'd like to check if VAR3 contains both of the values 1 AND 2 for each combination of VAR1 and VAR2 present. Else delete if the entry if it contains only one of the values or none.
Thank you.